diff --git a/doc/src/Manual.txt b/doc/src/Manual.txt
index a5a874fa8..bceb17401 100644
--- a/doc/src/Manual.txt
+++ b/doc/src/Manual.txt
@@ -1,340 +1,339 @@
<!-- HTML_ONLY -->
<HEAD>
<TITLE>LAMMPS Users Manual</TITLE>
-<META NAME="docnumber" CONTENT="11 Apr 2017 version">
+<META NAME="docnumber" CONTENT="4 May 2017 version">
<META NAME="author" CONTENT="http://lammps.sandia.gov - Sandia National Laboratories">
<META NAME="copyright" CONTENT="Copyright (2003) Sandia Corporation. This software and manual is distributed under the GNU General Public License.">
</HEAD>
<BODY>
<!-- END_HTML_ONLY -->
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
<H1></H1>
LAMMPS Documentation :c,h3
-11 Apr 2017 version :c,h4
+4 May 2017 version :c,h4
Version info: :h4
The LAMMPS "version" is the date when it was released, such as 1 May
2010. LAMMPS is updated continuously. Whenever we fix a bug or add a
feature, we release it immediately, and post a notice on "this page of
the WWW site"_bug. Every 2-4 months one of the incremental releases
is subjected to more thorough testing and labeled as a {stable} version.
Each dated copy of LAMMPS contains all the
features and bug-fixes up to and including that version date. The
version date is printed to the screen and logfile every time you run
LAMMPS. It is also in the file src/version.h and in the LAMMPS
directory name created when you unpack a tarball, and at the top of
the first page of the manual (this page).
If you browse the HTML doc pages on the LAMMPS WWW site, they always
describe the most current [development] version of LAMMPS. :ulb,l
If you browse the HTML doc pages included in your tarball, they
describe the version you have. :l
The "PDF file"_Manual.pdf on the WWW site or in the tarball is updated
about once per month. This is because it is large, and we don't want
it to be part of every patch. :l
There is also a "Developer.pdf"_Developer.pdf file in the doc
directory, which describes the internal structure and algorithms of
LAMMPS. :l
:ule
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel
Simulator.
LAMMPS is a classical molecular dynamics simulation code designed to
run efficiently on parallel computers. It was developed at Sandia
National Laboratories, a US Department of Energy facility, with
funding from the DOE. It is an open-source code, distributed freely
under the terms of the GNU General Public License (GPL).
The current core group of LAMMPS developers is at Sandia National
Labs and Temple University:
"Steve Plimpton"_sjp, sjplimp at sandia.gov :ulb,l
Aidan Thompson, athomps at sandia.gov :l
Stan Moore, stamoor at sandia.gov :l
"Axel Kohlmeyer"_ako, akohlmey at gmail.com :l
:ule
Past core developers include Paul Crozier, Ray Shan and Mark Stevens,
all at Sandia. The [LAMMPS home page] at
"http://lammps.sandia.gov"_http://lammps.sandia.gov has more information
about the code and its uses. Interaction with external LAMMPS developers,
bug reports and feature requests are mainly coordinated through the
"LAMMPS project on GitHub."_https://github.com/lammps/lammps
The lammps.org domain, currently hosting "public continuous integration
testing"_https://ci.lammps.org/job/lammps/ and "precompiled Linux
RPM and Windows installer packages"_http://rpm.lammps.org is located
at Temple University and managed by Richard Berger,
richard.berger at temple.edu.
:link(bug,http://lammps.sandia.gov/bug.html)
:link(sjp,http://www.sandia.gov/~sjplimp)
:link(ako,http://goo.gl/1wk0)
:line
The LAMMPS documentation is organized into the following sections. If
you find errors or omissions in this manual or have suggestions for
useful information to add, please send an email to the developers so
we can improve the LAMMPS documentation.
Once you are familiar with LAMMPS, you may want to bookmark "this
page"_Section_commands.html#comm at Section_commands.html#comm since
it gives quick access to documentation for all LAMMPS commands.
"PDF file"_Manual.pdf of the entire manual, generated by
"htmldoc"_http://freecode.com/projects/htmldoc
<!-- RST
.. toctree::
:maxdepth: 2
:numbered:
:caption: User Documentation
:name: userdoc
:includehidden:
Section_intro
Section_start
Section_commands
Section_packages
Section_accelerate
Section_howto
Section_example
Section_perf
Section_tools
Section_modify
Section_python
Section_errors
Section_history
.. toctree::
:caption: Index
:name: index
:hidden:
tutorials
commands
fixes
computes
pairs
bonds
angles
dihedrals
impropers
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
END_RST -->
<!-- HTML_ONLY -->
"Introduction"_Section_intro.html :olb,l
1.1 "What is LAMMPS"_intro_1 :ulb,b
1.2 "LAMMPS features"_intro_2 :b
1.3 "LAMMPS non-features"_intro_3 :b
1.4 "Open source distribution"_intro_4 :b
1.5 "Acknowledgments and citations"_intro_5 :ule,b
"Getting started"_Section_start.html :l
2.1 "What's in the LAMMPS distribution"_start_1 :ulb,b
2.2 "Making LAMMPS"_start_2 :b
2.3 "Making LAMMPS with optional packages"_start_3 :b
- 2.4 "Building LAMMPS via the Make.py script"_start_4 :b
- 2.5 "Building LAMMPS as a library"_start_5 :b
- 2.6 "Running LAMMPS"_start_6 :b
- 2.7 "Command-line options"_start_7 :b
- 2.8 "Screen output"_start_8 :b
- 2.9 "Tips for users of previous versions"_start_9 :ule,b
+ 2.4 "Building LAMMPS as a library"_start_4 :b
+ 2.5 "Running LAMMPS"_start_5 :b
+ 2.6 "Command-line options"_start_6 :b
+ 2.7 "Screen output"_start_7 :b
+ 2.8 "Tips for users of previous versions"_start_8 :ule,b
"Commands"_Section_commands.html :l
3.1 "LAMMPS input script"_cmd_1 :ulb,b
3.2 "Parsing rules"_cmd_2 :b
3.3 "Input script structure"_cmd_3 :b
3.4 "Commands listed by category"_cmd_4 :b
3.5 "Commands listed alphabetically"_cmd_5 :ule,b
"Packages"_Section_packages.html :l
4.1 "Standard packages"_pkg_1 :ulb,b
4.2 "User packages"_pkg_2 :ule,b
"Accelerating LAMMPS performance"_Section_accelerate.html :l
5.1 "Measuring performance"_acc_1 :ulb,b
5.2 "Algorithms and code options to boost performace"_acc_2 :b
5.3 "Accelerator packages with optimized styles"_acc_3 :b
5.3.1 "GPU package"_accelerate_gpu.html :ulb,b
5.3.2 "USER-INTEL package"_accelerate_intel.html :b
5.3.3 "KOKKOS package"_accelerate_kokkos.html :b
5.3.4 "USER-OMP package"_accelerate_omp.html :b
5.3.5 "OPT package"_accelerate_opt.html :ule,b
5.4 "Comparison of various accelerator packages"_acc_4 :ule,b
"How-to discussions"_Section_howto.html :l
6.1 "Restarting a simulation"_howto_1 :ulb,b
6.2 "2d simulations"_howto_2 :b
6.3 "CHARMM and AMBER force fields"_howto_3 :b
6.4 "Running multiple simulations from one input script"_howto_4 :b
6.5 "Multi-replica simulations"_howto_5 :b
6.6 "Granular models"_howto_6 :b
6.7 "TIP3P water model"_howto_7 :b
6.8 "TIP4P water model"_howto_8 :b
6.9 "SPC water model"_howto_9 :b
6.10 "Coupling LAMMPS to other codes"_howto_10 :b
6.11 "Visualizing LAMMPS snapshots"_howto_11 :b
6.12 "Triclinic (non-orthogonal) simulation boxes"_howto_12 :b
6.13 "NEMD simulations"_howto_13 :b
6.14 "Finite-size spherical and aspherical particles"_howto_14 :b
6.15 "Output from LAMMPS (thermo, dumps, computes, fixes, variables)"_howto_15 :b
6.16 "Thermostatting, barostatting, and compute temperature"_howto_16 :b
6.17 "Walls"_howto_17 :b
6.18 "Elastic constants"_howto_18 :b
6.19 "Library interface to LAMMPS"_howto_19 :b
6.20 "Calculating thermal conductivity"_howto_20 :b
6.21 "Calculating viscosity"_howto_21 :b
6.22 "Calculating a diffusion coefficient"_howto_22 :b
6.23 "Using chunks to calculate system properties"_howto_23 :b
6.24 "Setting parameters for pppm/disp"_howto_24 :b
6.25 "Polarizable models"_howto_25 :b
6.26 "Adiabatic core/shell model"_howto_26 :b
6.27 "Drude induced dipoles"_howto_27 :ule,b
"Example problems"_Section_example.html :l
"Performance & scalability"_Section_perf.html :l
"Additional tools"_Section_tools.html :l
"Modifying & extending LAMMPS"_Section_modify.html :l
10.1 "Atom styles"_mod_1 :ulb,b
10.2 "Bond, angle, dihedral, improper potentials"_mod_2 :b
10.3 "Compute styles"_mod_3 :b
10.4 "Dump styles"_mod_4 :b
10.5 "Dump custom output options"_mod_5 :b
10.6 "Fix styles"_mod_6 :b
10.7 "Input script commands"_mod_7 :b
10.8 "Kspace computations"_mod_8 :b
10.9 "Minimization styles"_mod_9 :b
10.10 "Pairwise potentials"_mod_10 :b
10.11 "Region styles"_mod_11 :b
10.12 "Body styles"_mod_12 :b
10.13 "Thermodynamic output options"_mod_13 :b
10.14 "Variable options"_mod_14 :b
10.15 "Submitting new features for inclusion in LAMMPS"_mod_15 :ule,b
"Python interface"_Section_python.html :l
11.1 "Overview of running LAMMPS from Python"_py_1 :ulb,b
11.2 "Overview of using Python from a LAMMPS script"_py_2 :b
11.3 "Building LAMMPS as a shared library"_py_3 :b
11.4 "Installing the Python wrapper into Python"_py_4 :b
11.5 "Extending Python with MPI to run in parallel"_py_5 :b
11.6 "Testing the Python-LAMMPS interface"_py_6 :b
11.7 "Using LAMMPS from Python"_py_7 :b
11.8 "Example Python scripts that use LAMMPS"_py_8 :ule,b
"Errors"_Section_errors.html :l
12.1 "Common problems"_err_1 :ulb,b
12.2 "Reporting bugs"_err_2 :b
12.3 "Error & warning messages"_err_3 :ule,b
"Future and history"_Section_history.html :l
13.1 "Coming attractions"_hist_1 :ulb,b
13.2 "Past versions"_hist_2 :ule,b
:ole
:link(intro_1,Section_intro.html#intro_1)
:link(intro_2,Section_intro.html#intro_2)
:link(intro_3,Section_intro.html#intro_3)
:link(intro_4,Section_intro.html#intro_4)
:link(intro_5,Section_intro.html#intro_5)
:link(start_1,Section_start.html#start_1)
:link(start_2,Section_start.html#start_2)
:link(start_3,Section_start.html#start_3)
:link(start_4,Section_start.html#start_4)
:link(start_5,Section_start.html#start_5)
:link(start_6,Section_start.html#start_6)
:link(start_7,Section_start.html#start_7)
:link(start_8,Section_start.html#start_8)
:link(start_9,Section_start.html#start_9)
:link(cmd_1,Section_commands.html#cmd_1)
:link(cmd_2,Section_commands.html#cmd_2)
:link(cmd_3,Section_commands.html#cmd_3)
:link(cmd_4,Section_commands.html#cmd_4)
:link(cmd_5,Section_commands.html#cmd_5)
:link(pkg_1,Section_packages.html#pkg_1)
:link(pkg_2,Section_packages.html#pkg_2)
:link(acc_1,Section_accelerate.html#acc_1)
:link(acc_2,Section_accelerate.html#acc_2)
:link(acc_3,Section_accelerate.html#acc_3)
:link(acc_4,Section_accelerate.html#acc_4)
:link(howto_1,Section_howto.html#howto_1)
:link(howto_2,Section_howto.html#howto_2)
:link(howto_3,Section_howto.html#howto_3)
:link(howto_4,Section_howto.html#howto_4)
:link(howto_5,Section_howto.html#howto_5)
:link(howto_6,Section_howto.html#howto_6)
:link(howto_7,Section_howto.html#howto_7)
:link(howto_8,Section_howto.html#howto_8)
:link(howto_9,Section_howto.html#howto_9)
:link(howto_10,Section_howto.html#howto_10)
:link(howto_11,Section_howto.html#howto_11)
:link(howto_12,Section_howto.html#howto_12)
:link(howto_13,Section_howto.html#howto_13)
:link(howto_14,Section_howto.html#howto_14)
:link(howto_15,Section_howto.html#howto_15)
:link(howto_16,Section_howto.html#howto_16)
:link(howto_17,Section_howto.html#howto_17)
:link(howto_18,Section_howto.html#howto_18)
:link(howto_19,Section_howto.html#howto_19)
:link(howto_20,Section_howto.html#howto_20)
:link(howto_21,Section_howto.html#howto_21)
:link(howto_22,Section_howto.html#howto_22)
:link(howto_23,Section_howto.html#howto_23)
:link(howto_24,Section_howto.html#howto_24)
:link(howto_25,Section_howto.html#howto_25)
:link(howto_26,Section_howto.html#howto_26)
:link(howto_27,Section_howto.html#howto_27)
:link(mod_1,Section_modify.html#mod_1)
:link(mod_2,Section_modify.html#mod_2)
:link(mod_3,Section_modify.html#mod_3)
:link(mod_4,Section_modify.html#mod_4)
:link(mod_5,Section_modify.html#mod_5)
:link(mod_6,Section_modify.html#mod_6)
:link(mod_7,Section_modify.html#mod_7)
:link(mod_8,Section_modify.html#mod_8)
:link(mod_9,Section_modify.html#mod_9)
:link(mod_10,Section_modify.html#mod_10)
:link(mod_11,Section_modify.html#mod_11)
:link(mod_12,Section_modify.html#mod_12)
:link(mod_13,Section_modify.html#mod_13)
:link(mod_14,Section_modify.html#mod_14)
:link(mod_15,Section_modify.html#mod_15)
:link(py_1,Section_python.html#py_1)
:link(py_2,Section_python.html#py_2)
:link(py_3,Section_python.html#py_3)
:link(py_4,Section_python.html#py_4)
:link(py_5,Section_python.html#py_5)
:link(py_6,Section_python.html#py_6)
:link(err_1,Section_errors.html#err_1)
:link(err_2,Section_errors.html#err_2)
:link(err_3,Section_errors.html#err_3)
:link(hist_1,Section_history.html#hist_1)
:link(hist_2,Section_history.html#hist_2)
<!-- END_HTML_ONLY -->
</BODY>
diff --git a/doc/src/Section_commands.txt b/doc/src/Section_commands.txt
index 3f1d6ff20..c71acfe06 100644
--- a/doc/src/Section_commands.txt
+++ b/doc/src/Section_commands.txt
@@ -1,1226 +1,1226 @@
"Previous Section"_Section_start.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_packages.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
3. Commands :h3
This section describes how a LAMMPS input script is formatted and the
input script commands used to define a LAMMPS simulation.
3.1 "LAMMPS input script"_#cmd_1
3.2 "Parsing rules"_#cmd_2
3.3 "Input script structure"_#cmd_3
3.4 "Commands listed by category"_#cmd_4
3.5 "Commands listed alphabetically"_#cmd_5 :all(b)
:line
:line
3.1 LAMMPS input script :link(cmd_1),h4
LAMMPS executes by reading commands from an input script (text file),
one line at a time. When the input script ends, LAMMPS exits. Each
command causes LAMMPS to take some action. It may set an internal
variable, read in a file, or run a simulation. Most commands have
default settings, which means you only need to use the command if you
wish to change the default.
In many cases, the ordering of commands in an input script is not
important. However, the following rules apply:
(1) LAMMPS does not read your entire input script and then perform a
simulation with all the settings. Rather, the input script is read
one line at a time and each command takes effect when it is read.
Thus this sequence of commands:
timestep 0.5
run 100
run 100 :pre
does something different than this sequence:
run 100
timestep 0.5
run 100 :pre
In the first case, the specified timestep (0.5 fmsec) is used for two
simulations of 100 timesteps each. In the 2nd case, the default
timestep (1.0 fmsec) is used for the 1st 100 step simulation and a 0.5
fmsec timestep is used for the 2nd one.
(2) Some commands are only valid when they follow other commands. For
example, you cannot set the temperature of a group of atoms until atoms
have been defined and a group command is used to define which atoms
belong to the group.
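For instance, a fragment like the following (a hypothetical sketch; it
assumes a simulation box and lattice have already been defined, and the
region name {box}, group name {mobile}, and seed are placeholders) must
keep this order:
create_atoms 1 box
group mobile region box
velocity mobile create 3.0 4928459 :pre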
(3) Sometimes command B will use values that can be set by command A.
This means command A must precede command B in the input script if it
is to have the desired effect. For example, the
"read_data"_read_data.html command initializes the system by setting
up the simulation box and assigning atoms to processors. If default
values are not desired, the "processors"_processors.html and
"boundary"_boundary.html commands need to be used before read_data to
tell LAMMPS how to map processors to the simulation box.
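For example, this (hypothetical) fragment sets a non-default processor
grid and boundary before the data file is read; the file name is a
placeholder and the 2x2x1 grid assumes a 4-processor run:
processors 2 2 1
boundary p p f
read_data data.slab :pre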
Many input script errors are detected by LAMMPS and an ERROR or
WARNING message is printed. "This section"_Section_errors.html gives
more information on what errors mean. The documentation for each
command lists restrictions on how the command can be used.
:line
3.2 Parsing rules :link(cmd_2),h4
Each non-blank line in the input script is treated as a command.
LAMMPS commands are case sensitive. Command names are lower-case, as
are specified command arguments. Upper case letters may be used in
file names or user-chosen ID strings.
Here is how each line in the input script is parsed by LAMMPS:
(1) If the last printable character on the line is a "&" character,
the command is assumed to continue on the next line. The next line is
concatenated to the previous line by removing the "&" character and
line break. This allows long commands to be continued across two or
more lines. See the discussion of triple quotes in (6) for how to
continue a command across multiple lines without using "&" characters.
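For example, a long command can be split like this (a sketch which
assumes computes {myTemp} and {myPress} were defined earlier):
fix 2 all ave/time 100 5 1000 &
    c_myTemp c_myPress file ave.out :pre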
(2) All characters from the first "#" character onward are treated as
a comment and discarded. See an exception in (6). Note that a
comment after a trailing "&" character will prevent the command from
continuing on the next line. Also note that for multi-line commands a
single leading "#" will comment out the entire command.
(3) The line is searched repeatedly for $ characters, which indicate
variables that are replaced with a text string. See an exception in
(6).
If the $ is followed by curly brackets, then the variable name is the
text inside the curly brackets. If no curly brackets follow the $,
then the variable name is the single character immediately following
the $. Thus $\{myTemp\} and $x refer to variable names "myTemp" and
"x".
How the variable is converted to a text string depends on what style
of variable it is; see the "variable"_variable.html doc page for details.
It can be a variable that stores multiple text strings and returns one
of them. The returned text string can be multiple "words" (space
separated) which will then be interpreted as multiple arguments in the
input command. The variable can also store a numeric formula which
will be evaluated and its numeric result returned as a string.
As a special case, if the $ is followed by parentheses, then the text
inside the parentheses is treated as an "immediate" variable and
evaluated as an "equal-style variable"_variable.html. This is a way
to use numeric formulas in an input script without having to assign
them to variable names. For example, these 3 input script lines:
variable X equal (xlo+xhi)/2+sqrt(v_area)
region 1 block $X 2 INF INF EDGE EDGE
variable X delete :pre
can be replaced by
region 1 block $((xlo+xhi)/2+sqrt(v_area)) 2 INF INF EDGE EDGE :pre
so that you do not have to define (or discard) a temporary variable X.
Note that neither the curly-bracket nor the immediate form of variables can
contain nested $ characters for other variables to substitute for.
Thus you cannot do this:
variable a equal 2
variable b2 equal 4
print "B2 = $\{b$a\}" :pre
Nor can you specify this $($x-1.0) for an immediate variable, but
you could use $(v_x-1.0), since the latter is valid syntax for an
"equal-style variable"_variable.html.
See the "variable"_variable.html command for more details of how
strings are assigned to variables and evaluated, and how they can be
used in input script commands.
(4) The line is broken into "words" separated by whitespace (tabs,
spaces). Note that words can thus contain letters, digits,
underscores, or punctuation characters.
(5) The first word is the command name. All successive words in the
line are arguments.
(6) If you want text with spaces to be treated as a single argument,
it can be enclosed in single, double, or triple quotes. A
long single argument enclosed in single or double quotes can span
multiple lines if the "&" character is used, as described above. When
the lines are concatenated together (and the "&" characters and line
breaks removed), the text will become a single line. If you want
multiple lines of an argument to retain their line breaks, the text
can be enclosed in triple quotes, in which case "&" characters are not
needed. For example:
print "Volume = $v"
print 'Volume = $v'
if "$\{steps\} > 1000" then quit
variable a string "red green blue &
purple orange cyan"
print """
System volume = $v
System temperature = $t
""" :pre
In each case, the single, double, or triple quotes are removed when
the single argument they enclose is stored internally.
See the "dump modify format"_dump_modify.html, "print"_print.html,
"if"_if.html, and "python"_python.html commands for examples.
A "#" or "$" character that is between quotes will not be treated as a
comment indicator in (2) or substituted for as a variable in (3).
NOTE: If the argument is itself a command that requires a quoted
argument (e.g. using a "print"_print.html command as part of an
"if"_if.html or "run every"_run.html command), then single, double, or
triple quotes can be nested in the usual manner. See the doc pages
for those commands for examples. Only one level of nesting is
allowed, but that should be sufficient for most use cases.
:line
3.3 Input script structure :h4,link(cmd_3)
This section describes the structure of a typical LAMMPS input script.
The "examples" directory in the LAMMPS distribution contains many
sample input scripts; the corresponding problems are discussed in
"Section 7"_Section_example.html, and animated on the "LAMMPS
WWW Site"_lws.
A LAMMPS input script typically has 4 parts:
Initialization
Atom definition
Settings
Run a simulation :ol
The last 2 parts can be repeated as many times as desired: run a
simulation, change some settings, run some more, etc. Each of the 4
parts is now described in more detail. Remember that almost all the
commands need only be used if a non-default value is desired.
(1) Initialization
Set parameters that need to be defined before atoms are created or
read in from a file.
The relevant commands are "units"_units.html,
"dimension"_dimension.html, "newton"_newton.html,
"processors"_processors.html, "boundary"_boundary.html,
"atom_style"_atom_style.html, "atom_modify"_atom_modify.html.
If force-field parameters appear in the files that will be read, these
commands tell LAMMPS what kinds of force fields are being used:
"pair_style"_pair_style.html, "bond_style"_bond_style.html,
"angle_style"_angle_style.html, "dihedral_style"_dihedral_style.html,
"improper_style"_improper_style.html.
(2) Atom definition
There are 3 ways to define atoms in LAMMPS. Read them in from a data
or restart file via the "read_data"_read_data.html or
"read_restart"_read_restart.html commands. These files can contain
molecular topology information. Or create atoms on a lattice (with no
molecular topology), using these commands: "lattice"_lattice.html,
"region"_region.html, "create_box"_create_box.html,
"create_atoms"_create_atoms.html. The entire set of atoms can be
duplicated to make a larger simulation using the
"replicate"_replicate.html command.
(3) Settings
Once atoms and molecular topology are defined, a variety of settings
can be specified: force field coefficients, simulation parameters,
output options, etc.
Force field coefficients are set by these commands (they can also be
set in the read-in files): "pair_coeff"_pair_coeff.html,
"bond_coeff"_bond_coeff.html, "angle_coeff"_angle_coeff.html,
"dihedral_coeff"_dihedral_coeff.html,
"improper_coeff"_improper_coeff.html,
"kspace_style"_kspace_style.html, "dielectric"_dielectric.html,
"special_bonds"_special_bonds.html.
Various simulation parameters are set by these commands:
"neighbor"_neighbor.html, "neigh_modify"_neigh_modify.html,
"group"_group.html, "timestep"_timestep.html,
"reset_timestep"_reset_timestep.html, "run_style"_run_style.html,
"min_style"_min_style.html, "min_modify"_min_modify.html.
Fixes impose a variety of boundary conditions, time integration, and
diagnostic options. The "fix"_fix.html command comes in many flavors.
Various computations can be specified for execution during a
simulation using the "compute"_compute.html,
"compute_modify"_compute_modify.html, and "variable"_variable.html
commands.
Output options are set by the "thermo"_thermo.html, "dump"_dump.html,
and "restart"_restart.html commands.
(4) Run a simulation
A molecular dynamics simulation is run using the "run"_run.html
command. Energy minimization (molecular statics) is performed using
the "minimize"_minimize.html command. A parallel tempering
(replica-exchange) simulation can be run using the
"temper"_temper.html command.
:line
3.4 Commands listed by category :link(cmd_4),h4
This section lists core LAMMPS commands, grouped by category.
The "next section"_#cmd_5 lists all commands alphabetically. The
next section also includes (long) lists of style options for entries
that appear in the following categories as a single command (fix,
compute, pair, etc). Commands that are added by user packages are not
included in the categories here, but they are in the next section.
Initialization:
"newton"_newton.html,
"package"_package.html,
"processors"_processors.html,
"suffix"_suffix.html,
"units"_units.html
Setup simulation box:
"boundary"_boundary.html,
"box"_box.html,
"change_box"_change_box.html,
"create_box"_create_box.html,
"dimension"_dimension.html,
"lattice"_lattice.html,
"region"_region.html
Setup atoms:
"atom_modify"_atom_modify.html,
"atom_style"_atom_style.html,
"balance"_balance.html,
"create_atoms"_create_atoms.html,
"create_bonds"_create_bonds.html,
"delete_atoms"_delete_atoms.html,
"delete_bonds"_delete_bonds.html,
"displace_atoms"_displace_atoms.html,
"group"_group.html,
"mass"_mass.html,
"molecule"_molecule.html,
"read_data"_read_data.html,
"read_dump"_read_dump.html,
"read_restart"_read_restart.html,
"replicate"_replicate.html,
"set"_set.html,
"velocity"_velocity.html
Force fields:
"angle_coeff"_angle_coeff.html,
"angle_style"_angle_style.html,
"bond_coeff"_bond_coeff.html,
"bond_style"_bond_style.html,
"bond_write"_bond_write.html,
"dielectric"_dielectric.html,
"dihedral_coeff"_dihedral_coeff.html,
"dihedral_style"_dihedral_style.html,
"improper_coeff"_improper_coeff.html,
"improper_style"_improper_style.html,
"kspace_modify"_kspace_modify.html,
"kspace_style"_kspace_style.html,
"pair_coeff"_pair_coeff.html,
"pair_modify"_pair_modify.html,
"pair_style"_pair_style.html,
"pair_write"_pair_write.html,
"special_bonds"_special_bonds.html
Settings:
"comm_modify"_comm_modify.html,
"comm_style"_comm_style.html,
"info"_info.html,
"min_modify"_min_modify.html,
"min_style"_min_style.html,
"neigh_modify"_neigh_modify.html,
"neighbor"_neighbor.html,
"partition"_partition.html,
"reset_timestep"_reset_timestep.html,
"run_style"_run_style.html,
"timer"_timer.html,
"timestep"_timestep.html
Operations within timestepping (fixes) and diagnostics (computes):
"compute"_compute.html,
"compute_modify"_compute_modify.html,
"fix"_fix.html,
"fix_modify"_fix_modify.html,
"uncompute"_uncompute.html,
"unfix"_unfix.html
Output:
"dump image"_dump_image.html,
"dump movie"_dump_image.html,
"dump"_dump.html,
"dump_modify"_dump_modify.html,
"restart"_restart.html,
"thermo"_thermo.html,
"thermo_modify"_thermo_modify.html,
"thermo_style"_thermo_style.html,
"undump"_undump.html,
"write_coeff"_write_coeff.html,
"write_data"_write_data.html,
"write_dump"_write_dump.html,
"write_restart"_write_restart.html
Actions:
"minimize"_minimize.html,
"neb"_neb.html,
"prd"_prd.html,
"rerun"_rerun.html,
"run"_run.html,
"tad"_tad.html,
"temper"_temper.html
Input script control:
"clear"_clear.html,
"echo"_echo.html,
"if"_if.html,
"include"_include.html,
"jump"_jump.html,
"label"_label.html,
"log"_log.html,
"next"_next.html,
"print"_print.html,
"python"_python.html,
"quit"_quit.html,
"shell"_shell.html,
"variable"_variable.html
:line
3.5 Individual commands :h4,link(cmd_5),link(comm)
This section lists all LAMMPS commands alphabetically, with a separate
listing below of styles within certain commands. The "previous
section"_#cmd_4 lists the same commands, grouped by category. Note
that some style options for some commands are part of specific LAMMPS
packages, which means they cannot be used unless the package was
included when LAMMPS was built. Not all packages are included in a
default LAMMPS build. These dependencies are listed as Restrictions
in the command's documentation.
"angle_coeff"_angle_coeff.html,
"angle_style"_angle_style.html,
"atom_modify"_atom_modify.html,
"atom_style"_atom_style.html,
"balance"_balance.html,
"bond_coeff"_bond_coeff.html,
"bond_style"_bond_style.html,
"bond_write"_bond_write.html,
"boundary"_boundary.html,
"box"_box.html,
"change_box"_change_box.html,
"clear"_clear.html,
"comm_modify"_comm_modify.html,
"comm_style"_comm_style.html,
"compute"_compute.html,
"compute_modify"_compute_modify.html,
"create_atoms"_create_atoms.html,
"create_bonds"_create_bonds.html,
"create_box"_create_box.html,
"delete_atoms"_delete_atoms.html,
"delete_bonds"_delete_bonds.html,
"dielectric"_dielectric.html,
"dihedral_coeff"_dihedral_coeff.html,
"dihedral_style"_dihedral_style.html,
"dimension"_dimension.html,
"displace_atoms"_displace_atoms.html,
"dump"_dump.html,
"dump image"_dump_image.html,
"dump_modify"_dump_modify.html,
"dump movie"_dump_image.html,
"echo"_echo.html,
"fix"_fix.html,
"fix_modify"_fix_modify.html,
"group"_group.html,
"if"_if.html,
"info"_info.html,
"improper_coeff"_improper_coeff.html,
"improper_style"_improper_style.html,
"include"_include.html,
"jump"_jump.html,
"kspace_modify"_kspace_modify.html,
"kspace_style"_kspace_style.html,
"label"_label.html,
"lattice"_lattice.html,
"log"_log.html,
"mass"_mass.html,
"minimize"_minimize.html,
"min_modify"_min_modify.html,
"min_style"_min_style.html,
"molecule"_molecule.html,
"neb"_neb.html,
"neigh_modify"_neigh_modify.html,
"neighbor"_neighbor.html,
"newton"_newton.html,
"next"_next.html,
"package"_package.html,
"pair_coeff"_pair_coeff.html,
"pair_modify"_pair_modify.html,
"pair_style"_pair_style.html,
"pair_write"_pair_write.html,
"partition"_partition.html,
"prd"_prd.html,
"print"_print.html,
"processors"_processors.html,
"python"_python.html,
"quit"_quit.html,
"read_data"_read_data.html,
"read_dump"_read_dump.html,
"read_restart"_read_restart.html,
"region"_region.html,
"replicate"_replicate.html,
"rerun"_rerun.html,
"reset_timestep"_reset_timestep.html,
"restart"_restart.html,
"run"_run.html,
"run_style"_run_style.html,
"set"_set.html,
"shell"_shell.html,
"special_bonds"_special_bonds.html,
"suffix"_suffix.html,
"tad"_tad.html,
"temper"_temper.html,
"thermo"_thermo.html,
"thermo_modify"_thermo_modify.html,
"thermo_style"_thermo_style.html,
"timer"_timer.html,
"timestep"_timestep.html,
"uncompute"_uncompute.html,
"undump"_undump.html,
"unfix"_unfix.html,
"units"_units.html,
"variable"_variable.html,
"velocity"_velocity.html,
"write_coeff"_write_coeff.html,
"write_data"_write_data.html,
"write_dump"_write_dump.html,
"write_restart"_write_restart.html :tb(c=6,ea=c)
These are additional commands in USER packages, which can be used if
"LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"dump custom/vtk"_dump_custom_vtk.html,
"dump nc"_dump_nc.html,
"dump nc/mpiio"_dump_nc.html,
"group2ndx"_group2ndx.html,
"ndx2group"_group2ndx.html,
"temper/grem"_temper_grem.html :tb(c=3,ea=c)
:line
Fix styles :h4
See the "fix"_fix.html command for one-line descriptions of each style
or click on the style itself for a full description. Some of the
styles have accelerated versions, which can be used if LAMMPS is built
with the "appropriate accelerated package"_Section_accelerate.html.
This is indicated by additional letters in parentheses: g = GPU, i =
USER-INTEL, k = KOKKOS, o = USER-OMP, t = OPT.
"adapt"_fix_adapt.html,
"addforce"_fix_addforce.html,
"append/atoms"_fix_append_atoms.html,
"atom/swap"_fix_atom_swap.html,
"aveforce"_fix_aveforce.html,
"ave/atom"_fix_ave_atom.html,
"ave/chunk"_fix_ave_chunk.html,
"ave/correlate"_fix_ave_correlate.html,
"ave/histo"_fix_ave_histo.html,
"ave/histo/weight"_fix_ave_histo.html,
"ave/time"_fix_ave_time.html,
"balance"_fix_balance.html,
"bond/break"_fix_bond_break.html,
"bond/create"_fix_bond_create.html,
"bond/swap"_fix_bond_swap.html,
"box/relax"_fix_box_relax.html,
"cmap"_fix_cmap.html,
"controller"_fix_controller.html,
"deform (k)"_fix_deform.html,
"deposit"_fix_deposit.html,
"drag"_fix_drag.html,
"dt/reset"_fix_dt_reset.html,
"efield"_fix_efield.html,
"ehex"_fix_ehex.html,
"enforce2d"_fix_enforce2d.html,
"evaporate"_fix_evaporate.html,
"external"_fix_external.html,
"freeze"_fix_freeze.html,
"gcmc"_fix_gcmc.html,
"gld"_fix_gld.html,
"gravity (o)"_fix_gravity.html,
"halt"_fix_halt.html,
"heat"_fix_heat.html,
"indent"_fix_indent.html,
"langevin (k)"_fix_langevin.html,
"lineforce"_fix_lineforce.html,
"momentum (k)"_fix_momentum.html,
"move"_fix_move.html,
"mscg"_fix_mscg.html,
"msst"_fix_msst.html,
"neb"_fix_neb.html,
"nph (ko)"_fix_nh.html,
"nphug (o)"_fix_nphug.html,
"nph/asphere (o)"_fix_nph_asphere.html,
"nph/body"_fix_nph_body.html,
"nph/sphere (o)"_fix_nph_sphere.html,
"npt (kio)"_fix_nh.html,
"npt/asphere (o)"_fix_npt_asphere.html,
"npt/body"_fix_npt_body.html,
"npt/sphere (o)"_fix_npt_sphere.html,
"nve (kio)"_fix_nve.html,
"nve/asphere (i)"_fix_nve_asphere.html,
"nve/asphere/noforce"_fix_nve_asphere_noforce.html,
"nve/body"_fix_nve_body.html,
"nve/limit"_fix_nve_limit.html,
"nve/line"_fix_nve_line.html,
"nve/noforce"_fix_nve_noforce.html,
"nve/sphere (o)"_fix_nve_sphere.html,
"nve/tri"_fix_nve_tri.html,
"nvt (iko)"_fix_nh.html,
"nvt/asphere (o)"_fix_nvt_asphere.html,
"nvt/body"_fix_nvt_body.html,
"nvt/sllod (io)"_fix_nvt_sllod.html,
"nvt/sphere (o)"_fix_nvt_sphere.html,
"oneway"_fix_oneway.html,
"orient/bcc"_fix_orient.html,
"orient/fcc"_fix_orient.html,
"planeforce"_fix_planeforce.html,
"poems"_fix_poems.html,
"pour"_fix_pour.html,
"press/berendsen"_fix_press_berendsen.html,
"print"_fix_print.html,
"property/atom"_fix_property_atom.html,
"qeq/comb (o)"_fix_qeq_comb.html,
"qeq/dynamic"_fix_qeq.html,
"qeq/fire"_fix_qeq.html,
"qeq/point"_fix_qeq.html,
"qeq/shielded"_fix_qeq.html,
"qeq/slater"_fix_qeq.html,
"rattle"_fix_shake.html,
"reax/bonds"_fix_reax_bonds.html,
"recenter"_fix_recenter.html,
"restrain"_fix_restrain.html,
"rigid (o)"_fix_rigid.html,
"rigid/nph (o)"_fix_rigid.html,
"rigid/npt (o)"_fix_rigid.html,
"rigid/nve (o)"_fix_rigid.html,
"rigid/nvt (o)"_fix_rigid.html,
"rigid/small (o)"_fix_rigid.html,
"rigid/small/nph (o)"_fix_rigid.html,
"rigid/small/npt (o)"_fix_rigid.html,
"rigid/small/nve (o)"_fix_rigid.html,
"rigid/small/nvt (o)"_fix_rigid.html,
"setforce (k)"_fix_setforce.html,
"shake"_fix_shake.html,
"spring"_fix_spring.html,
"spring/chunk"_fix_spring_chunk.html,
"spring/rg"_fix_spring_rg.html,
"spring/self"_fix_spring_self.html,
"srd"_fix_srd.html,
"store/force"_fix_store_force.html,
"store/state"_fix_store_state.html,
"temp/berendsen"_fix_temp_berendsen.html,
"temp/csld"_fix_temp_csvr.html,
"temp/csvr"_fix_temp_csvr.html,
"temp/rescale"_fix_temp_rescale.html,
"tfmc"_fix_tfmc.html,
"thermal/conductivity"_fix_thermal_conductivity.html,
"tmd"_fix_tmd.html,
"ttm"_fix_ttm.html,
"tune/kspace"_fix_tune_kspace.html,
"vector"_fix_vector.html,
"viscosity"_fix_viscosity.html,
"viscous"_fix_viscous.html,
"wall/colloid"_fix_wall.html,
"wall/gran"_fix_wall_gran.html,
"wall/gran/region"_fix_wall_gran_region.html,
"wall/harmonic"_fix_wall.html,
"wall/lj1043"_fix_wall.html,
"wall/lj126"_fix_wall.html,
"wall/lj93"_fix_wall.html,
"wall/piston"_fix_wall_piston.html,
"wall/reflect (k)"_fix_wall_reflect.html,
"wall/region"_fix_wall_region.html,
"wall/srd"_fix_wall_srd.html :tb(c=8,ea=c)
These are additional fix styles in USER packages, which can be used if
"LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"adapt/fep"_fix_adapt_fep.html,
"addtorque"_fix_addtorque.html,
"atc"_fix_atc.html,
"ave/correlate/long"_fix_ave_correlate_long.html,
"colvars"_fix_colvars.html,
"dpd/energy"_fix_dpd_energy.html,
"drude"_fix_drude.html,
"drude/transform/direct"_fix_drude_transform.html,
"drude/transform/reverse"_fix_drude_transform.html,
"eos/cv"_fix_eos_cv.html,
"eos/table"_fix_eos_table.html,
"eos/table/rx"_fix_eos_table_rx.html,
"filter/corotate"_fix_filter_corotate.html,
"flow/gauss"_fix_flow_gauss.html,
"gle"_fix_gle.html,
"grem"_fix_grem.html,
"imd"_fix_imd.html,
"ipi"_fix_ipi.html,
"langevin/drude"_fix_langevin_drude.html,
"langevin/eff"_fix_langevin_eff.html,
"lb/fluid"_fix_lb_fluid.html,
"lb/momentum"_fix_lb_momentum.html,
"lb/pc"_fix_lb_pc.html,
"lb/rigid/pc/sphere"_fix_lb_rigid_pc_sphere.html,
"lb/viscous"_fix_lb_viscous.html,
"meso"_fix_meso.html,
"manifoldforce"_fix_manifoldforce.html,
"meso/stationary"_fix_meso_stationary.html,
"nve/dot"_fix_nve_dot.html,
"nve/dotc/langevin"_fix_nve_dotc_langevin.html,
"nve/manifold/rattle"_fix_nve_manifold_rattle.html,
"nvk"_fix_nvk.html,
"nvt/manifold/rattle"_fix_nvt_manifold_rattle.html,
"nph/eff"_fix_nh_eff.html,
"npt/eff"_fix_nh_eff.html,
"nve/eff"_fix_nve_eff.html,
"nvt/eff"_fix_nh_eff.html,
"nvt/sllod/eff"_fix_nvt_sllod_eff.html,
"phonon"_fix_phonon.html,
"pimd"_fix_pimd.html,
"qbmsst"_fix_qbmsst.html,
"qeq/reax"_fix_qeq_reax.html,
"qmmm"_fix_qmmm.html,
"qtb"_fix_qtb.html,
"reax/c/bonds"_fix_reax_bonds.html,
"reax/c/species"_fix_reaxc_species.html,
"rx"_fix_rx.html,
"saed/vtk"_fix_saed_vtk.html,
"shardlow"_fix_shardlow.html,
"smd"_fix_smd.html,
"smd/adjust/dt"_fix_smd_adjust_dt.html,
"smd/integrate/tlsph"_fix_smd_integrate_tlsph.html,
"smd/integrate/ulsph"_fix_smd_integrate_ulsph.html,
"smd/move/triangulated/surface"_fix_smd_move_triangulated_surface.html,
"smd/setvel"_fix_smd_setvel.html,
"smd/wall/surface"_fix_smd_wall_surface.html,
"temp/rescale/eff"_fix_temp_rescale_eff.html,
"ti/spring"_fix_ti_spring.html,
"ttm/mod"_fix_ttm.html :tb(c=6,ea=c)
:line
Compute styles :h4
See the "compute"_compute.html command for one-line descriptions of
each style or click on the style itself for a full description. Some
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"angle"_compute_angle.html,
"angle/local"_compute_angle_local.html,
"angmom/chunk"_compute_angmom_chunk.html,
"body/local"_compute_body_local.html,
"bond"_compute_bond.html,
"bond/local"_compute_bond_local.html,
"centro/atom"_compute_centro_atom.html,
"chunk/atom"_compute_chunk_atom.html,
"cluster/atom"_compute_cluster_atom.html,
"cna/atom"_compute_cna_atom.html,
"com"_compute_com.html,
"com/chunk"_compute_com_chunk.html,
"contact/atom"_compute_contact_atom.html,
"coord/atom"_compute_coord_atom.html,
"damage/atom"_compute_damage_atom.html,
"dihedral"_compute_dihedral.html,
"dihedral/local"_compute_dihedral_local.html,
"dilatation/atom"_compute_dilatation_atom.html,
"dipole/chunk"_compute_dipole_chunk.html,
"displace/atom"_compute_displace_atom.html,
"erotate/asphere"_compute_erotate_asphere.html,
"erotate/rigid"_compute_erotate_rigid.html,
"erotate/sphere"_compute_erotate_sphere.html,
"erotate/sphere/atom"_compute_erotate_sphere_atom.html,
"event/displace"_compute_event_displace.html,
"global/atom"_compute_global_atom.html,
"group/group"_compute_group_group.html,
"gyration"_compute_gyration.html,
"gyration/chunk"_compute_gyration_chunk.html,
"heat/flux"_compute_heat_flux.html,
"hexorder/atom"_compute_hexorder_atom.html,
"improper"_compute_improper.html,
"improper/local"_compute_improper_local.html,
"inertia/chunk"_compute_inertia_chunk.html,
"ke"_compute_ke.html,
"ke/atom"_compute_ke_atom.html,
"ke/rigid"_compute_ke_rigid.html,
"msd"_compute_msd.html,
"msd/chunk"_compute_msd_chunk.html,
"msd/nongauss"_compute_msd_nongauss.html,
"omega/chunk"_compute_omega_chunk.html,
"orientorder/atom"_compute_orientorder_atom.html,
"pair"_compute_pair.html,
"pair/local"_compute_pair_local.html,
"pe"_compute_pe.html,
"pe/atom"_compute_pe_atom.html,
"plasticity/atom"_compute_plasticity_atom.html,
"pressure"_compute_pressure.html,
"property/atom"_compute_property_atom.html,
"property/local"_compute_property_local.html,
"property/chunk"_compute_property_chunk.html,
"rdf"_compute_rdf.html,
"reduce"_compute_reduce.html,
"reduce/region"_compute_reduce.html,
"rigid/local"_compute_rigid_local.html,
"slice"_compute_slice.html,
"sna/atom"_compute_sna_atom.html,
"snad/atom"_compute_sna_atom.html,
"snav/atom"_compute_sna_atom.html,
"stress/atom"_compute_stress_atom.html,
"temp (k)"_compute_temp.html,
"temp/asphere"_compute_temp_asphere.html,
"temp/body"_compute_temp_body.html,
"temp/chunk"_compute_temp_chunk.html,
"temp/com"_compute_temp_com.html,
"temp/deform"_compute_temp_deform.html,
"temp/partial"_compute_temp_partial.html,
"temp/profile"_compute_temp_profile.html,
"temp/ramp"_compute_temp_ramp.html,
"temp/region"_compute_temp_region.html,
"temp/sphere"_compute_temp_sphere.html,
"ti"_compute_ti.html,
"torque/chunk"_compute_torque_chunk.html,
"vacf"_compute_vacf.html,
"vcm/chunk"_compute_vcm_chunk.html,
"voronoi/atom"_compute_voronoi_atom.html :tb(c=6,ea=c)
These are additional compute styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"ackland/atom"_compute_ackland_atom.html,
"basal/atom"_compute_basal_atom.html,
"dpd"_compute_dpd.html,
"dpd/atom"_compute_dpd_atom.html,
"fep"_compute_fep.html,
"force/tally"_compute_tally.html,
"heat/flux/tally"_compute_tally.html,
"ke/eff"_compute_ke_eff.html,
"ke/atom/eff"_compute_ke_atom_eff.html,
"meso/e/atom"_compute_meso_e_atom.html,
"meso/rho/atom"_compute_meso_rho_atom.html,
"meso/t/atom"_compute_meso_t_atom.html,
"pe/tally"_compute_tally.html,
"pe/mol/tally"_compute_tally.html,
"saed"_compute_saed.html,
"smd/contact/radius"_compute_smd_contact_radius.html,
"smd/damage"_compute_smd_damage.html,
"smd/hourglass/error"_compute_smd_hourglass_error.html,
"smd/internal/energy"_compute_smd_internal_energy.html,
"smd/plastic/strain"_compute_smd_plastic_strain.html,
"smd/plastic/strain/rate"_compute_smd_plastic_strain_rate.html,
"smd/rho"_compute_smd_rho.html,
"smd/tlsph/defgrad"_compute_smd_tlsph_defgrad.html,
"smd/tlsph/dt"_compute_smd_tlsph_dt.html,
"smd/tlsph/num/neighs"_compute_smd_tlsph_num_neighs.html,
"smd/tlsph/shape"_compute_smd_tlsph_shape.html,
"smd/tlsph/strain"_compute_smd_tlsph_strain.html,
"smd/tlsph/strain/rate"_compute_smd_tlsph_strain_rate.html,
"smd/tlsph/stress"_compute_smd_tlsph_stress.html,
"smd/triangle/mesh/vertices"_compute_smd_triangle_mesh_vertices.html,
"smd/ulsph/num/neighs"_compute_smd_ulsph_num_neighs.html,
"smd/ulsph/strain"_compute_smd_ulsph_strain.html,
"smd/ulsph/strain/rate"_compute_smd_ulsph_strain_rate.html,
"smd/ulsph/stress"_compute_smd_ulsph_stress.html,
"smd/vol"_compute_smd_vol.html,
"stress/tally"_compute_tally.html,
"temp/drude"_compute_temp_drude.html,
"temp/eff"_compute_temp_eff.html,
"temp/deform/eff"_compute_temp_deform_eff.html,
"temp/region/eff"_compute_temp_region_eff.html,
"temp/rotate"_compute_temp_rotate.html,
"xrd"_compute_xrd.html :tb(c=6,ea=c)
:line
Pair_style potentials :h4
See the "pair_style"_pair_style.html command for an overview of pair
potentials. Click on the style itself for a full description. Many
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"none"_pair_none.html,
"zero"_pair_zero.html,
"hybrid"_pair_hybrid.html,
"hybrid/overlay"_pair_hybrid.html,
"adp (o)"_pair_adp.html,
"airebo (o)"_pair_airebo.html,
"airebo/morse (o)"_pair_airebo.html,
"beck (go)"_pair_beck.html,
"body"_pair_body.html,
"bop"_pair_bop.html,
"born (go)"_pair_born.html,
"born/coul/dsf"_pair_born.html,
"born/coul/dsf/cs"_pair_born.html,
"born/coul/long (go)"_pair_born.html,
"born/coul/long/cs"_pair_born.html,
"born/coul/msm (o)"_pair_born.html,
"born/coul/wolf (go)"_pair_born.html,
"brownian (o)"_pair_brownian.html,
"brownian/poly (o)"_pair_brownian.html,
"buck (gkio)"_pair_buck.html,
"buck/coul/cut (gkio)"_pair_buck.html,
"buck/coul/long (gkio)"_pair_buck.html,
"buck/coul/long/cs"_pair_buck.html,
"buck/coul/msm (o)"_pair_buck.html,
"buck/long/coul/long (o)"_pair_buck_long.html,
"colloid (go)"_pair_colloid.html,
"comb (o)"_pair_comb.html,
"comb3"_pair_comb.html,
"coul/cut (gko)"_pair_coul.html,
"coul/debye (gko)"_pair_coul.html,
"coul/dsf (gko)"_pair_coul.html,
"coul/long (gko)"_pair_coul.html,
"coul/long/cs"_pair_coul.html,
"coul/msm"_pair_coul.html,
"coul/streitz"_pair_coul.html,
"coul/wolf (ko)"_pair_coul.html,
"dpd (go)"_pair_dpd.html,
"dpd/tstat (go)"_pair_dpd.html,
"dsmc"_pair_dsmc.html,
"eam (gkiot)"_pair_eam.html,
"eam/alloy (gkot)"_pair_eam.html,
"eam/fs (gkot)"_pair_eam.html,
"eim (o)"_pair_eim.html,
"gauss (go)"_pair_gauss.html,
"gayberne (gio)"_pair_gayberne.html,
"gran/hertz/history (o)"_pair_gran.html,
"gran/hooke (o)"_pair_gran.html,
"gran/hooke/history (o)"_pair_gran.html,
"hbond/dreiding/lj (o)"_pair_hbond_dreiding.html,
"hbond/dreiding/morse (o)"_pair_hbond_dreiding.html,
"kim"_pair_kim.html,
"lcbop"_pair_lcbop.html,
"line/lj"_pair_line_lj.html,
"lj/charmm/coul/charmm (ko)"_pair_charmm.html,
"lj/charmm/coul/charmm/implicit (ko)"_pair_charmm.html,
"lj/charmm/coul/long (giko)"_pair_charmm.html,
"lj/charmm/coul/msm"_pair_charmm.html,
"lj/charmmfsw/coul/charmmfsh"_pair_charmm.html,
"lj/charmmfsw/coul/long"_pair_charmm.html,
"lj/class2 (gko)"_pair_class2.html,
"lj/class2/coul/cut (ko)"_pair_class2.html,
"lj/class2/coul/long (gko)"_pair_class2.html,
"lj/cubic (go)"_pair_lj_cubic.html,
"lj/cut (gikot)"_pair_lj.html,
"lj/cut/coul/cut (gko)"_pair_lj.html,
"lj/cut/coul/debye (gko)"_pair_lj.html,
"lj/cut/coul/dsf (gko)"_pair_lj.html,
"lj/cut/coul/long (gikot)"_pair_lj.html,
"lj/cut/coul/long/cs"_pair_lj.html,
"lj/cut/coul/msm (go)"_pair_lj.html,
"lj/cut/dipole/cut (go)"_pair_dipole.html,
"lj/cut/dipole/long"_pair_dipole.html,
"lj/cut/tip4p/cut (o)"_pair_lj.html,
"lj/cut/tip4p/long (ot)"_pair_lj.html,
"lj/expand (gko)"_pair_lj_expand.html,
"lj/gromacs (gko)"_pair_gromacs.html,
"lj/gromacs/coul/gromacs (ko)"_pair_gromacs.html,
"lj/long/coul/long (o)"_pair_lj_long.html,
"lj/long/dipole/long"_pair_dipole.html,
"lj/long/tip4p/long"_pair_lj_long.html,
"lj/smooth (o)"_pair_lj_smooth.html,
"lj/smooth/linear (o)"_pair_lj_smooth_linear.html,
"lj96/cut (go)"_pair_lj96.html,
"lubricate (o)"_pair_lubricate.html,
"lubricate/poly (o)"_pair_lubricate.html,
"lubricateU"_pair_lubricateU.html,
"lubricateU/poly"_pair_lubricateU.html,
"meam"_pair_meam.html,
"mie/cut (o)"_pair_mie.html,
"morse (gkot)"_pair_morse.html,
"nb3b/harmonic (o)"_pair_nb3b_harmonic.html,
"nm/cut (o)"_pair_nm.html,
"nm/cut/coul/cut (o)"_pair_nm.html,
"nm/cut/coul/long (o)"_pair_nm.html,
"peri/eps"_pair_peri.html,
"peri/lps (o)"_pair_peri.html,
"peri/pmb (o)"_pair_peri.html,
"peri/ves"_pair_peri.html,
"polymorphic"_pair_polymorphic.html,
"reax"_pair_reax.html,
"rebo (o)"_pair_airebo.html,
"resquared (go)"_pair_resquared.html,
"snap"_pair_snap.html,
"soft (go)"_pair_soft.html,
"sw (gkio)"_pair_sw.html,
"table (gko)"_pair_table.html,
"tersoff (gkio)"_pair_tersoff.html,
"tersoff/mod (gko)"_pair_tersoff_mod.html,
"tersoff/mod/c (o)"_pair_tersoff_mod.html,
"tersoff/zbl (gko)"_pair_tersoff_zbl.html,
"tip4p/cut (o)"_pair_coul.html,
"tip4p/long (o)"_pair_coul.html,
"tri/lj"_pair_tri_lj.html,
"vashishta (ko)"_pair_vashishta.html,
"vashishta/table (o)"_pair_vashishta.html,
"yukawa (go)"_pair_yukawa.html,
"yukawa/colloid (go)"_pair_yukawa_colloid.html,
"zbl (go)"_pair_zbl.html :tb(c=4,ea=c)
These are additional pair styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"agni (o)"_pair_agni.html,
"awpmd/cut"_pair_awpmd.html,
"buck/mdf"_pair_mdf.html,
"coul/cut/soft (o)"_pair_lj_soft.html,
"coul/diel (o)"_pair_coul_diel.html,
"coul/long/soft (o)"_pair_lj_soft.html,
"dpd/fdt"_pair_dpd_fdt.html,
"dpd/fdt/energy"_pair_dpd_fdt.html,
"eam/cd (o)"_pair_eam.html,
"edip (o)"_pair_edip.html,
"eff/cut"_pair_eff.html,
"exp6/rx"_pair_exp6_rx.html,
"gauss/cut"_pair_gauss.html,
"kolmogorov/crespi/z"_pair_kolmogorov_crespi_z.html,
"lennard/mdf"_pair_mdf.html,
"list"_pair_list.html,
"lj/charmm/coul/long/soft (o)"_pair_charmm.html,
"lj/cut/coul/cut/soft (o)"_pair_lj_soft.html,
"lj/cut/coul/long/soft (o)"_pair_lj_soft.html,
"lj/cut/dipole/sf (go)"_pair_dipole.html,
"lj/cut/soft (o)"_pair_lj_soft.html,
"lj/cut/thole/long (o)"_pair_thole.html,
"lj/cut/tip4p/long/soft (o)"_pair_lj_soft.html,
"lj/mdf"_pair_mdf.html,
"lj/sdk (gko)"_pair_sdk.html,
"lj/sdk/coul/long (go)"_pair_sdk.html,
"lj/sdk/coul/msm (o)"_pair_sdk.html,
"lj/sf (o)"_pair_lj_sf.html,
"meam/spline (o)"_pair_meam_spline.html,
"meam/sw/spline"_pair_meam_sw_spline.html,
"mgpt"_pair_mgpt.html,
"momb"_pair_momb.html,
"morse/smooth/linear"_pair_morse.html,
"morse/soft"_pair_morse.html,
"multi/lucy"_pair_multi_lucy.html,
"multi/lucy/rx"_pair_multi_lucy_rx.html,
"oxdna/coaxstk"_pair_oxdna.html,
"oxdna/excv"_pair_oxdna.html,
"oxdna/hbond"_pair_oxdna.html,
"oxdna/stk"_pair_oxdna.html,
"oxdna/xstk"_pair_oxdna.html,
"oxdna2/coaxstk"_pair_oxdna2.html,
"oxdna2/dh"_pair_oxdna2.html,
"oxdna2/excv"_pair_oxdna2.html,
"oxdna2/stk"_pair_oxdna2.html,
"quip"_pair_quip.html,
-"reax/c (k)"_pair_reax_c.html,
+"reax/c (k)"_pair_reaxc.html,
"smd/hertz"_pair_smd_hertz.html,
"smd/tlsph"_pair_smd_tlsph.html,
"smd/triangulated/surface"_pair_smd_triangulated_surface.html,
"smd/ulsph"_pair_smd_ulsph.html,
"smtbq"_pair_smtbq.html,
"sph/heatconduction"_pair_sph_heatconduction.html,
"sph/idealgas"_pair_sph_idealgas.html,
"sph/lj"_pair_sph_lj.html,
"sph/rhosum"_pair_sph_rhosum.html,
"sph/taitwater"_pair_sph_taitwater.html,
"sph/taitwater/morris"_pair_sph_taitwater_morris.html,
"srp"_pair_srp.html,
"table/rx"_pair_table_rx.html,
"tersoff/table (o)"_pair_tersoff.html,
"thole"_pair_thole.html,
"tip4p/long/soft (o)"_pair_lj_soft.html :tb(c=4,ea=c)
:line
Bond_style potentials :h4
See the "bond_style"_bond_style.html command for an overview of bond
potentials. Click on the style itself for a full description. Some
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"none"_bond_none.html,
"zero"_bond_zero.html,
"hybrid"_bond_hybrid.html,
"class2 (ko)"_bond_class2.html,
"fene (iko)"_bond_fene.html,
"fene/expand (o)"_bond_fene_expand.html,
"harmonic (ko)"_bond_harmonic.html,
"morse (o)"_bond_morse.html,
"nonlinear (o)"_bond_nonlinear.html,
"quartic (o)"_bond_quartic.html,
"table (o)"_bond_table.html :tb(c=4,ea=c)
These are additional bond styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"harmonic/shift (o)"_bond_harmonic_shift.html,
"harmonic/shift/cut (o)"_bond_harmonic_shift_cut.html,
"oxdna/fene"_bond_oxdna.html,
"oxdna2/fene"_bond_oxdna.html :tb(c=4,ea=c)
:line
Angle_style potentials :h4
See the "angle_style"_angle_style.html command for an overview of
angle potentials. Click on the style itself for a full description.
Some of the styles have accelerated versions, which can be used if
LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_angle_none.html,
"zero"_angle_zero.html,
"hybrid"_angle_hybrid.html,
"charmm (ko)"_angle_charmm.html,
"class2 (ko)"_angle_class2.html,
"cosine (o)"_angle_cosine.html,
"cosine/delta (o)"_angle_cosine_delta.html,
"cosine/periodic (o)"_angle_cosine_periodic.html,
"cosine/squared (o)"_angle_cosine_squared.html,
"harmonic (iko)"_angle_harmonic.html,
"table (o)"_angle_table.html :tb(c=4,ea=c)
These are additional angle styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cosine/shift (o)"_angle_cosine_shift.html,
"cosine/shift/exp (o)"_angle_cosine_shift_exp.html,
"dipole (o)"_angle_dipole.html,
"fourier (o)"_angle_fourier.html,
"fourier/simple (o)"_angle_fourier_simple.html,
"quartic (o)"_angle_quartic.html,
"sdk"_angle_sdk.html :tb(c=4,ea=c)
:line
Dihedral_style potentials :h4
See the "dihedral_style"_dihedral_style.html command for an overview
of dihedral potentials. Click on the style itself for a full
description. Some of the styles have accelerated versions, which can
be used if LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_dihedral_none.html,
"zero"_dihedral_zero.html,
"hybrid"_dihedral_hybrid.html,
"charmm (ko)"_dihedral_charmm.html,
"charmmfsw"_dihedral_charmm.html,
"class2 (ko)"_dihedral_class2.html,
"harmonic (io)"_dihedral_harmonic.html,
"helix (o)"_dihedral_helix.html,
"multi/harmonic (o)"_dihedral_multi_harmonic.html,
"opls (iko)"_dihedral_opls.html :tb(c=4,ea=c)
These are additional dihedral styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cosine/shift/exp (o)"_dihedral_cosine_shift_exp.html,
"fourier (o)"_dihedral_fourier.html,
"nharmonic (o)"_dihedral_nharmonic.html,
"quadratic (o)"_dihedral_quadratic.html,
"spherical (o)"_dihedral_spherical.html,
"table (o)"_dihedral_table.html :tb(c=4,ea=c)
:line
Improper_style potentials :h4
See the "improper_style"_improper_style.html command for an overview
of improper potentials. Click on the style itself for a full
description. Some of the styles have accelerated versions, which can
be used if LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_improper_none.html,
"zero"_improper_zero.html,
"hybrid"_improper_hybrid.html,
"class2 (ko)"_improper_class2.html,
"cvff (io)"_improper_cvff.html,
"harmonic (ko)"_improper_harmonic.html,
"umbrella (o)"_improper_umbrella.html :tb(c=4,ea=c)
These are additional improper styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cossq (o)"_improper_cossq.html,
"distance"_improper_distance.html,
"fourier (o)"_improper_fourier.html,
"ring (o)"_improper_ring.html :tb(c=4,ea=c)
:line
Kspace solvers :h4
See the "kspace_style"_kspace_style.html command for an overview of
Kspace solvers. Click on the style itself for a full description.
Some of the styles have accelerated versions, which can be used if
LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"ewald (o)"_kspace_style.html,
"ewald/disp"_kspace_style.html,
"msm (o)"_kspace_style.html,
"msm/cg (o)"_kspace_style.html,
"pppm (go)"_kspace_style.html,
"pppm/cg (o)"_kspace_style.html,
"pppm/disp"_kspace_style.html,
"pppm/disp/tip4p"_kspace_style.html,
"pppm/stagger"_kspace_style.html,
"pppm/tip4p (o)"_kspace_style.html :tb(c=4,ea=c)
diff --git a/doc/src/Section_errors.txt b/doc/src/Section_errors.txt
index 832c5718a..5e0574b39 100644
--- a/doc/src/Section_errors.txt
+++ b/doc/src/Section_errors.txt
@@ -1,11922 +1,11928 @@
"Previous Section"_Section_python.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_history.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
12. Errors :h3
This section describes the errors you can encounter when using LAMMPS,
either conceptually, or as printed out by the program.
12.1 "Common problems"_#err_1
12.2 "Reporting bugs"_#err_2
12.3 "Error & warning messages"_#err_3 :all(b)
:line
:line
12.1 Common problems :link(err_1),h4
If two LAMMPS runs do not produce the exact same answer on different
machines or different numbers of processors, this is typically not a
bug. In theory you should get identical answers on any number of
processors and on any machine. In practice, numerical round-off can
cause slight differences and eventual divergence of molecular dynamics
phase space trajectories within a few 100s or 1000s of timesteps.
However, the statistical properties of the two runs (e.g. average
energy or temperature) should still be the same.
If the "velocity"_velocity.html command is used to set initial atom
velocities, a particular atom can be assigned a different velocity
when the problem is run on a different number of processors or on
different machines. If this happens, the phase space trajectories of
the two simulations will rapidly diverge. See the discussion of the
{loop} option in the "velocity"_velocity.html command for details and
options that avoid this issue.
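For example, with the {loop geom} option each atom's velocity is
generated from its coordinates, so the assignment does not depend on
the processor count (a hypothetical line; temperature and seed are
placeholders):
velocity all create 300.0 4928459 loop geom :pre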
Similarly, the "create_atoms"_create_atoms.html command generates a
lattice of atoms. For the same physical system, the ordering and
numbering of atoms by atom ID may be different depending on the number
of processors.
Some commands use random number generators which may be set up to
produce different random number streams on each processor and hence
will produce different effects when run on different numbers of
processors. A commonly-used example is the "fix
langevin"_fix_langevin.html command for thermostatting.
A LAMMPS simulation typically has two stages, setup and run. Most
LAMMPS errors are detected at setup time; others, like a bond
stretching too far, may not occur until the middle of a run.
LAMMPS tries to flag errors and print informative error messages so
you can fix the problem. For most errors it will also print the last
input script command that it was processing. Of course, LAMMPS cannot
figure out your physics or numerical mistakes, like choosing too big a
timestep, specifying erroneous force field coefficients, or putting 2
atoms on top of each other! If you run into errors that LAMMPS
doesn't catch that you think it should flag, please send an email to
the "developers"_http://lammps.sandia.gov/authors.html.
If you get an error message about an invalid command in your input
script, you can determine what command is causing the problem by
looking in the log.lammps file or using the "echo command"_echo.html
to see it on the screen. If you get an error like "Invalid ...
style", with ... being fix, compute, pair, etc, it means that you
mistyped the style name or that the command is part of an optional
package which was not compiled into your executable. The styles
available in your executable can be listed by using "the -h
command-line argument"_Section_start.html#start_7. The installation
and compilation of optional packages is explained in the "installation
instructions"_Section_start.html#start_3.
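As a small illustration, the "echo"_echo.html command near the top of
an input script makes LAMMPS print each command to the screen and log
file as it is read, so the offending line is easy to spot (choosing
{both} is just one of its options):

echo both :pre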
For a given command, LAMMPS expects certain arguments in a specified
order. If you mess this up, LAMMPS will often flag the error, but it
may also simply read a bogus argument and assign a value that is
valid, but not what you wanted. E.g. trying to read the string "abc"
as an integer value of 0. Careful reading of the associated doc page
for the command should allow you to fix these problems. In most cases
where LAMMPS expects to read a number, either integer or floating point,
it performs a stringent test on whether the provided input actually
is an integer or floating-point number, respectively, and rejects the
input with an error message (for instance, when an integer is required,
but a floating-point number such as 1.0 is provided):
ERROR: Expected integer parameter in input script or data file :pre
Some commands allow for using variable references in place of numeric
constants so that the value can be evaluated and may change over the
course of a run. This is typically done with the syntax {v_name} for a
parameter, where name is the name of the variable. On the other hand,
immediate variable expansion with the syntax ${name} is performed while
the input is read, before the command is parsed, so the substituted
value cannot change during a run.
NOTE: Using a variable reference (i.e. {v_name}) is only allowed if
the documentation of the corresponding command explicitly says it is.
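For illustration only (the variable name, temperature, damping, and
seed values below are arbitrary), the two mechanisms can be sketched as
follows; the ${T} reference is substituted with 300.0 the moment the
line is read, while the v_T reference is re-evaluated during the run,
which "fix langevin"_fix_langevin.html explicitly allows for its target
temperature:

variable T equal 300.0
velocity all create ${T} 4928459          # ${T} replaced by 300.0 when this line is read
fix 1 all langevin v_T v_T 10.0 699483    # v_T re-evaluated as the run proceeds :pre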
Generally, LAMMPS will print a message to the screen and logfile and
exit gracefully when it encounters a fatal error. Sometimes it will
print a WARNING to the screen and logfile and continue on; you can
decide if the WARNING is important or not. A WARNING message that is
generated in the middle of a run is only printed to the screen, not to
the logfile, to avoid cluttering up thermodynamic output. If LAMMPS
crashes or hangs without printing an error message first, then it
could be a bug (see "this section"_#err_2) or one of the following
cases:
LAMMPS runs within the memory a processor allows to be
allocated. Most reasonable MD runs are compute limited, not memory
limited, so this shouldn't be a bottleneck on most platforms. Almost
all large memory allocations in the code are done via C-style mallocs,
which will generate an error message if you run out of memory.
Smaller chunks of memory are allocated via C++ "new" statements. If
you are unlucky you could run out of memory just when one of these
small requests is made, in which case the code will crash or hang (in
parallel), since LAMMPS doesn't trap on those errors.
Illegal arithmetic can cause LAMMPS to run slowly or crash. This is
typically due to invalid physics and numerics that your simulation is
computing. If you see wild thermodynamic values or NaN values in your
LAMMPS output, something is wrong with your simulation. If you
suspect this is happening, it is a good idea to print out
thermodynamic info frequently (e.g. every timestep) via the
"thermo"_thermo.html command so you can monitor what is happening.
Visualizing the atom movement is also a good idea to ensure your model
is behaving as you expect.
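A minimal sketch of such monitoring (the output interval and dump file
name are arbitrary choices) is:

thermo 1
dump 1 all atom 100 dump.lammpstrj :pre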
In parallel, one way LAMMPS can hang is due to how different MPI
implementations handle buffering of messages. If the code hangs
without an error message, it may be that you need to specify an MPI
setting or two (usually via an environment variable) to enable
buffering or boost the sizes of messages that can be buffered.
:line
12.2 Reporting bugs :link(err_2),h4
If you are confident that you have found a bug in LAMMPS, follow these
steps.
Check the "New features and bug
fixes"_http://lammps.sandia.gov/bug.html section of the "LAMMPS WWW
site"_lws to see if the bug has already been reported or fixed or the
"Unfixed bug"_http://lammps.sandia.gov/unbug.html to see if a fix is
pending.
Check the "mailing list"_http://lammps.sandia.gov/mail.html
to see if it has been discussed before.
If not, send an email to the mailing list describing the problem with
any ideas you have as to what is causing it or where in the code the
problem might be. The developers will ask for more info if needed,
such as an input script or data files.
The most useful thing you can do to help us fix the bug is to isolate
the problem. Run it on the smallest number of atoms and fewest number
of processors and with the simplest input script that reproduces the
bug and try to identify what command or combination of commands is
causing the problem.
As a last resort, you can send an email directly to the
"developers"_http://lammps.sandia.gov/authors.html.
:line
12.3 Error & warning messages :h4,link(err_3)
These are two alphabetic lists of the "ERROR"_#error and
"WARNING"_#warn messages LAMMPS prints out and the reason why. If the
explanation here is not sufficient, the documentation for the
offending command may help.
Error and warning messages also list the source file and line number
where the error was generated. For example, this message
ERROR: Illegal velocity command (velocity.cpp:78) :pre
means that line #78 in the file src/velocity.cpp generated the error.
Looking in the source code may help you figure out what went wrong.
Note that error messages from "user-contributed
packages"_Section_start.html#start_3 are not listed here. If such an
error occurs and is not self-explanatory, you'll need to look in the
source code or contact the author of the package.
Errors: :h4,link(error)
:dlb
{1-3 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-3
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{1-4 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-4
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{Accelerator sharing is not currently supported on system} :dt
Multiple MPI processes cannot share the accelerator on your
system. For NVIDIA GPUs, see the nvidia-smi command to change this
setting. :dd
{All angle coeffs are not set} :dt
All angle coefficients must be set in the data file or by the
angle_coeff command before running a simulation. :dd
{All atom IDs = 0 but atom_modify id = yes} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have same charge.} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have the same charge.} :dt
Self-explanatory. :dd
{All bond coeffs are not set} :dt
All bond coefficients must be set in the data file or by the
bond_coeff command before running a simulation. :dd
{All dihedral coeffs are not set} :dt
All dihedral coefficients must be set in the data file or by the
dihedral_coeff command before running a simulation. :dd
{All improper coeffs are not set} :dt
All improper coefficients must be set in the data file or by the
improper_coeff command before running a simulation. :dd
{All masses are not set} :dt
For atom styles that define masses for each atom type, all masses must
be set in the data file or by the mass command before running a
simulation. They must also be set before using the velocity
command. :dd
{All mol IDs should be set for fix gcmc group atoms} :dt
The molecule flag is on, yet not all molecule ids in the fix group
have been set to non-zero positive values by the user. This is an
error since all atoms in the fix gcmc group are eligible for deletion,
rotation, and translation and therefore must have valid molecule ids. :dd
{All pair coeffs are not set} :dt
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation. :dd
{All read_dump x,y,z fields must be specified for scaled, triclinic coords} :dt
For triclinic boxes and scaled coordinates you must specify all 3 of
the x,y,z fields, else LAMMPS cannot reconstruct the unscaled
coordinates. :dd
{All universe/uloop variables must have same # of values} :dt
Self-explanatory. :dd
{All variables in next command must be same style} :dt
Self-explanatory. :dd
{Angle atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
angle on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid angle. :dd
{Angle atom missing in set command} :dt
The set command cannot find one or more atoms in a particular angle on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid angle. :dd
{Angle atoms %d %d %d missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle atoms missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle coeff for hybrid has invalid style} :dt
Angle style hybrid uses another angle style as one of its
coefficients. The angle style used in the angle_coeff command or read
from a restart file is not recognized. :dd
{Angle coeffs are not set} :dt
No angle coefficients have been assigned in the data file or via the
angle_coeff command. :dd
{Angle extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the angle atoms are so far apart that it is ambiguous
how the angle should be defined. :dd
{Angle potential must be defined for SHAKE} :dt
When shaking angles, an angle_style potential must be used. :dd
{Angle style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot use same angle style twice} :dt
Self-explanatory. :dd
{Angle table must range from 0 to 180 degrees} :dt
Self-explanatory. :dd
{Angle table parameters did not set N} :dt
List of angle table parameters must include N setting. :dd
{Angle_coeff command before angle_style is defined} :dt
Coefficients cannot be set in the data file or via the angle_coeff
command until an angle_style has been assigned. :dd
{Angle_coeff command before simulation box is defined} :dt
The angle_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Angle_coeff command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angle_style command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angles assigned incorrectly} :dt
Angles read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Angles defined but no angle types} :dt
The data file header lists angles but no angle types. :dd
{Append boundary must be shrink/minimum} :dt
The boundary style of the face where atoms are added
must be of type m (shrink/minimum). :dd
{Arccos of invalid value in variable formula} :dt
Argument of arccos() must be between -1 and 1. :dd
{Arcsin of invalid value in variable formula} :dt
Argument of arcsin() must be between -1 and 1. :dd
{Assigning body parameters to non-body atom} :dt
Self-explanatory. :dd
{Assigning ellipsoid parameters to non-ellipsoid atom} :dt
Self-explanatory. :dd
{Assigning line parameters to non-line atom} :dt
Self-explanatory. :dd
{Assigning quat to non-body atom} :dt
Self-explanatory. :dd
{Assigning tri parameters to non-tri atom} :dt
Self-explanatory. :dd
{At least one atom of each swapped type must be present to define charges.} :dt
Self-explanatory. :dd
{Atom IDs must be consecutive for velocity create loop all} :dt
Self-explanatory. :dd
{Atom IDs must be used for molecular systems} :dt
Atom IDs are used to identify and find partner atoms in bonds. :dd
{Atom count changed in fix neb} :dt
This is not allowed in a NEB calculation. :dd
{Atom count is inconsistent, cannot write data file} :dt
The sum of atoms across processors does not equal the global number
of atoms. Probably some atoms have been lost. :dd
{Atom count is inconsistent, cannot write restart file} :dt
Sum of atoms across processors does not equal initial total count.
This is probably because you have lost some atoms. :dd
{Atom in too many rigid bodies - boost MAXBODY} :dt
Fix poems has a parameter MAXBODY (in fix_poems.cpp) which determines
the maximum number of rigid bodies a single atom can belong to (i.e. a
multibody joint). The bodies you have defined exceed this limit. :dd
{Atom sort did not operate correctly} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Atom sorting has bin size = 0.0} :dt
The neighbor cutoff is being used as the bin size, but it is zero.
Thus you must explicitly list a bin size in the atom_modify sort
command or turn off sorting. :dd
{Atom style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Atom style hybrid cannot use same atom style twice} :dt
Self-explanatory. :dd
{Atom style template molecule must have atom types} :dt
The defined molecule(s) does not specify atom types. :dd
{Atom style was redefined after using fix property/atom} :dt
This is not allowed. :dd
{Atom type must be zero in fix gcmc mol command} :dt
Self-explanatory. :dd
{Atom vector in equal-style variable formula} :dt
Atom vectors generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom-style variable in equal-style variable formula} :dt
Atom-style variables generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom_modify id command after simulation box is defined} :dt
The atom_modify id command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify map command after simulation box is defined} :dt
The atom_modify map command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify sort and first options cannot be used together} :dt
Self-explanatory. :dd
{Atom_style command after simulation box is defined} :dt
The atom_style command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_style line can only be used in 2d simulations} :dt
Self-explanatory. :dd
{Atom_style tri can only be used in 3d simulations} :dt
Self-explanatory. :dd
{Atomfile variable could not read values} :dt
Check the file assigned to the variable. :dd
{Atomfile variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Atomfile-style variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Attempt to pop empty stack in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempt to push beyond stack limit in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempting to rescale a 0.0 temperature} :dt
Cannot rescale a temperature that is already 0.0. :dd
{Bad FENE bond} :dt
Two atoms in a FENE bond have become so far apart that the bond cannot
be computed. :dd
{Bad TIP4P angle type for PPPM/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P angle type for PPPMDisp/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P bond type for PPPM/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad TIP4P bond type for PPPMDisp/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad fix ID in fix append/atoms command} :dt
The value of the fix_id for keyword spatial must start with 'f_'. :dd
{Bad grid of processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Bad kspace_modify kmax/ewald parameter} :dt
Kspace_modify values for the kmax/ewald keyword must be integers > 0 :dd
{Bad kspace_modify slab parameter} :dt
Kspace_modify value for the slab/volume keyword must be >= 2.0. :dd
{Bad matrix inversion in mldivide3} :dt
This error should not occur unless the matrix is badly formed. :dd
{Bad principal moments} :dt
Fix rigid did not compute the principal moments of inertia of a rigid
group of atoms correctly. :dd
{Bad quadratic solve for particle/line collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad quadratic solve for particle/tri collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad real space Coulomb cutoff in fix tune/kspace} :dt
Fix tune/kspace tried to find the optimal real space Coulomb cutoff using
the Newton-Raphson method, but found a non-positive or NaN cutoff. :dd
{Balance command before simulation box is defined} :dt
The balance command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Balance produced bad splits} :dt
This should not occur. It means two or more cutting plane locations
are on top of each other or out of order. Report the problem to the
developers. :dd
{Balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
{Balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Bias compute does not calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Bias compute does not calculate temperature} :dt
The specified compute must compute temperature. :dd
{Bias compute group does not match compute group} :dt
The specified compute must operate on the same group as the parent
compute. :dd
{Big particle in fix srd cannot be point particle} :dt
Big particles must be extended spheroids or ellipsoids. :dd
{Bigint setting in lmptype.h is invalid} :dt
Size of bigint is less than size of tagint. :dd
{Bigint setting in lmptype.h is not compatible} :dt
Format of bigint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Bitmapped lookup tables require int/float be same size} :dt
Cannot use pair tables on this machine, because of word sizes. Use
the pair_modify command with table 0 instead. :dd
{Bitmapped table in file does not match requested table} :dt
Setting for bitmapped table in pair_coeff command must match table
in file exactly. :dd
{Bitmapped table is incorrect length in table file} :dt
Number of table entries is not a correct power of 2. :dd
{Bond and angle potentials must be defined for TIP4P} :dt
Cannot use TIP4P pair potential unless bond and angle potentials
are defined. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
bond on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid bond. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atom missing in set command} :dt
The set command cannot find one or more atoms in a particular bond on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid bond. :dd
{Bond atoms %d %d missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atoms missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond coeff for hybrid has invalid style} :dt
Bond style hybrid uses another bond style as one of its coefficients.
The bond style used in the bond_coeff command or read from a restart
file is not recognized. :dd
{Bond coeffs are not set} :dt
No bond coefficients have been assigned in the data file or via the
bond_coeff command. :dd
{Bond extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the bond atoms are so far apart that it is ambiguous
how the bond should be defined. :dd
{Bond potential must be defined for SHAKE} :dt
Cannot use fix shake unless bond potential is defined. :dd
{Bond style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot use same bond style twice} :dt
Self-explanatory. :dd
{Bond style quartic cannot be used with 3,4-body interactions} :dt
No angle, dihedral, or improper styles can be defined when using
bond style quartic. :dd
{Bond style quartic cannot be used with atom style template} :dt
This bond style can change the bond topology which is not
allowed with this atom style. :dd
{Bond style quartic requires special_bonds = 1,1,1} :dt
This is a restriction of the current bond quartic implementation. :dd
{Bond table parameters did not set N} :dt
List of bond table parameters must include N setting. :dd
{Bond table values are not increasing} :dt
The values in the tabulated file must be monotonically increasing. :dd
{BondAngle coeff for hybrid angle has invalid format} :dt
No "ba" field should appear in data file entry. :dd
{BondBond coeff for hybrid angle has invalid format} :dt
No "bb" field should appear in data file entry. :dd
{Bond_coeff command before bond_style is defined} :dt
Coefficients cannot be set in the data file or via the bond_coeff
command until a bond_style has been assigned. :dd
{Bond_coeff command before simulation box is defined} :dt
The bond_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Bond_coeff command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bond_style command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bonds assigned incorrectly} :dt
Bonds read in from the data file were not assigned correctly to atoms.
This means there is something invalid about the topology definitions. :dd
{Bonds defined but no bond types} :dt
The data file header lists bonds but no bond types. :dd
{Both restart files must use % or neither} :dt
Self-explanatory. :dd
{Both restart files must use MPI-IO or neither} :dt
Self-explanatory. :dd
{Both sides of boundary must be periodic} :dt
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides. :dd
{Boundary command after simulation box is defined} :dt
The boundary command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Box bounds are invalid} :dt
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions. :dd
{Box command after simulation box is defined} :dt
The box command cannot be used after a read_data, read_restart, or
create_box command. :dd
{CPU neighbor lists must be used for ellipsoid/sphere mix.} :dt
When using Gay-Berne or RE-squared pair styles with both ellipsoidal and
spherical particles, the neighbor list must be built on the CPU :dd
{Can not specify Pxy/Pxz/Pyz in fix box/relax with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can not specify Pxy/Pxz/Pyz in fix nvt/npt/nph with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can only use -plog with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use -pscreen with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use Kokkos supported regions with Kokkos package} :dt
Self-explanatory. :dd
{Can only use NEB with 1-processor replicas} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Can only use TAD with 1-processor replicas for NEB} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Cannot (yet) do analytic differentiation with pppm/gpu} :dt
This is a current restriction of this command. :dd
{Cannot (yet) request ghost atoms with Kokkos half neighbor list} :dt
This feature is not yet supported. :dd
{Cannot (yet) use 'electron' units with dipoles} :dt
This feature is not yet supported. :dd
{Cannot (yet) use Ewald with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use K-space slab correction with compute group/group for triclinic systems} :dt
This option is not yet supported. :dd
{Cannot (yet) use MSM with 2d simulation} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and TIP4P} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and kspace_modify diff ad} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace slab correction with long-range dipoles and non-neutral systems or per-atom energy} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace_modify diff ad with compute group/group} :dt
This option is not yet supported. :dd
{Cannot (yet) use kspace_style pppm/stagger with triclinic systems} :dt
This feature is not yet supported. :dd
{Cannot (yet) use molecular templates with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use respa with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix deform and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix nh and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use single precision with MSM (remove -DFFT_SINGLE from Makefile and recompile)} :dt
Single precision cannot be used with MSM. :dd
{Cannot add atoms to fix move variable} :dt
Atoms cannot be added afterwards when using this fix option. :dd
{Cannot append atoms to a triclinic box} :dt
The simulation box must be defined with edges aligned with the
Cartesian axes. :dd
{Cannot balance in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change box ortho/triclinic with certain fixes defined} :dt
This is because those fixes store the shape of the box. You need to
use unfix to discard the fix, change the box, then redefine a new
fix. :dd
{Cannot change box ortho/triclinic with dumps defined} :dt
This is because some dumps store the shape of the box. You need to
use undump to discard the dump, change the box, then redefine a new
dump. :dd
{Cannot change box tilt factors for orthogonal box} :dt
Cannot use tilt factors unless the simulation box is non-orthogonal. :dd
{Cannot change box to orthogonal when tilt is non-zero} :dt
Self-explanatory. :dd
{Cannot change box z boundary to nonperiodic for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot change dump_modify every for dump dcd} :dt
The frequency of writing dump dcd snapshots cannot be changed. :dd
{Cannot change dump_modify every for dump xtc} :dt
The frequency of writing dump xtc snapshots cannot be changed. :dd
{Cannot change timestep once fix srd is setup} :dt
This is because various SRD properties depend on the timestep
size. :dd
{Cannot change timestep with fix pour} :dt
This is because fix pour pre-computes the time delay for particles to
fall out of the insertion volume due to gravity. :dd
{Cannot change to comm_style brick from tiled layout} :dt
Self-explanatory. :dd
{Cannot change_box after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot change_box in xz or yz for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change_box in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot clear group all} :dt
This operation is not allowed. :dd
{Cannot close restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot compute initial g_ewald_disp} :dt
LAMMPS failed to compute an initial guess for the PPPM_disp g_ewald_6
factor that partitions the computation between real space and k-space
for Dispersion interactions. :dd
{Cannot create an atom map unless atoms have IDs} :dt
The simulation requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot create atoms with undefined lattice} :dt
Must use the lattice command before using the create_atoms
command. :dd
{Cannot create/grow a vector/array of pointers for %s} :dt
LAMMPS code is making an illegal call to the templated memory
allocators, to create a vector or array of pointers. :dd
{Cannot create_atoms after reading restart file with per-atom info} :dt
The per-atom info was stored to be used by a fix that you may
re-define. If you add atoms before re-defining the fix, then there
will not be a correct amount of per-atom info. :dd
{Cannot create_box after simulation box is defined} :dt
A simulation box can only be defined once. :dd
{Cannot currently use pair reax with pair hybrid} :dt
This is not yet supported. :dd
{Cannot currently use pppm/gpu with fix balance.} :dt
Self-explanatory. :dd
{Cannot delete group all} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a compute} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a dump} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a fix} :dt
Self-explanatory. :dd
{Cannot delete group currently used by atom_modify first} :dt
Self-explanatory. :dd
{Cannot delete_atoms bond yes for non-molecular systems} :dt
Self-explanatory. :dd
{Cannot displace_atoms after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot do GCMC on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot do atom/swap on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot dump sort on atom IDs with no atom IDs defined} :dt
Self-explanatory. :dd
{Cannot dump sort when multiple dump files are written} :dt
In this mode, each processor dumps its atoms to a file, so
no sorting is allowed. :dd
{Cannot embed Python when also extending Python with LAMMPS} :dt
When running LAMMPS via Python through the LAMMPS library interface
you cannot also use the input script python command. :dd
{Cannot evaporate atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in
a list to enable the atom_modify first command. :dd
{Cannot find create_bonds group ID} :dt
Self-explanatory. :dd
{Cannot find delete_bonds group ID} :dt
Group ID used in the delete_bonds command does not exist. :dd
{Cannot find specified group ID for core particles} :dt
Self-explanatory. :dd
{Cannot find specified group ID for shell particles} :dt
Self-explanatory. :dd
{Cannot have both pair_modify shift and tail set to yes} :dt
These 2 options are contradictory. :dd
{Cannot intersect groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot mix molecular and molecule template atom styles} :dt
Self-explanatory. :dd
{Cannot open -reorder file} :dt
Self-explanatory. :dd
{Cannot open ADP potential file %s} :dt
The specified ADP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open AIREBO potential file %s} :dt
The specified AIREBO potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open BOP potential file %s} :dt
The specified BOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB potential file %s} :dt
The specified COMB potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB3 lib.comb3 file} :dt
The COMB3 library file cannot be opened. Check that the path and name
are correct. :dd
{Cannot open COMB3 potential file %s} :dt
The specified COMB3 potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EAM potential file %s} :dt
The specified EAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EIM potential file %s} :dt
The specified EIM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open LCBOP potential file %s} :dt
The specified LCBOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open MEAM potential file %s} :dt
The specified MEAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP coefficient file %s} :dt
The specified SNAP coefficient file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP parameter file %s} :dt
The specified SNAP parameter file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open Stillinger-Weber potential file %s} :dt
The specified SW potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Tersoff potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Vashishta potential file %s} :dt
The specified Vashishta potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open balance output file} :dt
Self-explanatory. :dd
{Cannot open coul/streitz potential file %s} :dt
The specified coul/streitz potential file cannot be opened. Check
that the path and name are correct. :dd
{Cannot open custom file} :dt
Self-explanatory. :dd
{Cannot open data file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open dir to search for restart file} :dt
Using a "*" in the name of the restart file will open the current
directory to search for matching file names. :dd
{Cannot open dump file} :dt
Self-explanatory. :dd
{Cannot open dump file %s} :dt
The output file for the dump command cannot be opened. Check that the
path and name are correct. :dd
{Cannot open file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. If the file is a compressed file, also check that the gzip
executable can be found and run. :dd
{Cannot open file variable file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/chunk file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/correlate file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/histo file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/spatial file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/time file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix balance output file} :dt
Self-explanatory. :dd
{Cannot open fix poems file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix print file %s} :dt
The output file generated by the fix print command cannot be opened :dd
{Cannot open fix qeq parameter file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix qeq/comb file %s} :dt
The output file for the fix qeq/comb command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix reax/bonds file %s} :dt
The output file for the fix reax/bonds command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix rigid infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid restart file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid/small infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix tmd file %s} :dt
The output file for the fix tmd command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open fix ttm file %s} :dt
The output file for the fix ttm command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open gzipped file} :dt
LAMMPS was compiled without -DLAMMPS_GZIP, which enables support for
reading and writing gzipped files through a pipeline to the gzip program. :dd
{Cannot open input script %s} :dt
Self-explanatory. :dd
{Cannot open log.cite file} :dt
This file is created when you use some LAMMPS features, to indicate
what paper you should cite on behalf of those who implemented
the feature. Check that you have write privileges into the directory
you are running in. :dd
{Cannot open log.lammps for writing} :dt
The default LAMMPS log file cannot be opened. Check that the
directory you are running in allows for files to be created. :dd
{Cannot open logfile} :dt
The LAMMPS log file named in a command-line argument cannot be opened.
Check that the path and name are correct. :dd
{Cannot open logfile %s} :dt
The LAMMPS log file specified in the input script cannot be opened.
Check that the path and name are correct. :dd
{Cannot open molecule file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open nb3b/harmonic potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open pair_write file} :dt
The specified output file for pair energies and forces cannot be
opened. Check that the path and name are correct. :dd
{Cannot open polymorphic potential file %s} :dt
The specified polymorphic potential file cannot be opened. Check that
the path and name are correct. :dd
{Cannot open print file %s} :dt
Self-explanatory. :dd
{Cannot open processors output file} :dt
Self-explanatory. :dd
{Cannot open restart file %s} :dt
Self-explanatory. :dd
{Cannot open restart file for reading - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open restart file for writing - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open screen file} :dt
The screen file specified as a command-line argument cannot be
opened. Check that the directory you are running in allows for files
to be created. :dd
{Cannot open temporary file for world counter.} :dt
Self-explanatory. :dd
{Cannot open universe log file} :dt
For a multi-partition run, the master log file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot open universe screen file} :dt
For a multi-partition run, the master screen file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot read from restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot read_data without add keyword after simulation box is defined} :dt
Self-explanatory. :dd
{Cannot read_restart after simulation box is defined} :dt
The read_restart command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Cannot redefine variable as a different style} :dt
An equal-style variable can be re-defined but only if it was
originally an equal-style variable. :dd
{Cannot replicate 2d simulation in z dimension} :dt
The replicate command cannot replicate a 2d simulation in the z
dimension. :dd
{Cannot replicate with fixes that store atom quantities} :dt
Either fixes are defined that create and store atom-based vectors or a
restart file was read which included atom-based vectors for fixes.
The replicate command cannot duplicate that information for new atoms.
You should use the replicate command before fixes are applied to the
system. :dd
{Cannot reset timestep with a dynamic region defined} :dt
Dynamic regions (see the region command) have a time dependence.
Thus you cannot change the timestep when one or more of these
are defined. :dd
{Cannot reset timestep with a time-dependent fix defined} :dt
You cannot reset the timestep when a fix that keeps track of elapsed
time is in place. :dd
{Cannot run 2d simulation with nonperiodic Z dimension} :dt
Use the boundary command to make the z dimension periodic in order to
run a 2d simulation. :dd
{Cannot set bond topology types for atom style template} :dt
The bond, angle, etc types cannot be changed for this atom style since
they are static settings in the molecule template files. :dd
{Cannot set both respa pair and inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), or in pieces (inner/middle/outer). You can't do
both. :dd
{Cannot set cutoff/multi before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot set dpd/theta for this atom style} :dt
Self-explanatory. :dd
{Cannot set dump_modify flush for dump xtc} :dt
Self-explanatory. :dd
{Cannot set mass for this atom style} :dt
This atom style does not support mass settings for each atom type.
Instead they are defined on a per-atom basis in the data file. :dd
{Cannot set meso/cv for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/e for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/rho for this atom style} :dt
Self-explanatory. :dd
{Cannot set non-zero image flag for non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot set non-zero z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot set quaternion for atom that has none} :dt
Self-explanatory. :dd
{Cannot set quaternion with xy components for 2d system} :dt
Self-explanatory. :dd
{Cannot set respa hybrid and any of pair/inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), with different cutoff regions (inner/middle/outer),
or per hybrid sub-style (hybrid). You cannot mix those. :dd
{Cannot set respa middle without inner/outer} :dt
In the rRESPA integrator, you must define both an inner and outer
setting in order to use a middle setting. :dd
{Cannot set restart file size - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot set smd/contact/radius for this atom style} :dt
Self-explanatory. :dd
{Cannot set smd/mass/density for this atom style} :dt
Self-explanatory. :dd
{Cannot set temperature for fix rigid/nph} :dt
The temp keyword cannot be specified. :dd
{Cannot set theta for atom that is not a line} :dt
Self-explanatory. :dd
{Cannot set this attribute for this atom style} :dt
The attribute being set does not exist for the defined atom style. :dd
{Cannot set variable z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot skew triclinic box in z for 2d simulation} :dt
Self-explanatory. :dd
{Cannot subtract groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot union groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot use -cuda on and -kokkos on together} :dt
This is not allowed since both packages can use GPUs. :dd
{Cannot use -cuda on without USER-CUDA installed} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built. :dd
{Cannot use -kokkos on without KOKKOS installed} :dt
Self-explanatory. :dd
{Cannot use -reorder after -partition} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Cannot use Ewald with 2d simulation} :dt
The kspace style ewald cannot be used in 2d simulations. You can use
2d Ewald in a 3d simulation; see the kspace_modify command. :dd
{Cannot use Ewald/disp solver on system with no charge, dipole, or LJ particles} :dt
No atoms in system have a non-zero charge or dipole, or are LJ
particles. Change charges/dipoles or change options of the kspace
solver/pair style. :dd
{Cannot use EwaldDisp with 2d simulation} :dt
This is a current restriction of this command. :dd
{Cannot use GPU package with USER-CUDA package enabled} :dt
You cannot use both the GPU and USER-CUDA packages
together. Use one or the other. :dd
{Cannot use Kokkos pair style with rRESPA inner/middle} :dt
Self-explanatory. :dd
{Cannot use NEB unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
{Cannot use NEB with a single replica} :dt
Self-explanatory. :dd
{Cannot use NEB with atom_modify sort enabled} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Cannot use PPPM with 2d simulation} :dt
The kspace style pppm cannot be used in 2d simulations. You can use
2d PPPM in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PPPMDisp with 2d simulation} :dt
The kspace style pppm/disp cannot be used in 2d simulations. You can
use 2d pppm/disp in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PRD with a changing box} :dt
The current box dimensions are not copied between replicas :dd
{Cannot use PRD with a time-dependent fix defined} :dt
PRD alters the timestep in ways that will mess up these fixes. :dd
{Cannot use PRD with a time-dependent region defined} :dt
PRD alters the timestep in ways that will mess up these regions. :dd
{Cannot use PRD with atom_modify sort enabled} :dt
This is a current restriction of PRD. You must turn off sorting,
which is enabled by default, via the atom_modify command. :dd
{Cannot use PRD with multi-processor replicas unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
{Cannot use TAD unless atom map exists for NEB} :dt
See atom_modify map command to set this. :dd
{Cannot use TAD with a single replica for NEB} :dt
NEB requires multiple replicas. :dd
{Cannot use TAD with atom_modify sort enabled for NEB} :dt
This is a current restriction of NEB. :dd
{Cannot use a damped dynamics min style with fix box/relax} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use a damped dynamics min style with per-atom DOF} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use append/atoms in periodic dimension} :dt
The boundary style of the face where atoms are added can not be of
type p (periodic). :dd
{Cannot use atomfile-style variable unless atom map exists} :dt
Self-explanatory. See the atom_modify command to create a map. :dd
{Cannot use both com and bias with compute temp/chunk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with coul/dsf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/wolf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/implicit/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/long/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/expand/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/coul/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/sdk/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk/alloy} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair eam/kk/fs} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair sw/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/zbl/kk} :dt
Self-explanatory. :dd
{Cannot use compute chunk/atom bin z for 2d model} :dt
Self-explanatory. :dd
{Cannot use compute cluster/atom unless atoms have IDs} :dt
Atom IDs are used to identify clusters. :dd
{Cannot use create_atoms rotate unless single style} :dt
Self-explanatory. :dd
{Cannot use create_bonds unless atoms have IDs} :dt
This command requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot use create_bonds with non-molecular system} :dt
Self-explanatory. :dd
{Cannot use cwiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use delete_atoms bond yes with atom_style template} :dt
This is because the bonds for that atom style are hardwired in the
molecule template. :dd
{Cannot use delete_atoms unless atoms have IDs} :dt
Your atoms do not have IDs, so the delete_atoms command cannot be
used. :dd
{Cannot use delete_bonds with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use dump_modify fileper without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dump_modify nfile without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dynamic group with fix adapt atom} :dt
This is not yet supported. :dd
{Cannot use fix TMD unless atom map exists} :dt
Using this fix requires the ability to look up an atom index, which is
provided by an atom map. An atom map does not exist (by default) for
non-molecular problems. Using the atom_modify map command will force
an atom map to be created. :dd
{Cannot use fix ave/spatial z for 2 dimensional model} :dt
Self-explanatory. :dd
{Cannot use fix bond/break with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/create with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/swap with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix box/relax on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix box/relax on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix box/relax with both relaxation and scaling on a tilt factor} :dt
When specifying scaling on a tilt factor component, that component can not
also be controlled by the barostat. E.g. if scalexy yes is specified and
also keyword tri or xy, this is wrong. :dd
{Cannot use fix box/relax with tilt factor scaling on a 2nd non-periodic dimension} :dt
When specifying scaling on a tilt factor component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix deform on a shrink-wrapped boundary} :dt
The x, y, z options cannot be applied to shrink-wrapped
dimensions. :dd
{Cannot use fix deform tilt on a shrink-wrapped 2nd dim} :dt
This is because the shrink-wrapping will change the value
of the strain implied by the tilt factor. :dd
{Cannot use fix deform trate on a box with zero tilt} :dt
The trate style alters the current strain. :dd
{Cannot use fix deposit rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix deposit rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix deposit shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix enforce2d with 3d simulation} :dt
Self-explanatory. :dd
{Cannot use fix gcmc in a 2d simulation} :dt
Fix gcmc is set up to run in 3d only. No 2d simulations with fix gcmc
are allowed. :dd
{Cannot use fix gcmc shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix msst without per-type mass defined} :dt
Self-explanatory. :dd
{Cannot use fix npt and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix nvt/npt/nph on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix nvt/npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix nvt/npt/nph with both xy dynamics and xy scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both xz dynamics and xz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both yz dynamics and yz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with xy scaling when y is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with xz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with yz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix pour rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix pour shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour with triclinic box} :dt
This option is not yet supported. :dd
{Cannot use fix press/berendsen and fix deform on same component of stress tensor} :dt
These commands both change the box size/shape, so you cannot use both
together. :dd
{Cannot use fix press/berendsen on a non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix press/berendsen with triclinic box} :dt
Self-explanatory. :dd
{Cannot use fix reax/bonds without pair_style reax} :dt
Self-explanatory. :dd
{Cannot use fix rigid npt/nph and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix rigid npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix rigid/small npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix shake with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use fix ttm with 2d simulation} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix ttm with triclinic box} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix tune/kspace without a kspace style} :dt
Self-explanatory. :dd
{Cannot use fix tune/kspace without a pair style} :dt
This fix (tune/kspace) can only be used when a pair style has been specified. :dd
{Cannot use fix wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd more than once} :dt
Nor is there a need to, since multiple walls can be specified
in one command. :dd
{Cannot use fix wall/srd without fix srd} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix_deposit unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use fix_pour unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use include command within an if command} :dt
Self-explanatory. :dd
{Cannot use lines with fix srd unless overlap is set} :dt
This is because line segments are connected to each other. :dd
{Cannot use multiple fix wall commands with pair brownian} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate/poly} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricateU} :dt
Self-explanatory. :dd
{Cannot use neigh_modify exclude with GPU neighbor builds} :dt
This is a current limitation of the GPU implementation
in LAMMPS. :dd
{Cannot use neighbor bins - box size << cutoff} :dt
Too many neighbor bins will be created. This typically happens when
the simulation box is very small in some dimension, compared to the
neighbor cutoff. Use the "nsq" style instead of "bin" style. :dd
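A minimal illustration of switching styles (the 2.0 skin value is only an
example in the current distance units): :dd
neighbor 2.0 nsq :pre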
{Cannot use newton pair with beck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/wolf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/sf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/tstat/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/alloy/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/fs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gauss/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gayberne/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/charmm/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cubic/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/msm/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/expand/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/gromacs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj96/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with mie/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with morse/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with resquared/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with soft/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with table/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with zbl/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use non-zero forces in an energy minimization} :dt
Fix setforce cannot be used in this manner. Use fix addforce
instead. :dd
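E.g. a constant pushing force can be applied during a minimization with a
line such as the following (the fix ID, group, and force components are
only illustrative): :dd
fix push all addforce 1.0 0.0 0.0 :pre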
{Cannot use nonperiodic boundares with fix ttm} :dt
This fix requires a fully periodic simulation box. :dd
{Cannot use nonperiodic boundaries with Ewald} :dt
For kspace style ewald, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
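A typical 2d-slab setup, assuming z is the non-periodic dimension and
using the commonly recommended slab factor of 3.0, looks like: :dd
boundary p p f
kspace_modify slab 3.0 :pre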
{Cannot use nonperiodic boundaries with EwaldDisp} :dt
For kspace style ewald/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPM} :dt
For kspace style pppm, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPMDisp} :dt
For kspace style pppm/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use order greater than 8 with pppm/gpu.} :dt
Self-explanatory. :dd
{Cannot use package gpu neigh yes with triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Cannot use pair hybrid with GPU neighbor list builds} :dt
Neighbor list builds must be done on the CPU for this pair style. :dd
{Cannot use pair tail corrections with 2d simulations} :dt
The correction factors are only currently defined for 3d systems. :dd
{Cannot use processors part command without using partitions} :dt
See the command-line -partition switch. :dd
{Cannot use ramp in variable formula between runs} :dt
This is because the ramp() function is time dependent. :dd
{Cannot use read_data add before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot use read_data extra with add flag} :dt
Self-explanatory. :dd
{Cannot use read_data offset without add flag} :dt
Self-explanatory. :dd
{Cannot use read_data shift without add flag} :dt
Self-explanatory. :dd
{Cannot use region INF or EDGE when box does not exist} :dt
Regions that extend to the box boundaries can only be used after the
create_box command has been used. :dd
{Cannot use set atom with no atom IDs defined} :dt
Atom IDs are not defined, so they cannot be used to identify an atom. :dd
{Cannot use set mol with no molecule IDs defined} :dt
Self-explanatory. :dd
{Cannot use swiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use tris with fix srd unless overlap is set} :dt
This is because triangles are connected to each other. :dd
{Cannot use variable energy with constant efield in fix efield} :dt
LAMMPS computes the energy itself when the E-field is constant. :dd
{Cannot use variable energy with constant force in fix addforce} :dt
This is because for constant force, LAMMPS can compute the change
in energy directly. :dd
{Cannot use variable every setting for dump dcd} :dt
The format of DCD dump files requires snapshots be output
at a constant frequency. :dd
{Cannot use variable every setting for dump xtc} :dt
The format of this file requires snapshots at regular intervals. :dd
{Cannot use vdisplace in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use velocity bias command without temp keyword} :dt
Self-explanatory. :dd
{Cannot use velocity create loop all unless atoms have IDs} :dt
Atoms in the simulation do not have IDs, so this style
of velocity creation cannot be performed. :dd
{Cannot use wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use write_restart fileper without % in restart file name} :dt
Self-explanatory. :dd
{Cannot use write_restart nfile without % in restart file name} :dt
Self-explanatory. :dd
{Cannot wiggle and shear fix wall/gran} :dt
Cannot specify both options at the same time. :dd
{Cannot write to restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot yet use KSpace solver with grid with comm style tiled} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use comm_style tiled with multi-mode comm} :dt
Self-explanatory. :dd
{Cannot yet use comm_style tiled with triclinic box} :dt
Self-explanatory. :dd
{Cannot yet use compute tally with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use fix bond/break with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use fix bond/create with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use minimize with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use pair hybrid with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot zero Langevin force of 0 atoms} :dt
The group has zero atoms, so you cannot request its force
be zeroed. :dd
{Cannot zero gld force for zero atoms} :dt
There are no atoms currently in the group. :dd
{Cannot zero momentum of no atoms} :dt
Self-explanatory. :dd
{Change_box command before simulation box is defined} :dt
Self-explanatory. :dd
{Change_box volume used incorrectly} :dt
The "dim volume" option must be used immediately following one or two
settings for "dim1 ..." (and optionally "dim2 ...") and must be for a
different dimension, i.e. dim != dim1 and dim != dim2. :dd
{Chunk/atom compute does not exist for compute angmom/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute com/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute gyration/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute inertia/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute msd/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute omega/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute property/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute temp/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute torque/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute vcm/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for fix ave/chunk} :dt
Self-explanatory. :dd
{Comm tiled invalid index in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm tiled mis-match in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm_modify group != atom_modify first group} :dt
Self-explanatory. :dd
{Communication cutoff for comm_style tiled cannot exceed periodic box length} :dt
Self-explanatory. :dd
{Communication cutoff too small for SNAP micro load balancing} :dt
This can happen if you change the neighbor skin after your pair_style
command or if your box dimensions grow during a run. You can set the
cutoff explicitly via the comm_modify cutoff command. :dd
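E.g. (the 12.0 value is only illustrative and should exceed the required
ghost-atom cutoff): :dd
comm_modify cutoff 12.0 :pre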
{Compute %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this compute. :dd
{Compute ID for compute chunk /atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Compute ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Compute angle/local used when angles are not allowed} :dt
The atom style does not support angles. :dd
{Compute angmom/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute body/local requires atom style body} :dt
Self-explanatory. :dd
{Compute bond/local used when bonds are not allowed} :dt
The atom style does not support bonds. :dd
{Compute centro/atom requires a pair style be defined} :dt
This is because the computation of the centro-symmetry values
uses a pairwise neighbor list. :dd
{Compute chunk/atom bin/cylinder radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of a non-axis periodic dimension. :dd
{Compute chunk/atom bin/sphere radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of any periodic dimension. :dd
{Compute chunk/atom compute array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom cylinder axis must be z for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom fix array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Compute chunk/atom ids once but nchunk is not once} :dt
You cannot assign chunk IDs to atoms permanently if the number of
chunks may change. :dd
{Compute chunk/atom molecule for non-molecular system} :dt
Self-explanatory. :dd
{Compute chunk/atom sphere z origin must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom stores no IDs for compute property/chunk} :dt
It will only store IDs if its compress option is enabled. :dd
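E.g. (the compute ID and the molecule binning style below are only
illustrative): :dd
compute cc1 all chunk/atom molecule compress yes :pre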
{Compute chunk/atom stores no coord1 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord2 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord3 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute chunk/atom without bins cannot use discard mixed} :dt
That discard option only applies to the binning styles. :dd
{Compute cluster/atom cutoff is longer than pairwise cutoff} :dt
Cannot identify clusters beyond cutoff. :dd
{Compute cluster/atom requires a pair style be defined} :dt
This is so that the pair style defines a cutoff distance which
is used to find clusters. :dd
{Compute cna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute cna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute com/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute contact/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute contact/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute coord/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute coordination at distances longer than the pair cutoff,
since those atoms are not in the neighbor list. :dd
{Compute coord/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute damage/atom requires peridynamic potential} :dt
Damage is a Peridynamic-specific metric. It requires you
to be running a Peridynamics simulation. :dd
{Compute dihedral/local used when dihedrals are not allowed} :dt
The atom style does not support dihedrals. :dd
{Compute dilatation/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute dilatation/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute does not allow an extra compute or fix to be reset} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute erotate/asphere requires atom style ellipsoid or line or tri} :dt
Self-explanatory. :dd
{Compute erotate/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute erotate/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute erotate/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute erotate/sphere/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute event/displace has invalid fix event assigned} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute group/group group ID does not exist} :dt
Self-explanatory. :dd
{Compute gyration/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute heat/flux compute ID does not compute ke/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute pe/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute stress/atom} :dt
Self-explanatory. :dd
{Compute hexorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute hexorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute improper/local used when impropers are not allowed} :dt
The atom style does not support impropers. :dd
{Compute inertia/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute ke/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute msd/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute msd/chunk nchunk is not static} :dt
This is required because the MSD cannot be computed consistently if
the number of chunks is changing. Compute chunk/atom allows setting
nchunk to be static. :dd
{Compute nve/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt body requires atom style body} :dt
Self-explanatory. :dd
{Compute omega/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute orientorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute orientorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute pair must use group all} :dt
Pair styles accumulate energy on all atoms. :dd
{Compute pe must use group all} :dt
Energies computed by potentials (pair, bond, etc) are computed on all
atoms. :dd
{Compute plasticity/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute plasticity/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute pressure must use group all} :dt
Virial contributions computed by potentials (pair, bond, etc) are
computed on all atoms. :dd
{Compute pressure requires temperature ID to include kinetic energy} :dt
The keflag cannot be used unless a temperature compute is provided. :dd
{Compute pressure temperature ID does not compute temperature} :dt
The compute ID assigned to a pressure computation must compute
temperature. :dd
{Compute property/atom floating point vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Compute property/atom for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Compute property/atom integer vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Compute property/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute property/local cannot use these inputs together} :dt
Only inputs that generate the same number of datums can be used
together. E.g. bond and angle quantities cannot be mixed. :dd
{Compute property/local does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Compute property/local for property that isn't allocated} :dt
Self-explanatory. :dd
{Compute rdf requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute reduce compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce compute calculates global values} :dt
A compute that calculates peratom or local values is required. :dd
{Compute reduce compute does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce fix calculates global values} :dt
A fix that calculates peratom or local values is required. :dd
{Compute reduce fix does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce replace requires min or max mode} :dt
Self-explanatory. :dd
{Compute reduce variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute slice compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice compute does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute slice fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice fix does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute sna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute sna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snad/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snad/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snav/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snav/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute stress/atom temperature ID does not compute temperature} :dt
The specified compute must compute temperature. :dd
{Compute temp/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute temp/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute temp/body requires atom style body} :dt
Self-explanatory. :dd
{Compute temp/body requires bodies} :dt
This compute can only be applied to body particles. :dd
{Compute temp/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute temp/cs requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
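E.g. add a line such as: :dd
comm_modify vel yes :pre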
{Compute temp/cs used when bonds are not allowed} :dt
This compute only works on pairs of bonded particles. :dd
{Compute temp/partial cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/profile cannot bin z for 2d systems} :dt
Self-explanatory. :dd
{Compute temp/profile cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute ti kspace style does not exist} :dt
Self-explanatory. :dd
{Compute ti pair style does not exist} :dt
Self-explanatory. :dd
{Compute ti tail when pair style does not compute tail corrections} :dt
Self-explanatory. :dd
{Compute torque/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute used in dump between runs is not current} :dt
The compute was not invoked on the current timestep, therefore it
cannot be used in a dump between runs. :dd
{Compute used in variable between runs is not current} :dt
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute used in variable thermo keyword between runs is not current} :dt
Some thermo keywords rely on a compute to calculate their value(s).
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute vcm/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Computed temperature for fix temp/berendsen cannot be 0.0} :dt
Self-explanatory. :dd
{Computed temperature for fix temp/rescale cannot be 0.0} :dt
Cannot rescale the temperature to a new value if the current
temperature is 0.0. :dd
{Core/shell partner atom not found} :dt
Could not find one of the atoms in the bond pair. :dd
{Core/shell partners were not all found} :dt
Could not find one or more atoms in the bond pairs. :dd
{Could not adjust g_ewald_6} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute g_ewald} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Coulomb interaction} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Dispersion} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not create 3d FFT plan} :dt
The FFT setup for the PPPM solver failed, typically due
to lack of memory. This is an unusual error. Check the
size of the FFT grid you are requesting. :dd
{Could not create 3d grid of processors} :dt
The specified constraints did not allow a Px by Py by Pz grid to be
created where Px * Py * Pz = P = total number of processors. :dd
{Could not create 3d remap plan} :dt
The FFT setup in pppm failed. :dd
{Could not create Python function arguments} :dt
This is an internal Python error, possibly because the number
of inputs to the function is too large. :dd
{Could not create numa grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. Usually this is because the total processor count is not a
multiple of the cores/node or the user specified processor count is >
1 in one of the dimensions. :dd
{Could not create twolevel 3d grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. :dd
{Could not evaluate Python function input variable} :dt
Self-explanatory. :dd
{Could not find Python function} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Could not find atom_modify first group ID} :dt
Self-explanatory. :dd
{Could not find change_box group ID} :dt
Group ID used in the change_box command does not exist. :dd
{Could not find compute ID for PRD} :dt
Self-explanatory. :dd
{Could not find compute ID for TAD} :dt
Self-explanatory. :dd
{Could not find compute ID for temperature bias} :dt
Self-explanatory. :dd
{Could not find compute ID to delete} :dt
Self-explanatory. :dd
{Could not find compute displace/atom fix ID} :dt
Self-explanatory. :dd
{Could not find compute event/displace fix ID} :dt
Self-explanatory. :dd
{Could not find compute group ID} :dt
Self-explanatory. :dd
{Could not find compute heat/flux compute ID} :dt
Self-explanatory. :dd
{Could not find compute msd fix ID} :dt
Self-explanatory. :dd
{Could not find compute msd/chunk fix ID} :dt
The compute creates an internal fix, which has been deleted. :dd
{Could not find compute pressure temperature ID} :dt
The compute ID for calculating temperature does not exist. :dd
{Could not find compute stress/atom temperature ID} :dt
Self-explanatory. :dd
{Could not find compute vacf fix ID} :dt
Self-explanatory. :dd
{Could not find compute/voronoi surface group ID} :dt
Self-explanatory. :dd
{Could not find compute_modify ID} :dt
Self-explanatory. :dd
{Could not find custom per-atom property ID} :dt
Self-explanatory. :dd
{Could not find delete_atoms group ID} :dt
Group ID used in the delete_atoms command does not exist. :dd
{Could not find delete_atoms region ID} :dt
Region ID used in the delete_atoms command does not exist. :dd
{Could not find displace_atoms group ID} :dt
Group ID used in the displace_atoms command does not exist. :dd
{Could not find dump custom compute ID} :dt
Self-explanatory. :dd
{Could not find dump custom fix ID} :dt
Self-explanatory. :dd
{Could not find dump custom variable name} :dt
Self-explanatory. :dd
{Could not find dump group ID} :dt
A group ID used in the dump command does not exist. :dd
{Could not find dump local compute ID} :dt
Self-explanatory. :dd
{Could not find dump local fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify compute ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom floating point property ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom integer property ID} :dt
Self-explanatory. :dd
{Could not find dump modify fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify variable name} :dt
Self-explanatory. :dd
{Could not find fix ID to delete} :dt
Self-explanatory. :dd
{Could not find fix adapt storage fix ID} :dt
This should not happen unless you explicitly deleted
a secondary fix that fix adapt created internally. :dd
{Could not find fix gcmc exclusion group ID} :dt
Self-explanatory. :dd
{Could not find fix gcmc rotation group ID} :dt
Self-explanatory. :dd
{Could not find fix group ID} :dt
A group ID used in the fix command does not exist. :dd
{Could not find fix msst compute ID} :dt
Self-explanatory. :dd
{Could not find fix poems group ID} :dt
A group ID used in the fix poems command does not exist. :dd
{Could not find fix recenter group ID} :dt
A group ID used in the fix recenter command does not exist. :dd
{Could not find fix rigid group ID} :dt
A group ID used in the fix rigid command does not exist. :dd
{Could not find fix srd group ID} :dt
Self-explanatory. :dd
{Could not find fix_modify ID} :dt
A fix ID used in the fix_modify command does not exist. :dd
{Could not find fix_modify pressure ID} :dt
The compute ID for computing pressure does not exist. :dd
{Could not find fix_modify temperature ID} :dt
The compute ID for computing temperature does not exist. :dd
{Could not find group clear group ID} :dt
Self-explanatory. :dd
{Could not find group delete group ID} :dt
Self-explanatory. :dd
{Could not find pair fix ID} :dt
A fix is created internally by the pair style to store shear
history information. You cannot delete it. :dd
{Could not find set group ID} :dt
Group ID specified in set command does not exist. :dd
{Could not find specified fix gcmc group ID} :dt
Self-explanatory. :dd
{Could not find thermo compute ID} :dt
Compute ID specified in thermo_style command does not exist. :dd
{Could not find thermo custom compute ID} :dt
The compute ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom fix ID} :dt
The fix ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom variable name} :dt
Self-explanatory. :dd
{Could not find thermo fix ID} :dt
Fix ID specified in thermo_style command does not exist. :dd
{Could not find thermo variable name} :dt
Self-explanatory. :dd
{Could not find thermo_modify pressure ID} :dt
The compute ID needed by thermo style custom to compute pressure does
not exist. :dd
{Could not find thermo_modify temperature ID} :dt
The compute ID needed by thermo style custom to compute temperature does
not exist. :dd
{Could not find undump ID} :dt
A dump ID used in the undump command does not exist. :dd
{Could not find velocity group ID} :dt
A group ID used in the velocity command does not exist. :dd
{Could not find velocity temperature ID} :dt
The compute ID needed by the velocity command to compute temperature
does not exist. :dd
{Could not find/initialize a specified accelerator device} :dt
Could not initialize at least one of the devices specified for the gpu
package. :dd
{Could not grab element entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab global entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab pair entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not initialize embedded Python} :dt
The main module in Python was not accessible. :dd
{Could not open Python file} :dt
The specified file of Python code cannot be opened. Check that the
path and name are correct. :dd
{Could not process Python file} :dt
The Python code in the specified file was not run successfully by
Python, probably due to errors in the Python code. :dd
{Could not process Python string} :dt
The Python code in the here string was not run successfully by Python,
probably due to errors in the Python code. :dd
{Coulomb PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{Coulomb cut not supported in pair_style buck/long/coul/coul} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/coul/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/tip4p/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cutoffs of pair hybrid sub-styles do not match} :dt
If using a Kspace solver, all Coulomb cutoffs of long pair styles must
be the same. :dd
{Coulombic cut not supported in pair_style lj/long/dipole/long} :dt
Must use long-range Coulombic interactions. :dd
{Cound not find dump_modify ID} :dt
Self-explanatory. :dd
{Create_atoms command before simulation box is defined} :dt
The create_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Create_atoms molecule has atom IDs, but system does not} :dt
The atom_modify id command can be used to force atom IDs to be stored. :dd
{Create_atoms molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Create_atoms molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Create_atoms region ID does not exist} :dt
A region ID used in the create_atoms command does not exist. :dd
{Create_bonds command before simulation box is defined} :dt
Self-explanatory. :dd
{Create_bonds command requires no kspace_style be defined} :dt
This is so that atom pairs that are already bonded do not appear
in the neighbor list. :dd
{Create_bonds command requires special_bonds 1-2 weights be 0.0} :dt
This is so that atom pairs that are already bonded do not appear in
the neighbor list. :dd
{Create_bonds max distance > neighbor cutoff} :dt
Can only create bonds for atom pairs that will be in neighbor list. :dd
{Create_bonds requires a pair style be defined} :dt
Self-explanatory. :dd
{Create_box region ID does not exist} :dt
Self-explanatory. :dd
{Create_box region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the create_box command. :dd
{Custom floating point vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Custom integer vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Custom per-atom property ID is not floating point} :dt
Self-explanatory. :dd
{Custom per-atom property ID is not integer} :dt
Self-explanatory. :dd
{Cut-offs missing in pair_style lj/long/dipole/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style lj/long/coul/long} :dt
Self-explanatory. :dd
{Cyclic loop in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a ring (or cycle). :dd
{Degenerate lattice primitive vectors} :dt
Invalid set of 3 lattice vectors for lattice command. :dd
{Delete region ID does not exist} :dt
Self-explanatory. :dd
{Delete_atoms command before simulation box is defined} :dt
The delete_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_atoms cutoff > max neighbor cutoff} :dt
Can only delete atoms in atom pairs that will be in neighbor list. :dd
{Delete_atoms mol yes requires atom attribute molecule} :dt
Cannot use this option with a non-molecular system. :dd
{Delete_atoms requires a pair style be defined} :dt
This is because atom deletion within a cutoff uses a pairwise
neighbor list. :dd
{Delete_bonds command before simulation box is defined} :dt
The delete_bonds command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_bonds command with no atoms existing} :dt
No atoms are yet defined so the delete_bonds command cannot be used. :dd
{Deposition region extends outside simulation box} :dt
Self-explanatory. :dd
{Did not assign all atoms correctly} :dt
Atoms read in from a data file were not assigned correctly to
processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Did not assign all restart atoms correctly} :dt
Atoms read in from the restart file were not assigned correctly to
processors. This is likely due to some atom coordinates being outside
a non-periodic simulation box. Normally this should not happen. You
may wish to use the "remap" option on the read_restart command to see
if this helps. :dd
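E.g. (the restart file name below is only illustrative): :dd
read_restart restart.equil remap :pre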
{Did not find all elements in MEAM library file} :dt
The requested elements were not found in the MEAM file. :dd
{Did not find fix shake partner info} :dt
Could not find bond partners implied by fix shake command. This error
can be triggered if the delete_bonds command was used before fix
shake, and it removed bonds without resetting the 1-2, 1-3, 1-4
weighting list via the special keyword. :dd
{Did not find keyword in table file} :dt
Keyword used in pair_coeff command was not found in table file. :dd
{Did not set pressure for fix rigid/nph} :dt
The press keyword must be specified. :dd
{Did not set temp for fix rigid/nvt/small} :dt
Self-explanatory. :dd
{Did not set temp or press for fix rigid/npt/small} :dt
Self-explanatory. :dd
{Did not set temperature for fix rigid/nvt} :dt
The temp keyword must be specified. :dd
{Did not set temperature or pressure for fix rigid/npt} :dt
The temp and press keywords must be specified. :dd
{Dihedral atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
dihedral on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid dihedral. :dd
{Dihedral atom missing in set command} :dt
The set command cannot find one or more atoms in a particular dihedral
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid dihedral. :dd
{Dihedral atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral charmm is incompatible with Pair style} :dt
Dihedral style charmm must be used with a pair style charmm
in order for the 1-4 epsilon/sigma parameters to be defined. :dd
{Dihedral coeff for hybrid has invalid style} :dt
Dihedral style hybrid uses another dihedral style as one of its
coefficients. The dihedral style used in the dihedral_coeff command
or read from a restart file is not recognized. :dd
{Dihedral coeffs are not set} :dt
No dihedral coefficients have been assigned in the data file or via
the dihedral_coeff command. :dd
{Dihedral style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot use same dihedral style twice} :dt
Self-explanatory. :dd
{Dihedral/improper extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the dihedral atoms are so far apart that it is ambiguous
how the dihedral should be defined. :dd
{Dihedral_coeff command before dihedral_style is defined} :dt
Coefficients cannot be set in the data file or via the dihedral_coeff
command until a dihedral_style has been assigned. :dd
{Dihedral_coeff command before simulation box is defined} :dt
The dihedral_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Dihedral_coeff command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedral_style command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedrals assigned incorrectly} :dt
Dihedrals read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Dihedrals defined but no dihedral types} :dt
The data file header lists dihedrals but no dihedral types. :dd
{Dimension command after simulation box is defined} :dt
The dimension command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Dispersion PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{Displace_atoms command before simulation box is defined} :dt
The displace_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Distance must be > 0 for compute event/displace} :dt
Self-explanatory. :dd
{Divide by 0 in influence function} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in influence function of pair peri/lps} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in variable formula} :dt
Self-explanatory. :dd
{Domain too large for neighbor bins} :dt
The domain has become extremely large so that neighbor bins cannot be
used. Most likely, one or more atoms have been blown out of the
simulation box to a great distance. :dd
{Double precision is not supported on this accelerator} :dt
Self-explanatory. :dd
{Dump atom/gz only writes compressed files} :dt
The dump atom/gz output file name must have a .gz suffix. :dd
{Dump cfg arguments can not mix xs|ys|zs with xsu|ysu|zsu} :dt
Self-explanatory. :dd
{Dump cfg arguments must start with 'mass type xs ys zs' or 'mass type xsu ysu zsu'} :dt
This is a requirement of the CFG output format. See the dump cfg doc
page for more details. :dd
{Dump cfg requires one snapshot per file} :dt
Use the wildcard "*" character in the filename. :dd
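E.g. (the dump ID, group, output interval, and file name are only
illustrative); the "*" is replaced by the timestep so each snapshot is
written to its own file: :dd
dump cfg1 all cfg 100 snap.*.cfg mass type xs ys zs :pre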
{Dump cfg/gz only writes compressed files} :dt
The dump cfg/gz output file name must have a .gz suffix. :dd
{Dump custom and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump custom
needs them. :dd
{Dump custom compute does not calculate per-atom array} :dt
Self-explanatory. :dd
{Dump custom compute does not calculate per-atom vector} :dt
Self-explanatory. :dd
{Dump custom compute does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump custom fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom variable is not atom-style variable} :dt
Only atom-style variables generate per-atom quantities, needed for
dump output. :dd
{Dump custom/gz only writes compressed files} :dt
The dump custom/gz output file name must have a .gz suffix. :dd
{Dump dcd of non-matching # of atoms} :dt
Every snapshot written by dump dcd must contain the same # of atoms. :dd
{Dump dcd requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
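E.g., assuming the dump ID is 1: :dd
dump_modify 1 sort id :pre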
{Dump every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Dump file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Dump file is incorrectly formatted} :dt
Self-explanatory. :dd
{Dump image body yes requires atom style body} :dt
Self-explanatory. :dd
{Dump image bond not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump image cannot perform sorting} :dt
Self-explanatory. :dd
{Dump image line requires atom style line} :dt
Self-explanatory. :dd
{Dump image persp option is not yet supported} :dt
Self-explanatory. :dd
{Dump image requires one snapshot per file} :dt
Use a "*" in the filename. :dd
{Dump image tri requires atom style tri} :dt
Self-explanatory. :dd
{Dump local and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump local
needs them. :dd
{Dump local attributes contain no compute or fix} :dt
Self-explanatory. :dd
{Dump local cannot sort by atom ID} :dt
This is because dump local does not really dump per-atom info. :dd
{Dump local compute does not calculate local array} :dt
Self-explanatory. :dd
{Dump local compute does not calculate local vector} :dt
Self-explanatory. :dd
{Dump local compute does not compute local info} :dt
Self-explanatory. :dd
{Dump local compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump local count is not consistent across input fields} :dt
Every column of output must be the same length. :dd
{Dump local fix does not compute local array} :dt
Self-explanatory. :dd
{Dump local fix does not compute local info} :dt
Self-explanatory. :dd
{Dump local fix does not compute local vector} :dt
Self-explanatory. :dd
{Dump local fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump modify bcolor not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify bdiam not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify compute ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify element names do not match atom types} :dt
Number of element names must equal number of atom types. :dd
{Dump modify fix ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify fix ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify variable is not atom-style variable} :dt
Self-explanatory. :dd
{Dump sort column is invalid} :dt
Self-explanatory. :dd
{Dump xtc requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
{Dump xyz/gz only writes compressed files} :dt
The dump xyz/gz output file name must have a .gz suffix. :dd
{Dump_modify buffer yes not allowed for this style} :dt
Self-explanatory. :dd
{Dump_modify format string is too short} :dt
There are more fields to be dumped in a line of output than your
format string specifies. :dd
{Dump_modify region ID does not exist} :dt
Self-explanatory. :dd
{Dumping an atom property that isn't allocated} :dt
The chosen atom style does not define the per-atom quantity being
dumped. :dd
{Duplicate atom IDs exist} :dt
Self-explanatory. :dd
{Duplicate fields in read_dump command} :dt
Self-explanatory. :dd
{Duplicate particle in PeriDynamic bond - simulation box is too small} :dt
This is likely because your box length is shorter than 2 times
the bond length. :dd
{Electronic temperature dropped below zero} :dt
Something has gone wrong with the fix ttm electron temperature model. :dd
{Element not defined in potential file} :dt
The specified element is not in the potential file. :dd
{Empty brackets in variable} :dt
There is no variable syntax that uses empty brackets. Check
the variable doc page. :dd
{Energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Epsilon or sigma reference not set by pair style in PPPMDisp} :dt
Self-explanatory. :dd
{Epsilon or sigma reference not set by pair style in ewald/n} :dt
The pair style is not providing the needed epsilon or sigma values. :dd
{Error in vdw spline: inner radius > outer radius} :dt
A pre-tabulated spline is invalid. Likely a problem with the
potential parameters. :dd
{Error writing averaged chunk data} :dt
Something in the output to the file triggered an error. :dd
{Error writing file header} :dt
Something in the output to the file triggered an error. :dd
{Error writing out correlation data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out histogram data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out time averaged data} :dt
Something in the output to the file triggered an error. :dd
{Failed to allocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Failed to open FFmpeg pipeline to file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct and writable and that the FFmpeg executable can be found and run. :dd
{Failed to reallocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Fewer SRD bins than processors in some dimension} :dt
This is not allowed. Make your SRD bin size smaller. :dd
{File variable could not read value} :dt
Check the file assigned to the variable. :dd
{Final box dimension due to fix deform is < 0.0} :dt
Self-explanatory. :dd
{Fix %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this fix. :dd
{Fix ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute erotate/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute ke/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Fix ID for read_data does not exist} :dt
Self-explanatory. :dd
{Fix ID for velocity does not exist} :dt
Self-explanatory. :dd
{Fix ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Fix SRD: bad bin assignment for SRD advection} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad search bin assignment} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad stencil bin for big particle} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: too many big particles in bin} :dt
Reset the ATOMPERBIN parameter at the top of fix_srd.cpp
to a larger value, and re-compile the code. :dd
{Fix SRD: too many walls in bin} :dt
This should not happen unless your system has been setup incorrectly. :dd
{Fix adapt interface to this pair style not supported} :dt
New coding for the pair style would need to be done. :dd
{Fix adapt kspace style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style param not supported} :dt
The pair style does not know about the parameter you specified. :dd
{Fix adapt requires atom attribute charge} :dt
The atom style being used does not specify an atom charge. :dd
{Fix adapt requires atom attribute diameter} :dt
The atom style being used does not specify an atom diameter. :dd
{Fix adapt type pair range is not valid for pair hybrid sub-style} :dt
Self-explanatory. :dd
{Fix append/atoms requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
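For example, a lattice could be defined before this fix; the lattice
style and scale shown here are illustrative values, not required ones: :dd
lattice fcc 0.8442 :pre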
{Fix ave/atom compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom vector} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom compute does not calculate per-atom values} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom vector} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix does not calculate per-atom values} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom variable is not atom-style variable} :dt
A variable used by fix ave/atom must generate per-atom values. :dd
{Fix ave/chunk compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk does not use chunk/atom compute} :dt
The specified compute is not for a compute chunk/atom command. :dd
{Fix ave/chunk fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk variable is not atom-style variable} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input local values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input per-atom values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid compute} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid fix} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid variable} :dt
Self-explanatory. :dd
{Fix ave/histo inputs are not all global, peratom, or local} :dt
All inputs in a single fix ave/histo command must be of the
same style. :dd
{Fix ave/histo/weight value and weight vector lengths do not match} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom vector} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute does not calculate per-atom values} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial fix does not calculate a per-atom vector} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix does not calculate per-atom values} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Fix ave/spatial settings invalid with changing box size} :dt
If the box size changes, only the units reduced option can be
used. :dd
{Fix ave/spatial variable is not atom-style variable} :dt
A variable used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/time cannot set output array intensive/extensive from these inputs} :dt
One or more of the vector inputs has individual elements which are
flagged as intensive or extensive. Such an input cannot be flagged as
all intensive/extensive when turned into an array by fix ave/time. :dd
{Fix ave/time cannot use variable with vector mode} :dt
Variables produce scalar values. :dd
{Fix ave/time columns are inconsistent lengths} :dt
Self-explanatory. :dd
{Fix ave/time compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time fix array cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time fix vector cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
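A minimal sketch of the required setup; the fix ID, rebalancing
interval, and imbalance threshold below are illustrative values only: :dd
comm_style tiled
fix 2 all balance 1000 1.1 rcb :pre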
{Fix balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Fix bond/break needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire needed info. The comm_modify cutoff command can be used to
extend the communication range. :dd
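A minimal sketch of extending the communication cutoff; the 10.0
distance-unit value is illustrative and should be chosen for your system: :dd
comm_modify cutoff 10.0 :pre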
{Fix bond/create angle type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create cutoff is longer than pairwise cutoff} :dt
This is not allowed because bond creation is done using the
pairwise neighbor list. :dd
{Fix bond/create dihedral type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create improper type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create induced too many angles/dihedrals/impropers per atom} :dt
See the read_data command for info on setting the "extra angle per
atom", etc. header values to allow additional angles, etc. to be
formed. :dd
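For example, header lines like the following could be added to the
data file read by read_data; the counts are illustrative: :dd
10 extra angle per atom
10 extra dihedral per atom
10 extra improper per atom :pre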
{Fix bond/create needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire needed info. The comm_modify cutoff command can be used to
extend the communication range. :dd
{Fix bond/swap cannot use dihedral or improper styles} :dt
These styles cannot be defined when using this fix. :dd
{Fix bond/swap requires pair and bond styles} :dt
Self-explanatory. :dd
{Fix bond/swap requires special_bonds = 0,1,1} :dt
Self-explanatory. :dd
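For example, this setting can be made with the special_bonds command: :dd
special_bonds lj/coul 0.0 1.0 1.0 :pre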
{Fix box/relax generated negative box length} :dt
The pressure being applied is likely too large. Try applying
it incrementally, building up to the high pressure. :dd
{Fix command before simulation box is defined} :dt
The fix command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Fix deform cannot use yz variable with xy} :dt
The yz setting cannot be a variable if xy deformation is also
specified. This is because LAMMPS cannot determine if the yz setting
will induce a box flip which would be invalid if xy is also changing. :dd
{Fix deform is changing yz too much with xy} :dt
When both yz and xy are changing, it induces changes in xz if the
box must flip from one tilt extreme to another. Thus it is not
allowed for yz to grow so much that a flip is induced. :dd
{Fix deform tilt factors require triclinic box} :dt
Cannot deform the tilt factors of a simulation box unless it
is a triclinic (non-orthogonal) box. :dd
{Fix deform volume setting is invalid} :dt
Cannot use volume style unless other dimensions are being controlled. :dd
{Fix deposit and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix deposit molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix deposit molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot deposit molecules that are
not in that template. :dd
{Fix deposit region cannot be dynamic} :dt
Only static regions can be used with fix deposit. :dd
{Fix deposit region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix deposit command. :dd
{Fix deposit shake fix does not exist} :dt
Self-explanatory. :dd
{Fix efield requires atom attribute q or mu} :dt
The atom style defined does not have this attribute. :dd
{Fix efield with dipoles cannot use atom-style variables} :dt
This option is not supported. :dd
{Fix evaporate molecule requires atom attribute molecule} :dt
The atom style being used does not define a molecule ID. :dd
{Fix external callback function not set} :dt
This must be done by an external program in order to use this fix. :dd
{Fix for fix ave/atom not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/atom is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/chunk not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/chunk is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/correlate not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/correlate
is requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/histo not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/histo is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/spatial not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/spatial is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/time not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/time
is requesting a value on a non-allowed timestep. :dd
{Fix for fix store/state not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix store/state
is requesting a value on a non-allowed timestep. :dd
{Fix for fix vector not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix vector is
requesting a value on a non-allowed timestep. :dd
{Fix freeze requires atom attribute torque} :dt
The atom style defined does not have this attribute. :dd
{Fix gcmc and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix gcmc atom has charge, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc cannot exchange individual atoms belonging to a molecule} :dt
This is an error since you should not delete only one atom of a
molecule. The user has specified atomic (non-molecular) gas
exchanges, but an atom belonging to a molecule could be deleted. :dd
{Fix gcmc does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Fix gcmc molecule command requires that atoms have molecule attributes} :dt
Should not choose the gcmc molecule feature if no molecules are being
simulated. The general molecule flag is off, but gcmc's molecule flag
is on. :dd
{Fix gcmc molecule has charges, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix gcmc molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix gcmc molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot insert molecules that are
not in that template. :dd
{Fix gcmc put atom outside box} :dt
This should not normally happen. Contact the developers. :dd
{Fix gcmc ran out of available atom IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc ran out of available molecule IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc region cannot be dynamic} :dt
Only static regions can be used with fix gcmc. :dd
{Fix gcmc region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix gcmc command. :dd
{Fix gcmc region extends outside simulation box} :dt
Self-explanatory. :dd
{Fix gcmc shake fix does not exist} :dt
Self-explanatory. :dd
{Fix gld c coefficients must be >= 0} :dt
Self-explanatory. :dd
{Fix gld needs more prony series coefficients} :dt
Self-explanatory. :dd
{Fix gld prony terms must be > 0} :dt
Self-explanatory. :dd
{Fix gld series type must be pprony for now} :dt
Self-explanatory. :dd
{Fix gld start temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld stop temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld tau coefficients must be > 0} :dt
Self-explanatory. :dd
{Fix heat group has no atoms} :dt
Self-explanatory. :dd
{Fix heat kinetic energy of an atom went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix heat kinetic energy went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix in variable not computed at compatible time} :dt
Fixes generate their values on specific timesteps. The variable is
requesting the values on a non-allowed timestep. :dd
{Fix langevin angmom is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin angmom requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix langevin angmom requires extended particles} :dt
This fix option cannot be used with point particles. :dd
{Fix langevin omega is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin omega requires atom style sphere} :dt
Self-explanatory. :dd
{Fix langevin omega requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix langevin period must be > 0.0} :dt
The time window for temperature relaxation must be > 0. :dd
{Fix langevin variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix momentum group has no atoms} :dt
Self-explanatory. :dd
{Fix move cannot define z or vz variable for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot rotate aroung non z-axis for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set linear z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set wiggle z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute potential energy} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute pressure} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute temperature} :dt
Self-explanatory. :dd
{Fix msst requires a periodic box} :dt
Self-explanatory. :dd
{Fix msst tscale must satisfy 0 <= tscale < 1} :dt
Self-explanatory. :dd
{Fix npt/nph has tilted box too far in one step - periodic cell is too far from equilibrium state} :dt
Self-explanatory. The change in the box tilt is too extreme
on a short timescale. :dd
{Fix nve/asphere requires extended particles} :dt
This fix can only be used for particles with a shape setting. :dd
{Fix nve/asphere/noforce requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix nve/asphere/noforce requires extended particles} :dt
One of the particles is not an ellipsoid. :dd
{Fix nve/body requires atom style body} :dt
Self-explanatory. :dd
{Fix nve/body requires bodies} :dt
This fix can only be used for particles that are bodies. :dd
{Fix nve/line can only be used for 2d simulations} :dt
Self-explanatory. :dd
{Fix nve/line requires atom style line} :dt
Self-explanatory. :dd
{Fix nve/line requires line particles} :dt
Self-explanatory. :dd
{Fix nve/sphere dipole requires atom attribute mu} :dt
An atom style with this attribute is needed. :dd
{Fix nve/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nve/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix nve/tri can only be used for 3d simulations} :dt
Self-explanatory. :dd
{Fix nve/tri requires atom style tri} :dt
Self-explanatory. :dd
{Fix nve/tri requires tri particles} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt asphere requires extended particles} :dt
The shape setting for a particle in the fix group has shape = 0.0,
which means it is a point particle. :dd
{Fix nvt/nph/npt body requires bodies} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix nvt/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix orient/fcc file open failed} :dt
The fix orient/fcc command could not open a specified file. :dd
{Fix orient/fcc file read failed} :dt
The fix orient/fcc command could not read the needed parameters from a
specified file. :dd
{Fix orient/fcc found self twice} :dt
The neighbor lists used by fix orient/fcc are messed up. If this
error occurs, it is likely a bug, so send an email to the
"developers"_http://lammps.sandia.gov/authors.html. :dd
{Fix peri neigh does not exist} :dt
Somehow a fix that the pair style defines has been deleted. :dd
{Fix pour and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour insertion count per timestep is 0} :dt
Self-explanatory. :dd
{Fix pour molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix pour molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix pour molecule template ID must be same as atom style template ID} :dt
When using atom_style template, you cannot pour molecules that are
not in that template. :dd
{Fix pour polydisperse fractions do not sum to 1.0} :dt
Self-explanatory. :dd
{Fix pour region ID does not exist} :dt
Self-explanatory. :dd
{Fix pour region cannot be dynamic} :dt
Only static regions can be used with fix pour. :dd
{Fix pour region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix pour command. :dd
{Fix pour requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Fix pour rigid fix does not exist} :dt
Self-explanatory. :dd
{Fix pour shake fix does not exist} :dt
Self-explanatory. :dd
{Fix press/berendsen damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify mol twice} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify q twice} :dt
Self-explanatory. :dd
{Fix property/atom mol when atom_style already has molecule attribute} :dt
Self-explanatory. :dd
{Fix property/atom q when atom_style already has charge attribute} :dt
Self-explanatory. :dd
{Fix property/atom vector name already exists} :dt
The name for an integer or floating-point vector must be unique. :dd
{Fix qeq has negative upper Taper radius cutoff} :dt
Self-explanatory. :dd
{Fix qeq/comb group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/comb requires atom attribute q} :dt
An atom style with charge must be used to perform charge equilibration. :dd
{Fix qeq/dynamic group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/dynamic requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/fire group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/fire requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/point group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/point has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/point requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/shielded group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/shielded has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/shielded requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/slater could not extract params from pair coul/streitz} :dt
This should not happen unless pair coul/streitz has been altered. :dd
{Fix qeq/slater group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/slater has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/slater requires atom attribute q} :dt
Self-explanatory. :dd
{Fix reax/bonds numbonds > nsbmax_most} :dt
The limit of the number of bonds expected by the ReaxFF force field
was exceeded. :dd
{Fix recenter group has no atoms} :dt
Self-explanatory. :dd
{Fix restrain requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid file has no lines} :dt
Self-explanatory. :dd
{Fix rigid langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid npt/nph does not yet allow triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Fix rigid npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_iter should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid xy torque cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid z force cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid/npt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/npt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/npt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/nvt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid/small langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix rigid/small molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix rigid/small npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid/small requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid/small requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid: Bad principal moments} :dt
The principal moments of inertia computed for a rigid body
are not within the required tolerances. :dd
{Fix shake cannot be used with minimization} :dt
Cannot use fix shake while doing an energy minimization since
it turns off bonds that should contribute to the energy. :dd
{Fix shake molecule template must have shake info} :dt
The defined molecule does not specify SHAKE information. :dd
{Fix spring couple group ID does not exist} :dt
Self-explanatory. :dd
{Fix srd can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Fix srd lamda must be >= 0.6 of SRD grid size} :dt
This is a requirement for accuracy reasons. :dd
{Fix srd no-slip requires atom attribute torque} :dt
This is because the SRD collisions will impart torque to the solute
particles. :dd
{Fix srd requires SRD particles all have same mass} :dt
Self-explanatory. :dd
{Fix srd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
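For example, adding this command to the input script enables it: :dd
comm_modify vel yes :pre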
{Fix srd requires newton pair on} :dt
Self-explanatory. :dd
{Fix store/state compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state compute does not calculate a per-atom array} :dt
The compute calculates a per-atom vector. :dd
{Fix store/state compute does not calculate a per-atom vector} :dt
The compute calculates a per-atom array. :dd
{Fix store/state compute does not calculate per-atom values} :dt
Computes that calculate global or local quantities cannot be used
with fix store/state. :dd
{Fix store/state fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state fix does not calculate a per-atom array} :dt
The fix calculates a per-atom vector. :dd
{Fix store/state fix does not calculate a per-atom vector} :dt
The fix calculates a per-atom array. :dd
{Fix store/state fix does not calculate per-atom values} :dt
Fixes that calculate global or local quantities cannot be used with
fix store/state. :dd
{Fix store/state for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Fix store/state variable is not atom-style variable} :dt
Only atom-style variables calculate per-atom quantities. :dd
{Fix temp/berendsen period must be > 0.0} :dt
Self-explanatory. :dd
{Fix temp/berendsen variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csld is not compatible with fix rattle or fix shake} :dt
These two commands cannot currently be used together with fix temp/csld. :dd
{Fix temp/csld variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csvr variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/rescale variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix tfmc displacement length must be > 0} :dt
Self-explanatory. :dd
{Fix tfmc is not compatible with fix shake} :dt
These two commands cannot currently be used together. :dd
{Fix tfmc temperature must be > 0} :dt
Self-explanatory. :dd
{Fix thermal/conductivity swap value must be positive} :dt
Self-explanatory. :dd
{Fix tmd must come after integration fixes} :dt
Any fix tmd command must appear in the input script after all time
integration fixes (nve, nvt, npt). See the fix tmd documentation for
details. :dd
{Fix ttm electron temperatures must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_density must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_specific_heat must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_thermal_conductivity must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_p must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_s must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm number of nodes must be > 0} :dt
Self-explanatory. :dd
{Fix ttm v_0 must be >= 0.0} :dt
Self-explanatory. :dd
{Fix used in compute chunk/atom not computed at compatible time} :dt
The chunk/atom compute cannot query the output of the fix on a
timestep on which it is needed. :dd
{Fix used in compute reduce not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute reduce is
requesting a value on a non-allowed timestep. :dd
{Fix used in compute slice not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute slice is
requesting a value on a non-allowed timestep. :dd
{Fix vector cannot set output array intensive/extensive from these inputs} :dt
The inputs to the command have conflicting intensive/extensive attributes.
You need to use more than one fix vector command. :dd
{Fix vector compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix viscosity swap value must be positive} :dt
Self-explanatory. :dd
{Fix viscosity vtarget value must be positive} :dt
Self-explanatory. :dd
{Fix wall cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix wall/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/gran is incompatible with Pair style} :dt
Must use a granular pair style to define the parameters needed for
this fix. :dd
{Fix wall/gran requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/piston command only available at zlo} :dt
The face keyword must be zlo. :dd
{Fix wall/region colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/region colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/region cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix_modify pressure ID does not compute pressure} :dt
The compute ID assigned to the fix must compute pressure. :dd
{Fix_modify temperature ID does not compute temperature} :dt
The compute ID assigned to the fix must compute temperature. :dd
{For triclinic deformation, specified target stress must be hydrostatic} :dt
Triclinic pressure control is allowed using the tri keyword, but
non-hydrostatic pressure control can not be used in this case. :dd
{Found no restart file matching pattern} :dt
When using a "*" in the restart file name, no matching file was found. :dd
{GPU library not compiled for this accelerator} :dt
Self-explanatory. :dd
{GPU package does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{GPU particle split must be set to 1 for this pair style.} :dt
For this pair style, you cannot run part of the force calculation on
the host. See the package command. :dd
{GPU split param must be positive for hybrid pair styles} :dt
See the package gpu command. :dd
{GPUs are requested but Kokkos has not been compiled for CUDA} :dt
Recompile Kokkos with CUDA support to use GPUs. :dd
{Ghost velocity forward comm not yet implemented with Kokkos} :dt
This is a current restriction. :dd
{Gmask function in equal-style variable formula} :dt
Gmask is a per-atom operation. :dd
{Gravity changed since fix pour was created} :dt
The gravity vector defined by fix gravity must be static. :dd
{Gravity must point in -y to use with fix pour in 2d} :dt
Self-explanatory. :dd
{Gravity must point in -z to use with fix pour in 3d} :dt
Self-explanatory. :dd
{Grmask function in equal-style variable formula} :dt
Grmask is a per-atom operation. :dd
{Group ID does not exist} :dt
A group ID used in the group command does not exist. :dd
{Group ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Group all cannot be made dynamic} :dt
This operation is not allowed. :dd
{Group command before simulation box is defined} :dt
The group command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Group dynamic cannot reference itself} :dt
Self-explanatory. :dd
{Group dynamic parent group cannot be dynamic} :dt
Self-explanatory. :dd
{Group dynamic parent group does not exist} :dt
Self-explanatory. :dd
{Group region ID does not exist} :dt
A region ID used in the group command does not exist. :dd
{If read_dump purges it cannot replace or trim} :dt
These operations are not compatible. See the read_dump doc
page for details. :dd
{Illegal ... command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
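For example, assuming a serial executable named lmp_serial and an
input script named in.melt (both placeholders): :dd
lmp_serial -echo screen -in in.melt :pre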
{Illegal COMB parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal COMB3 parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Stillinger-Weber parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Tersoff parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Vashishta parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal compute voronoi/atom command (occupation and (surface or edges))} :dt
Self-explanatory. :dd
{Illegal coul/streitz parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal dump_modify sfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal dump_modify tfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal fix gcmc gas mass <= 0} :dt
The computed mass of the designated gas molecule or atom type was less
than or equal to zero. :dd
{Illegal fix tfmc random seed} :dt
Seeds can only be nonzero positive integers. :dd
{Illegal fix wall/piston velocity} :dt
The piston velocity must be positive. :dd
{Illegal integrate style} :dt
Self-explanatory. :dd
{Illegal nb3b/harmonic parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal number of angle table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of bond table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of pair table entries} :dt
There must be at least 2 table entries. :dd
{Illegal or unset periodicity in restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal range increment value} :dt
The increment must be >= 1. :dd
{Illegal simulation box} :dt
The lower bound of the simulation box is greater than the upper bound. :dd
{Illegal size double vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size integer vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size string or corrupt restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Imageint setting in lmptype.h is invalid} :dt
Imageint must be as large as or larger than smallint. :dd
{Imageint setting in lmptype.h is not compatible} :dt
Format of imageint stored in restart file is not consistent with
LAMMPS version you are running. See the settings in src/lmptype.h :dd
{Improper atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
improper on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid improper. :dd
{Improper atom missing in set command} :dt
The set command cannot find one or more atoms in a particular improper
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid improper. :dd
{Improper atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper coeff for hybrid has invalid style} :dt
Improper style hybrid uses another improper style as one of its
coefficients. The improper style used in the improper_coeff command
or read from a restart file is not recognized. :dd
{Improper coeffs are not set} :dt
No improper coefficients have been assigned in the data file or via
the improper_coeff command. :dd
{Improper style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot use same improper style twice} :dt
Self-explanatory. :dd
{Improper_coeff command before improper_style is defined} :dt
Coefficients cannot be set in the data file or via the improper_coeff
command until an improper_style has been assigned. :dd
{Improper_coeff command before simulation box is defined} :dt
The improper_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Improper_coeff command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Improper_style command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Impropers assigned incorrectly} :dt
Impropers read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Impropers defined but no improper types} :dt
The data file header lists improper but no improper types. :dd
{Incomplete use of variables in create_atoms command} :dt
The var and set options must be used together. :dd
{Inconsistent iparam/jparam values in fix bond/create command} :dt
If itype and jtype are the same, then their maxbond and newtype
settings must also be the same. :dd
{Inconsistent line segment in data file} :dt
The end points of the line segment are not equidistant from the
center point, which is the atom coordinate. :dd
{Inconsistent triangle in data file} :dt
The centroid of the triangle as defined by the corner points is not
the atom coordinate. :dd
{Inconsistent use of finite-size particles by molecule template molecules} :dt
Not all of the molecules define a radius for their constituent
particles. :dd
{Incorrect # of floating-point values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect # of integer values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect %s format in data file} :dt
A section of the data file being read by fix property/atom does
not have the correct number of values per line. :dd
{Incorrect SNAP parameter file} :dt
The file cannot be parsed correctly, check its internal syntax. :dd
{Incorrect args for angle coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for bond coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for improper coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for pair coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args in pair_style command} :dt
Self-explanatory. :dd
{Incorrect atom format in data file} :dt
Number of values per atom line in the data file is not consistent with
the atom style. :dd
{Incorrect atom format in neb file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect bonus data format in data file} :dt
See the read_data doc page for a description of how various kinds of
bonus data must be formatted for certain atom styles. :dd
{Incorrect boundaries with slab Ewald} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with Ewald. :dd
{Incorrect boundaries with slab EwaldDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with EwaldDisp. :dd
{Incorrect boundaries with slab PPPM} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with PPPM. :dd
{Incorrect boundaries with slab PPPMDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with pppm/disp. :dd
{Incorrect element names in ADP potential file} :dt
The element names in the ADP file do not match those requested. :dd
{Incorrect element names in EAM potential file} :dt
The element names in the EAM file do not match those requested. :dd
{Incorrect format in COMB potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in COMB3 potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in MEAM potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in SNAP coefficient file} :dt
Incorrect number of words per line in the coefficient file. :dd
{Incorrect format in SNAP parameter file} :dt
Incorrect number of words per line in the parameter file. :dd
{Incorrect format in Stillinger-Weber potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in TMD target file} :dt
Format of file read by fix tmd command is incorrect. :dd
{Incorrect format in Tersoff potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in Vashishta potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in coul/streitz potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in nb3b/harmonic potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect integer value in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect multiplicity arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect number of elements in potential file} :dt
Self-explanatory. :dd
{Incorrect rigid body format in fix rigid file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect rigid body format in fix rigid/small file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect sign arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect table format check for element types} :dt
Self-explanatory. :dd
{Incorrect velocity format in data file} :dt
Each atom style defines a format for the Velocity section
of the data file. The read-in lines do not match. :dd
{Incorrect weight arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Index between variable brackets must be positive} :dt
Self-explanatory. :dd
{Indexed per-atom vector in variable formula without atom map} :dt
Accessing a value from an atom vector requires the ability to look up
an atom index, which is provided by an atom map. An atom map does not
exist (by default) for non-molecular problems. Using the atom_modify
map command will force an atom map to be created. :dd
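A minimal sketch; note that the map setting must appear before the
simulation box is defined: :dd
atom_modify map array :pre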
{Initial temperatures not all set in fix ttm} :dt
Self-explanatory. :dd
{Input line quote not followed by whitespace} :dt
An end quote must be followed by whitespace. :dd
{Insertion region extends outside simulation box} :dt
Self-explanatory. :dd
{Insufficient Jacobi rotations for POEMS body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for body nparticle} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid molecule} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for triangle} :dt
The calculation of the inertia tensor of the triangle failed. This
should not happen if it is a reasonably shaped triangle. :dd
{Insufficient memory on accelerator} :dt
There is insufficient memory on one of the devices specified for the gpu
package :dd
{Internal error in atom_style body} :dt
This error should not occur. Contact the developers. :dd
{Invalid -reorder N value} :dt
Self-explanatory. :dd
{Invalid Angles section in molecule file} :dt
Self-explanatory. :dd
{Invalid Bonds section in molecule file} :dt
Self-explanatory. :dd
{Invalid Boolean syntax in if command} :dt
Self-explanatory. :dd
{Invalid Charges section in molecule file} :dt
Self-explanatory. :dd
{Invalid Coords section in molecule file} :dt
Self-explanatory. :dd
{Invalid Diameters section in molecule file} :dt
Self-explanatory. :dd
{Invalid Dihedrals section in molecule file} :dt
Self-explanatory. :dd
{Invalid Impropers section in molecule file} :dt
Self-explanatory. :dd
{Invalid Kokkos command-line args} :dt
Self-explanatory. See Section 2.7 of the manual for details. :dd
{Invalid LAMMPS restart file} :dt
The file does not appear to be a LAMMPS restart file since
it doesn't contain the correct magic string at the beginning. :dd
{Invalid Masses section in molecule file} :dt
Self-explanatory. :dd
{Invalid REAX atom type} :dt
There is a mis-match between LAMMPS atom types and the elements
listed in the ReaxFF force field file. :dd
{Invalid Special Bond Counts section in molecule file} :dt
Self-explanatory. :dd
{Invalid Types section in molecule file} :dt
Self-explanatory. :dd
{Invalid angle count in molecule file} :dt
Self-explanatory. :dd
{Invalid angle table length} :dt
Length must be 2 or greater. :dd
{Invalid angle type in Angles section of data file} :dt
Angle type must be positive integer and within range of specified angle
types. :dd
{Invalid angle type in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid angle type index for fix shake} :dt
Self-explanatory. :dd
{Invalid args for non-hybrid pair coefficients} :dt
"NULL" is only supported in pair_coeff calls when using pair hybrid :dd
{Invalid argument to factorial %d} :dt
N must be >= 0 and <= 167, otherwise the factorial result is too
large. :dd
{Invalid atom ID in %s section of data file} :dt
An atom in a section of the data file being read by fix property/atom
has an invalid atom ID that is <= 0 or > the maximum existing atom ID. :dd
{Invalid atom ID in Angles section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Atoms section of data file} :dt
Atom IDs must be positive integers. :dd
{Invalid atom ID in Bodies section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Bonus section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Dihedrals section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Impropers section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Velocities section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in variable file} :dt
Self-explanatory. :dd
{Invalid atom IDs in neb file} :dt
An ID in the file was not found in the system. :dd
{Invalid atom diameter in molecule file} :dt
Diameters must be >= 0.0. :dd
{Invalid atom mass for fix shake} :dt
Mass specified in fix shake command must be > 0.0. :dd
{Invalid atom mass in molecule file} :dt
Masses must be > 0.0. :dd
{Invalid atom type in Atoms section of data file} :dt
Atom types must range from 1 to specified # of types. :dd
{Invalid atom type in create_atoms command} :dt
The create_box command specified the range of valid atom types.
An invalid type is being requested. :dd
{Invalid atom type in create_atoms mol command} :dt
The atom types in the defined molecule are added to the value
specified in the create_atoms command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix atom/swap command} :dt
The atom type specified in the atom/swap command does not exist. :dd
{Invalid atom type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix deposit command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix gcmc command} :dt
The atom type specified in the gcmc command does not exist. :dd
{Invalid atom type in fix pour command} :dt
Self-explanatory. :dd
{Invalid atom type in fix pour mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix pour command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in molecule file} :dt
Atom types must range from 1 to specified # of types. :dd
{Invalid atom type in neighbor exclusion list} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom type index for fix shake} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom types in pair_write command} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom vector in variable formula} :dt
The atom vector is not recognized. :dd
{Invalid atom_style body command} :dt
No body style argument was provided. :dd
{Invalid atom_style command} :dt
Self-explanatory. :dd
{Invalid attribute in dump custom command} :dt
Self-explanatory. :dd
{Invalid attribute in dump local command} :dt
Self-explanatory. :dd
{Invalid attribute in dump modify command} :dt
Self-explanatory. :dd
{Invalid basis setting in create_atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid basis setting in fix append/atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid bin bounds in compute chunk/atom} :dt
The lo/hi values are inconsistent. :dd
{Invalid bin bounds in fix ave/spatial} :dt
The lo/hi values are inconsistent. :dd
{Invalid body nparticle command} :dt
Arguments in atom-style command are not correct. :dd
{Invalid bond count in molecule file} :dt
Self-explanatory. :dd
{Invalid bond table length} :dt
Length must be 2 or greater. :dd
{Invalid bond type in Bonds section of data file} :dt
Bond type must be positive integer and within range of specified bond
types. :dd
{Invalid bond type in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid bond type in create_bonds command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/break command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid bond type index for fix shake} :dt
Self-explanatory. Check the fix shake command in the input script. :dd
{Invalid coeffs for this dihedral style} :dt
Cannot set class 2 coeffs in data file for this dihedral style. :dd
{Invalid color in dump_modify command} :dt
The specified color name was not in the list of recognized colors.
See the dump_modify doc page. :dd
{Invalid color map min/max values} :dt
The min/max values are not consistent either with each other or
with values in the color map. :dd
{Invalid command-line argument} :dt
One or more command-line arguments is invalid. Check the syntax of
the command you are using to launch LAMMPS. :dd
{Invalid compute ID in variable formula} :dt
The compute is not recognized. :dd
{Invalid create_atoms rotation vector for 2d model} :dt
The rotation vector can only have a z component. :dd
{Invalid custom OpenCL parameter string.} :dt
There are either too few or too many parameters in the custom string for
the GPU package. :dd
{Invalid cutoff in comm_modify command} :dt
Specified cutoff must be >= 0.0. :dd
{Invalid cutoffs in pair_write command} :dt
Inner cutoff must be larger than 0.0 and less than outer cutoff. :dd
{Invalid d1 or d2 value for pair colloid coeff} :dt
Neither d1 nor d2 can be < 0. :dd
{Invalid data file section: Angle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: AngleAngle Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: AngleAngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: AngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Angles} :dt
Atom style does not allow angles. :dd
{Invalid data file section: Bodies} :dt
Atom style does not allow bodies. :dd
{Invalid data file section: Bond Coeffs} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: BondAngle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond13 Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Bonds} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: Dihedral Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Dihedrals} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Ellipsoids} :dt
Atom style does not allow ellipsoids. :dd
{Invalid data file section: EndBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Improper Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Impropers} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Lines} :dt
Atom style does not allow lines. :dd
{Invalid data file section: MiddleBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Triangles} :dt
Atom style does not allow triangles. :dd
{Invalid delta_conf in tad command} :dt
The value must be between 0 and 1 inclusive. :dd
{Invalid density in Atoms section of data file} :dt
Density value cannot be <= 0.0. :dd
{Invalid density in set command} :dt
Density must be > 0.0. :dd
{Invalid diameter in set command} :dt
Self-explanatory. :dd
{Invalid dihedral count in molecule file} :dt
Self-explanatory. :dd
{Invalid dihedral type in Dihedrals section of data file} :dt
Dihedral type must be a positive integer and within the range of
specified dihedral types. :dd
{Invalid dihedral type in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid dipole length in set command} :dt
Self-explanatory. :dd
{Invalid displace_atoms rotate axis for 2d} :dt
Axis must be in z direction. :dd
{Invalid dump dcd filename} :dt
Filenames used with the dump dcd style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump frequency} :dt
Dump frequency must be 1 or greater. :dd
{Invalid dump image element name} :dt
The specified element name was not in the standard list of elements.
See the dump_modify doc page. :dd
{Invalid dump image filename} :dt
The file produced by dump image cannot be binary and must
be for a single processor. :dd
{Invalid dump image persp value} :dt
Persp value must be >= 0.0. :dd
{Invalid dump image theta value} :dt
Theta must be between 0.0 and 180.0 inclusive. :dd
{Invalid dump image zoom value} :dt
Zoom value must be > 0.0. :dd
{Invalid dump movie filename} :dt
The file produced by dump movie cannot be binary or compressed
and must be a single file for a single processor. :dd
{Invalid dump xtc filename} :dt
Filenames used with the dump xtc style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump xyz filename} :dt
Filenames used with the dump xyz style cannot be binary or cause files
to be written by each processor. :dd
{Invalid dump_modify threshold operator} :dt
Operator keyword used for threshold specification is not recognized. :dd
{Invalid entry in -reorder file} :dt
Self-explanatory. :dd
{Invalid fix ID in variable formula} :dt
The fix is not recognized. :dd
{Invalid fix ave/time off column} :dt
Self-explanatory. :dd
{Invalid fix box/relax command for a 2d simulation} :dt
Fix box/relax styles involving the z dimension cannot be used in
a 2d simulation. :dd
{Invalid fix box/relax command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be specified. :dd
{Invalid fix box/relax pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix nvt/npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix nvt/npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix nvt/npt/nph pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix press/berendsen for a 2d simulation} :dt
The z component of pressure cannot be controlled for a 2d model. :dd
{Invalid fix press/berendsen pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix qeq parameter file} :dt
Element index > number of atom types. :dd
{Invalid fix rigid npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix rigid/small npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid/small npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid flag in force field section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in header section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in peratom section of restart file} :dt
The format of this section of the file is not correct. :dd
{Invalid flag in type arrays section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid frequency in temper command} :dt
Nevery must be > 0. :dd
{Invalid group ID in neigh_modify command} :dt
A group ID used in the neigh_modify command does not exist. :dd
{Invalid group function in variable formula} :dt
Group function is not recognized. :dd
{Invalid group in comm_modify command} :dt
Self-explanatory. :dd
{Invalid image up vector} :dt
Up vector cannot be (0,0,0). :dd
{Invalid immediate variable} :dt
Syntax of immediate value is incorrect. :dd
{Invalid improper count in molecule file} :dt
Self-explanatory. :dd
{Invalid improper type in Impropers section of data file} :dt
Improper type must be a positive integer and within the range of
specified improper types. :dd
{Invalid improper type in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid index for non-body particles in compute body/local command} :dt
Only indices 1,2,3 can be used for non-body particles. :dd
{Invalid index in compute body/local command} :dt
Self-explanatory. :dd
{Invalid is_active() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_available() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_defined() function in variable formula} :dt
Self-explanatory. :dd
{Invalid keyword in angle table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in bond table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in compute angle/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute bond/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute dihedral/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute improper/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute pair/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/atom command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/chunk command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/local command} :dt
Self-explanatory. :dd
{Invalid keyword in dump cfg command} :dt
Self-explanatory. :dd
{Invalid keyword in pair table parameters} :dt
Keyword used in list of table parameters is not recognized. :dd
{Invalid length in set command} :dt
Self-explanatory. :dd
{Invalid mass in set command} :dt
Self-explanatory. :dd
{Invalid mass line in data file} :dt
Self-explanatory. :dd
{Invalid mass value} :dt
Self-explanatory. :dd
{Invalid math function in variable formula} :dt
Self-explanatory. :dd
{Invalid math/group/special function in variable formula} :dt
Self-explanatory. :dd
{Invalid option in lattice command for non-custom style} :dt
Certain lattice keywords are not supported unless the
lattice style is "custom". :dd
{Invalid order of forces within respa levels} :dt
For respa, the ordering of force computations within respa levels must
obey certain rules. E.g. bonds cannot be computed less frequently than
angles, pairwise forces cannot be computed less frequently than
kspace, etc. :dd
{Invalid pair table cutoff} :dt
Cutoffs in pair_coeff command are not valid with read-in pair table. :dd
{Invalid pair table length} :dt
Length of read-in pair table is invalid. :dd
{Invalid param file for fix qeq/shielded} :dt
Invalid value of gamma. :dd
{Invalid param file for fix qeq/slater} :dt
Zeta value is 0.0. :dd
{Invalid partitions in processors part command} :dt
Valid partitions are numbered 1 to N and the sender and receiver
cannot be the same partition. :dd
{Invalid python command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
{Invalid radius in Atoms section of data file} :dt
Radius must be >= 0.0. :dd
{Invalid random number seed in fix ttm command} :dt
Random number seed must be > 0. :dd
{Invalid random number seed in set command} :dt
Random number seed must be > 0. :dd
{Invalid replace values in compute reduce} :dt
Self-explanatory. :dd
{Invalid rigid body ID in fix rigid file} :dt
The ID does not match that of any of the rigid bodies
defined by the fix rigid command. :dd
{Invalid rigid body ID in fix rigid/small file} :dt
The ID does not match that of any of the rigid bodies
defined by the fix rigid/small command. :dd
{Invalid run command N value} :dt
The number of timesteps must fit in a 32-bit integer. If you want to
run for more steps than this, perform multiple shorter runs. :dd
{Invalid run command start/stop value} :dt
Self-explanatory. :dd
{Invalid run command upto value} :dt
Self-explanatory. :dd
{Invalid seed for Marsaglia random # generator} :dt
The initial seed for this random number generator must be a positive
integer less than or equal to 900 million. :dd
{Invalid seed for Park random # generator} :dt
The initial seed for this random number generator must be a positive
integer. :dd
{Invalid shake angle type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake atom in molecule file} :dt
Self-explanatory. :dd
{Invalid shake bond type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake flag in molecule file} :dt
Self-explanatory. :dd
{Invalid shape in Ellipsoids section of data file} :dt
Self-explanatory. :dd
{Invalid shape in Triangles section of data file} :dt
Two or more of the triangle corners are duplicate points. :dd
{Invalid shape in set command} :dt
Self-explanatory. :dd
{Invalid shear direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invalid special atom index in molecule file} :dt
Self-explanatory. :dd
{Invalid special function in variable formula} :dt
Self-explanatory. :dd
{Invalid style in pair_write command} :dt
Self-explanatory. Check the input script. :dd
{Invalid syntax in variable formula} :dt
Self-explanatory. :dd
{Invalid t_event in prd command} :dt
Self-explanatory. :dd
{Invalid t_event in tad command} :dt
The value must be greater than 0. :dd
{Invalid template atom in Atoms section of data file} :dt
The atom indices must be from 1 to N, where N is the number of
atoms in the template molecule the atom belongs to. :dd
{Invalid template index in Atoms section of data file} :dt
The template indices must be from 1 to N, where N is the number of
molecules in the template. :dd
{Invalid thermo keyword in variable formula} :dt
The keyword is not recognized. :dd
{Invalid threads_per_atom specified.} :dt
For 3-body potentials on the GPU, the threads_per_atom setting cannot be
greater than 4 for NVIDIA GPUs. :dd
{Invalid timestep reset for fix ave/atom} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/chunk} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/correlate} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/histo} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/spatial} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/time} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid tmax in tad command} :dt
The value must be greater than 0.0. :dd
{Invalid type for mass set} :dt
Mass command must set a type from 1-N where N is the number of atom
types. :dd
{Invalid use of library file() function} :dt
This function is called through the library interface. This
error should not occur. Contact the developers if it does. :dd
{Invalid value in set command} :dt
The value specified for the setting is invalid, likely because it is
too small or too large. :dd
{Invalid variable evaluation in variable formula} :dt
A variable used in a formula could not be evaluated. :dd
{Invalid variable in next command} :dt
Self-explanatory. :dd
{Invalid variable name} :dt
Variable name used in an input script line is invalid. :dd
{Invalid variable name in variable formula} :dt
Variable name is not recognized. :dd
{Invalid variable style in special function next} :dt
Only file-style or atomfile-style variables can be used with next(). :dd
{Invalid variable style with next command} :dt
Variable styles {equal} and {world} cannot be used in a next
command. :dd
{Invalid volume in set command} :dt
Volume must be > 0.0. :dd
{Invalid wiggle direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invoked angle equil angle on angle style none} :dt
Self-explanatory. :dd
{Invoked angle single on angle style none} :dt
Self-explanatory. :dd
{Invoked bond equil distance on bond style none} :dt
Self-explanatory. :dd
{Invoked bond single on bond style none} :dt
Self-explanatory. :dd
{Invoked pair single on pair style none} :dt
A command (e.g. a dump) attempted to invoke the single() function on a
pair style none, which is illegal. You are probably attempting to
compute per-atom quantities with an undefined pair style. :dd
{Invoking coulombic in pair style lj/coul requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Invoking coulombic in pair style lj/long/dipole/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{KIM neighbor iterator exceeded range} :dt
This should not happen. It likely indicates a bug
in the KIM implementation of the interatomic potential
where it is requesting neighbors incorrectly. :dd
{KOKKOS package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{KOKKOS package requires a kokkos enabled atom_style} :dt
Self-explanatory. :dd
{KSpace accuracy must be > 0} :dt
The kspace accuracy designated in the input must be greater than zero. :dd
{KSpace accuracy too large to estimate G vector} :dt
Reduce the accuracy request or specify gwald explicitly
via the kspace_modify command. :dd
{KSpace accuracy too low} :dt
Requested accuracy must be less than 1.0. :dd
{KSpace solver requires a pair style} :dt
No pair style is defined. :dd
{KSpace style does not yet support triclinic geometries} :dt
The specified kspace style does not allow for non-orthogonal
simulation boxes. :dd
{KSpace style has not yet been set} :dt
Cannot use kspace_modify command until a kspace style is set. :dd
{KSpace style is incompatible with Pair style} :dt
Setting a kspace style requires that a pair style with matching
long-range Coulombic or dispersion components be used. :dd
{Keyword %s in MEAM parameter file not recognized} :dt
Self-explanatory. :dd
{Kokkos has been compiled for CUDA but no GPUs are requested} :dt
One or more GPUs must be used when Kokkos is compiled for CUDA. :dd
{Kspace style does not support compute group/group} :dt
Self-explanatory. :dd
{Kspace style pppm/disp/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style pppm/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Kspace_modify eigtol must be smaller than one} :dt
Self-explanatory. :dd
{LAMMPS is not built with Python embedded} :dt
Python is embedded by including the PYTHON package before LAMMPS is
built. This is required to use python-style variables. :dd
{LAMMPS unit_style lj not supported by KIM models} :dt
Self-explanatory. Check the input script or data file. :dd
{LJ6 off not supported in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Label wasn't found in input script} :dt
Self-explanatory. :dd
{Lattice orient vectors are not orthogonal} :dt
The three specified lattice orientation vectors must be mutually
orthogonal. :dd
{Lattice orient vectors are not right-handed} :dt
The three specified lattice orientation vectors must create a
right-handed coordinate system such that a1 cross a2 = a3. :dd
{Lattice primitive vectors are collinear} :dt
The specified lattice primitive vectors do not form a unit cell with
non-zero volume. :dd
{Lattice settings are not compatible with 2d simulation} :dt
One or more of the specified lattice vectors has a non-zero z
component. :dd
{Lattice spacings are invalid} :dt
Each x,y,z spacing must be > 0. :dd
{Lattice style incompatible with simulation dimension} :dt
A 2d simulation can use an sq, sq2, or hex lattice. A 3d simulation
can use an sc, bcc, or fcc lattice. :dd
{Log of zero/negative value in variable formula} :dt
Self-explanatory. :dd
{Lost atoms via balance: original %ld current %ld} :dt
This should not occur. Report the problem to the developers. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MEAM library error %d} :dt
A call to the MEAM Fortran library returned an error. :dd
{MPI_LMP_BIGINT and bigint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a bigint. :dd
{MPI_LMP_TAGINT and tagint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a tagint. :dd
{MSM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{MSM grid is too large} :dt
The global MSM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 16384. You likely need to decrease the
requested accuracy. :dd
{MSM order must be 4, 6, 8, or 10} :dt
This is a limitation of the MSM implementation in LAMMPS:
the MSM order can only be 4, 6, 8, or 10. :dd
{Mass command before simulation box is defined} :dt
The mass command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Matrix factorization to split dispersion coefficients failed} :dt
This should not normally happen. Contact the developers. :dd
{Min_style command before simulation box is defined} :dt
The min_style command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Minimization could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Minimize command before simulation box is defined} :dt
The minimize command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Mismatched brackets in variable} :dt
Self-explanatory. :dd
{Mismatched compute in variable formula} :dt
A compute is referenced incorrectly or a compute that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched fix in variable formula} :dt
A fix is referenced incorrectly or a fix that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched variable in variable formula} :dt
A variable is referenced incorrectly or an atom-style variable that
produces per-atom values is used in an equal-style variable
formula. :dd
{Modulo 0 in variable formula} :dt
Self-explanatory. :dd
{Molecule IDs too large for compute chunk/atom} :dt
The IDs must not be larger than can be stored in a 32-bit integer
since chunk IDs are 32-bit integers. :dd
{Molecule auto special bond generation overflow} :dt
The counts exceed the maxspecial setting for other atoms in the system. :dd
{Molecule file has angles but no nangles setting} :dt
Self-explanatory. :dd
{Molecule file has body params but no setting for them} :dt
Self-explanatory. :dd
{Molecule file has bonds but no nbonds setting} :dt
Self-explanatory. :dd
{Molecule file has dihedrals but no ndihedrals setting} :dt
Self-explanatory. :dd
{Molecule file has impropers but no nimpropers setting} :dt
Self-explanatory. :dd
{Molecule file has no Body Doubles section} :dt
Self-explanatory. :dd
{Molecule file has no Body Integers section} :dt
Self-explanatory. :dd
{Molecule file has special flags but no bonds} :dt
Self-explanatory. :dd
{Molecule file needs both Special Bond sections} :dt
Self-explanatory. :dd
{Molecule file requires atom style body} :dt
Self-explanatory. :dd
{Molecule file shake flags not before shake atoms} :dt
The order of the two sections is important. :dd
{Molecule file shake flags not before shake bonds} :dt
The order of the two sections is important. :dd
{Molecule file shake info is incomplete} :dt
All 3 SHAKE sections are needed. :dd
{Molecule file special list does not match special count} :dt
The number of values in an atom's special list does not match the
specified count. :dd
{Molecule file z center-of-mass must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule file z coord must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule natoms must be 1 for body particle} :dt
Self-explanatory. :dd
{Molecule sizescale must be 1.0 for body particle} :dt
Self-explanatory. :dd
{Molecule template ID for atom_style template does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for create_atoms does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix pour does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix rigid/small does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix shake does not exist} :dt
Self-explanatory. :dd
{Molecule template ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Molecule topology/atom exceeds system topology/atom} :dt
The number of bonds, angles, etc. per atom in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{Molecule topology type exceeds system topology type} :dt
The number of bond, angle, etc. types in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{More than one fix deform} :dt
Only one fix deform can be defined at a time. :dd
{More than one fix freeze} :dt
Only one of these fixes can be defined, since the granular pair
potentials access it. :dd
{More than one fix shake} :dt
Only one fix shake can be defined. :dd
{Mu not allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Must define angle_style before Angle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines Angle Coeffs. :dd
{Must define angle_style before BondAngle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondAngle Coeffs. :dd
{Must define angle_style before BondBond Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondBond Coeffs. :dd
{Must define bond_style before Bond Coeffs} :dt
Must use a bond_style command before reading a data file that
defines Bond Coeffs. :dd
{Must define dihedral_style before AngleAngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleAngleTorsion Coeffs. :dd
{Must define dihedral_style before AngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleTorsion Coeffs. :dd
{Must define dihedral_style before BondBond13 Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines BondBond13 Coeffs. :dd
{Must define dihedral_style before Dihedral Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines Dihedral Coeffs. :dd
{Must define dihedral_style before EndBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines EndBondTorsion Coeffs. :dd
{Must define dihedral_style before MiddleBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines MiddleBondTorsion Coeffs. :dd
{Must define improper_style before AngleAngle Coeffs} :dt
Must use an improper_style command before reading a data file that
defines AngleAngle Coeffs. :dd
{Must define improper_style before Improper Coeffs} :dt
Must use an improper_style command before reading a data file that
defines Improper Coeffs. :dd
{Must define pair_style before Pair Coeffs} :dt
Must use a pair_style command before reading a data file that defines
Pair Coeffs. :dd
{Must define pair_style before PairIJ Coeffs} :dt
Must use a pair_style command before reading a data file that defines
PairIJ Coeffs. :dd
{Must have more than one processor partition to temper} :dt
Cannot use the temper command with only one processor partition. Use
the -partition command-line option. :dd
{Must read Atoms before Angles} :dt
The Atoms section of a data file must come before an Angles section. :dd
{Must read Atoms before Bodies} :dt
The Atoms section of a data file must come before a Bodies section. :dd
{Must read Atoms before Bonds} :dt
The Atoms section of a data file must come before a Bonds section. :dd
{Must read Atoms before Dihedrals} :dt
The Atoms section of a data file must come before a Dihedrals section. :dd
{Must read Atoms before Ellipsoids} :dt
The Atoms section of a data file must come before an Ellipsoids
section. :dd
{Must read Atoms before Impropers} :dt
The Atoms section of a data file must come before an Impropers
section. :dd
{Must read Atoms before Lines} :dt
The Atoms section of a data file must come before a Lines section. :dd
{Must read Atoms before Triangles} :dt
The Atoms section of a data file must come before a Triangles section. :dd
{Must read Atoms before Velocities} :dt
The Atoms section of a data file must come before a Velocities
section. :dd
{Must set both respa inner and outer} :dt
Cannot use just the inner or outer option with respa without using the
other. :dd
{Must set number of threads via package omp command} :dt
Because you are using the USER-OMP package, set the number of threads
via its settings, not by the pair_style snap nthreads setting. :dd
{Must shrink-wrap piston boundary} :dt
The boundary style of the face where the piston is applied must be of
type s (shrink-wrapped). :dd
{Must specify a region in fix deposit} :dt
The region keyword must be specified with this fix. :dd
{Must specify a region in fix pour} :dt
Self-explanatory. :dd
{Must specify at least 2 types in fix atom/swap command} :dt
Self-explanatory. :dd
{Must use 'kspace_modify pressure/scalar no' for rRESPA with kspace_style MSM} :dt
The kspace scalar pressure option cannot (yet) be used with rRESPA. :dd
{Must use 'kspace_modify pressure/scalar no' for tensor components with kspace_style msm} :dt
Otherwise MSM will compute only a scalar pressure. See the kspace_modify
command for details on this setting. :dd
{Must use 'kspace_modify pressure/scalar no' to obtain per-atom virial with kspace_style MSM} :dt
The kspace scalar pressure option cannot be used to obtain per-atom virial. :dd
{Must use 'kspace_modify pressure/scalar no' with GPU MSM Pair styles} :dt
The kspace scalar pressure option is not (yet) compatible with GPU MSM Pair styles. :dd
{Must use 'kspace_modify pressure/scalar no' with kspace_style msm/cg} :dt
The kspace scalar pressure option is not compatible with kspace_style msm/cg. :dd
{Must use -in switch with multiple partitions} :dt
A multi-partition simulation cannot read the input script from stdin.
The -in command-line option must be used to specify a file. :dd
{Must use Kokkos half/thread or full neighbor list with threads or GPUs} :dt
Using Kokkos half-neighbor lists with threading is not allowed. :dd
{Must use a block or cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use a block region with fix pour for 2d simulations} :dt
Self-explanatory. :dd
{Must use a bond style with TIP4P potential} :dt
TIP4P potentials assume bond lengths in water are constrained
by a fix shake command. :dd
{Must use a molecular atom style with fix poems molecule} :dt
Self-explanatory. :dd
{Must use a z-axis cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use an angle style with TIP4P potential} :dt
TIP4P potentials assume angles in water are constrained by a fix shake
command. :dd
{Must use atom map style array with Kokkos} :dt
See the atom_modify map command. :dd
{Must use atom style with molecule IDs with fix bond/swap} :dt
Self-explanatory. :dd
{Must use pair_style comb or comb3 with fix qeq/comb} :dt
Self-explanatory. :dd
{Must use variable energy with fix addforce} :dt
Must define an energy variable when applying a dynamic
force during minimization. :dd
{Must use variable energy with fix efield} :dt
You must define an energy when performing a minimization with a
variable E-field. :dd
{NEB command before simulation box is defined} :dt
Self-explanatory. :dd
{NEB requires damped dynamics minimizer} :dt
Use a different minimization style. :dd
{NEB requires use of fix neb} :dt
Self-explanatory. :dd
{NL ramp in wall/piston only implemented in zlo for now} :dt
The ramp keyword can only be used for a piston applied to the zlo face. :dd
{Need nswaptypes mu values in fix atom/swap command} :dt
Self-explanatory. :dd
{Needed bonus data not in data file} :dt
Some atom styles require bonus data. See the read_data doc page for
details. :dd
{Needed molecular topology not in data file} :dt
The header of the data file indicated bonds, angles, etc would be
included, but they are not present. :dd
{Neigh_modify exclude molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Neigh_modify include group != atom_modify first group} :dt
Self-explanatory. :dd
{Neighbor delay must be 0 or multiple of every setting} :dt
The delay and every parameters set via the neigh_modify command are
inconsistent. If the delay setting is non-zero, then it must be a
multiple of the every setting. :dd
{Neighbor include group not allowed with ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor list overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
{Neighbor multi not yet enabled for ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor multi not yet enabled for granular} :dt
Self-explanatory. :dd
{Neighbor multi not yet enabled for rRESPA} :dt
Self-explanatory. :dd
{Neighbor page size must be >= 10x the one atom setting} :dt
This is required to prevent wasting too much memory. :dd
{New atom IDs exceed maximum allowed ID} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{New bond exceeded bonds per atom in create_bonds} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded bonds per atom in fix bond/create} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded special list size in fix bond/create} :dt
See the special_bonds extra command for info on how to leave space in
the special bonds list to allow for additional bonds to be formed. :dd
{Newton bond change after simulation box is defined} :dt
The newton command cannot be used to change the newton bond value
after a read_data, read_restart, or create_box command. :dd
{Next command must list all universe and uloop variables} :dt
This is to ensure they stay in sync. :dd
{No Kspace style defined for compute group/group} :dt
Self-explanatory. :dd
{No OpenMP support compiled in} :dt
An OpenMP flag is set, but LAMMPS was not built with
OpenMP support. :dd
{No angle style is defined for compute angle/local} :dt
Self-explanatory. :dd
{No angles allowed with this atom style} :dt
Self-explanatory. :dd
{No atoms in data file} :dt
The header of the data file indicated that atoms would be included,
but they are not present. :dd
{No basis atoms in lattice} :dt
Basis atoms must be defined for lattice style custom. :dd
{No bodies allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No bond style is defined for compute bond/local} :dt
Self-explanatory. :dd
{No bonds allowed with this atom style} :dt
Self-explanatory. :dd
{No box information in dump. You have to use 'box no'} :dt
Self-explanatory. :dd
{No count or invalid atom count in molecule file} :dt
The number of atoms must be specified. :dd
{No dihedral style is defined for compute dihedral/local} :dt
Self-explanatory. :dd
{No dihedrals allowed with this atom style} :dt
Self-explanatory. :dd
{No dump custom arguments specified} :dt
The dump custom command requires that atom quantities be specified to
output to dump file. :dd
{No dump local arguments specified} :dt
Self-explanatory. :dd
{No ellipsoids allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No fix gravity defined for fix pour} :dt
Gravity is required to use fix pour. :dd
{No improper style is defined for compute improper/local} :dt
Self-explanatory. :dd
{No impropers allowed with this atom style} :dt
Self-explanatory. :dd
{No input values for fix ave/spatial} :dt
Self-explanatory. :dd
{No lines allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No matching element in ADP potential file} :dt
The ADP potential file does not contain elements that match the
requested elements. :dd
{No matching element in EAM potential file} :dt
The EAM potential file does not contain elements that match the
requested elements. :dd
{No molecule topology allowed with atom style template} :dt
The data file cannot specify the number of bonds, angles, etc.,
because this info is inferred from the molecule templates. :dd
{No overlap of box and region for create_atoms} :dt
Self-explanatory. :dd
{No pair coul/streitz for fix qeq/slater} :dt
These commands must be used together. :dd
{No pair hbond/dreiding coefficients set} :dt
Self-explanatory. :dd
{No pair style defined for compute group/group} :dt
Cannot calculate group interactions without a pair style defined. :dd
{No pair style is defined for compute pair/local} :dt
Self-explanatory. :dd
{No pair style is defined for compute property/local} :dt
Self-explanatory. :dd
{No rigid bodies defined} :dt
The fix specification did not end up defining any rigid bodies. :dd
{No triangles allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No values in fix ave/chunk command} :dt
Self-explanatory. :dd
{No values in fix ave/time command} :dt
Self-explanatory. :dd
{Non digit character between brackets in variable} :dt
Self-explanatory. :dd
{Non integer # of swaps in temper command} :dt
Swap frequency in temper command must evenly divide the total # of
timesteps. :dd
{Non-numeric box dimensions - simulation unstable} :dt
The box size has apparently blown up. :dd
{Non-zero atom IDs with atom_modify id = no} :dt
Self-explanatory. :dd
{Non-zero read_data shift z value for 2d simulation} :dt
Self-explanatory. :dd
{Nprocs not a multiple of N for -reorder} :dt
Self-explanatory. :dd
{Number of core atoms != number of shell atoms} :dt
There must be a one-to-one pairing of core and shell atoms. :dd
{Numeric index is out of bounds} :dt
A command with an argument that specifies an integer or range of
integers is using a value that is less than 1 or greater than the
maximum allowed limit. :dd
{One or more Atom IDs is negative} :dt
Atom IDs must be positive integers. :dd
{One or more atom IDs is too big} :dt
The limit on atom IDs is set by the SMALLBIG, BIGBIG, or SMALLSMALL
setting in your Makefile. See Section_start 2.2 of the manual for
more details. :dd
{One or more atom IDs is zero} :dt
Either all atom IDs must be zero or none of them. :dd
{One or more atoms belong to multiple rigid bodies} :dt
Two or more rigid bodies defined by the fix rigid command cannot
contain the same atom. :dd
{One or more rigid bodies are a single particle} :dt
Self-explanatory. :dd
{One or zero atoms in rigid body} :dt
Any rigid body defined by the fix rigid command must contain 2 or more
atoms. :dd
{Only 2 types allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Only one cut-off allowed when requesting all long} :dt
Self-explanatory. :dd
{Only one cutoff allowed when requesting all long} :dt
Self-explanatory. :dd
{Only zhi currently implemented for fix append/atoms} :dt
Self-explanatory. :dd
{Out of range atoms - cannot compute MSM} :dt
One or more atoms are attempting to map their charge to an MSM grid point
that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPM} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPMDisp} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Overflow of allocated fix vector storage} :dt
This should not normally happen if the fix correctly calculated
how long the vector would grow. Contact the developers. :dd
{Overlapping large/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{Overlapping small/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{POEMS fix must come before NPT/NPH fix} :dt
The NPT/NPH fix must be defined in the input script after all poems fixes,
otherwise the fix contribution to the pressure virial is incorrect. :dd
{PPPM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPM grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPM grid stencil extends beyond nearest neighbor processor} :dt
This is not allowed if the kspace_modify overlap setting is no. :dd
{PPPM order < minimum allowed order} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{PPPM order cannot be < 2 or > than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp Coulomb grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp Dispersion grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPMDisp coulomb order cannot be greater than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp used but no parameters set, for further information please see the pppm/disp documentation} :dt
Efficient and accurate use of pppm/disp requires settings via the
kspace_modify command. See the pppm/disp documentation for further
instructions. :dd
{PRD command before simulation box is defined} :dt
The prd command cannot be used before a read_data,
read_restart, or create_box command. :dd
{PRD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{PRD t_corr must be multiple of t_event} :dt
Self-explanatory. :dd
{Package command after simulation box is defined} :dt
The package command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Package cuda command without USER-CUDA package enabled} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built, and the "-c on" command-line switch must be
used to enable the package. :dd
{Package gpu command without GPU package installed} :dt
The GPU package must be installed via "make yes-gpu" before LAMMPS is
built. :dd
{Package intel command without USER-INTEL package installed} :dt
The USER-INTEL package must be installed via "make yes-user-intel"
before LAMMPS is built. :dd
{Package kokkos command without KOKKOS package enabled} :dt
The KOKKOS package must be installed via "make yes-kokkos" before
LAMMPS is built, and the "-k on" command-line switch must be used to
enable the package. :dd
{Package omp command without USER-OMP package installed} :dt
The USER-OMP package must be installed via "make yes-user-omp" before
LAMMPS is built. :dd
{Pair body requires atom style body} :dt
Self-explanatory. :dd
{Pair body requires body style nparticle} :dt
This pair style is specific to the nparticle body style. :dd
{Pair brownian requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair brownian/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair coeff for hybrid has invalid style} :dt
Style in pair coeff must have been listed in pair_style command. :dd
{Pair coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair dipole/cut requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/cut/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/long requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/sf/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have one or more of these attributes. :dd
{Pair distance < table inner cutoff} :dt
Two atoms are closer together than the pairwise table allows. :dd
{Pair distance > table outer cutoff} :dt
Two atoms are further apart than the pairwise table allows. :dd
{Pair dpd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair gayberne epsilon a,b,c coeffs are not all set} :dt
Each atom type involved in pair_style gayberne must
have these 3 coefficients set at least once. :dd
{Pair gayberne requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair granular requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Pair granular requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair granular with shear history requires newton pair off} :dt
This is a current restriction of the implementation of pair
granular styles with history. :dd
{Pair hybrid single calls do not support per sub-style special bond values} :dt
Self-explanatory. :dd
{Pair hybrid sub-style does not support single call} :dt
You are attempting to invoke a single() call on a pair style
that doesn't support it. :dd
{Pair hybrid sub-style is not used} :dt
No pair_coeff command used a sub-style specified in the pair_style
command. :dd
{Pair inner cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair inner cutoff >= Pair outer cutoff} :dt
The specified cutoffs for the pair style are inconsistent. :dd
{Pair line/lj requires atom style line} :dt
Self-explanatory. :dd
{Pair lj/long/dipole/long requires atom attributes mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair lubricate requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricate/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair lubricate/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair lubricateU requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricateU requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricateU/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair peri lattice is not identical in x, y, and z} :dt
The lattice defined by the lattice command must be cubic. :dd
{Pair peri requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
{Pair peri requires an atom map, see atom_modify} :dt
Even for atomic systems, an atom map is required to find Peridynamic
bonds. Use the atom_modify command to define one. :dd
{Pair resquared epsilon a,b,c coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared epsilon and sigma coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair style AIREBO requires atom IDs} :dt
This is a requirement to use the AIREBO potential. :dd
{Pair style AIREBO requires newton pair on} :dt
See the newton command. This is a restriction to use the AIREBO
potential. :dd
{Pair style BOP requires atom IDs} :dt
This is a requirement to use the BOP potential. :dd
{Pair style BOP requires newton pair on} :dt
See the newton command. This is a restriction to use the BOP
potential. :dd
{Pair style COMB requires atom IDs} :dt
This is a requirement to use the COMB potential. :dd
{Pair style COMB requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB
potential. :dd
{Pair style COMB3 requires atom IDs} :dt
This is a requirement to use the COMB3 potential. :dd
{Pair style COMB3 requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB3 requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB3
potential. :dd
{Pair style LCBOP requires atom IDs} :dt
This is a requirement to use the LCBOP potential. :dd
{Pair style LCBOP requires newton pair on} :dt
See the newton command. This is a restriction to use the LCBOP
potential. :dd
{Pair style MEAM requires newton pair on} :dt
See the newton command. This is a restriction to use the MEAM
potential. :dd
{Pair style SNAP requires newton pair on} :dt
See the newton command. This is a restriction to use the SNAP
potential. :dd
{Pair style Stillinger-Weber requires atom IDs} :dt
This is a requirement to use the SW potential. :dd
{Pair style Stillinger-Weber requires newton pair on} :dt
See the newton command. This is a restriction to use the SW
potential. :dd
{Pair style Tersoff requires atom IDs} :dt
This is a requirement to use the Tersoff potential. :dd
{Pair style Tersoff requires newton pair on} :dt
See the newton command. This is a restriction to use the Tersoff
potential. :dd
{Pair style Vashishta requires atom IDs} :dt
This is a requirement to use the Vashishta potential. :dd
{Pair style Vashishta requires newton pair on} :dt
See the newton command. This is a restriction to use the Vashishta
potential. :dd
{Pair style bop requires comm ghost cutoff at least 3x larger than %g} :dt
Use the communicate ghost command to set this. See the pair bop
doc page for more details. :dd
{Pair style born/coul/long requires atom attribute q} :dt
An atom style that defines this attribute must be used. :dd
{Pair style born/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style born/coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/long/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/streitz requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style does not have extra field requested by compute pair/local} :dt
The pair style does not support the pN value requested by the compute
pair/local command. :dd
{Pair style does not support bond_style quartic} :dt
The pair style does not have a single() function, so it can
not be invoked by bond_style quartic. :dd
{Pair style does not support compute group/group} :dt
The pair_style does not have a single() function, so it cannot be
invoked by the compute group/group command. :dd
{Pair style does not support compute pair/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute pair/local. :dd
{Pair style does not support compute property/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute property/local. :dd
{Pair style does not support fix bond/swap} :dt
The pair style does not have a single() function, so it can
not be invoked by fix bond/swap. :dd
{Pair style does not support pair_write} :dt
The pair style does not have a single() function, so it can
not be invoked by pair_write. :dd
{Pair style does not support rRESPA inner/middle/outer} :dt
You are attempting to use rRESPA options with a pair style that
does not support them. :dd
{Pair style granular with history requires atoms have IDs} :dt
Atoms in the simulation do not have IDs, so history effects
cannot be tracked by the granular pair potential. :dd
{Pair style hbond/dreiding requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires atom IDs} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires molecular system} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires newton pair on} :dt
See the newton command for details. :dd
{Pair style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Pair style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Pair style is incompatible with KSpace style} :dt
If a pair style with a long-range Coulombic component is selected,
then a kspace style must also be used. :dd
{Pair style is incompatible with TIP4P KSpace style} :dt
The pair style does not have the required TIP4P settings. :dd
{Pair style lj/charmm/coul/charmm requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style lj/cut/tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this
potential. :dd
{Pair style lj/cut/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style lj/cut/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/gromacs/coul/gromacs requires atom attribute q} :dt
An atom_style with this attribute is needed. :dd
{Pair style lj/long/dipole/long does not currently support respa} :dt
This feature is not yet supported. :dd
{Pair style lj/long/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style lj/long/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/long/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/sdk/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nb3b/harmonic requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style nb3b/harmonic requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style nm/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nm/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style peri requires atom style peri} :dt
Self-explanatory. :dd
{Pair style polymorphic requires atom IDs} :dt
This is a requirement to use the polymorphic potential. :dd
{Pair style polymorphic requires newton pair on} :dt
See the newton command. This is a restriction to use the polymorphic
potential. :dd
{Pair style reax requires atom IDs} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style reax requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style reax requires newton pair on} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style requires a KSpace style} :dt
No kspace style is defined. :dd
{Pair style requires use of kspace_style ewald/disp} :dt
Self-explanatory. :dd
{Pair style sw/gpu requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style sw/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tersoff/gpu requires atom IDs} :dt
This is a requirement to use the tersoff/gpu potential. :dd
{Pair style tersoff/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this pair style. :dd
{Pair style tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair table cutoffs must all be equal to use with KSpace} :dt
When using pair style table with a long-range KSpace solver, the
cutoffs for all atom type pairs must all be the same, since the
long-range solver starts at that cutoff. :dd
{Pair table parameters did not set N} :dt
List of pair table parameters must include N setting. :dd
{Pair tersoff/zbl requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tersoff/zbl/kk requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tri/lj requires atom style tri} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atoms with same type have same radius} :dt
Self-explanatory. :dd
{Pair yukawa/colloid/gpu requires atom style sphere} :dt
Self-explanatory. :dd
{PairKIM only works with 3D problems} :dt
This is a current limitation. :dd
{Pair_coeff command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_coeff command before simulation box is defined} :dt
The pair_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Pair_modify command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_modify special setting for pair hybrid incompatible with global special_bonds setting} :dt
Cannot override a setting of 0.0 or 1.0 or change a setting between
0.0 and 1.0. :dd
{Pair_write command before pair_style is defined} :dt
Self-explanatory. :dd
{Particle on or inside fix wall surface} :dt
Particles must be "exterior" to the wall in order for energy/force to
be calculated. :dd
{Particle outside surface of region used in fix wall/region} :dt
Particles must be inside the region for energy/force to be calculated.
A particle outside the region generates an error. :dd
{Per-atom compute in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-atom fix in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to have
tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-processor system is too big} :dt
The number of owned atoms plus ghost atoms on a single
processor must fit in a 32-bit integer. :dd
{Potential energy ID for fix neb does not exist} :dt
Self-explanatory. :dd
{Potential energy ID for fix nvt/nph/npt does not exist} :dt
A compute for potential energy must be defined. :dd
{Potential file has duplicate entry} :dt
The potential file has more than one entry for the same element. :dd
{Potential file is missing an entry} :dt
The potential file does not have a needed entry. :dd
{Power by 0 in variable formula} :dt
Self-explanatory. :dd
{Pressure ID for fix box/relax does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix modify does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix press/berendsen does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix rigid npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for thermo does not exist} :dt
The compute ID needed to compute pressure for thermodynamics does not
exist. :dd
{Pressure control can not be used with fix nvt} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/body} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/small} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nphug} :dt
A pressure control keyword (iso, aniso, tri, x, y, or z) must be
provided. :dd
{Pressure control must be used with fix npt} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Processor count in z must be 1 for 2d simulation} :dt
Self-explanatory. :dd
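For example, the processors command can force a single processor
layer in z for a 2d run:

processors * * 1 :pre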
{Processor partitions do not match number of allocated processors} :dt
The total number of processors in all partitions must match the number
of processors LAMMPS is running on. :dd
{Processors command after simulation box is defined} :dt
The processors command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Processors custom grid file is inconsistent} :dt
The values in the custom file are not consistent with the number of
processors you are running on or the Px,Py,Pz settings of the
processors command. Or there was not a setting for every processor. :dd
{Processors grid numa and map style are incompatible} :dt
Using numa for gstyle in the processors command requires using
cart for the map option. :dd
{Processors part option and grid style are incompatible} :dt
Cannot use gstyle numa or custom with the part option. :dd
{Processors twogrid requires proc count be a multiple of core count} :dt
Self-explanatory. :dd
{Pstart and Pstop must have the same value} :dt
Self-explanatory. :dd
{Python function evaluation failed} :dt
The Python function did not run successfully and/or did not return a
value (if it is supposed to return a value). This is probably due to
some error condition in the function. :dd
{Python function is not callable} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Python invoke of undefined function} :dt
Cannot invoke a function that has not been previously defined. :dd
{Python variable does not match Python function} :dt
This matching is defined by the python-style variable and the python
command. :dd
{Python variable has no function} :dt
No python command was used to define the function associated with the
python-style variable. :dd
{QEQ with 'newton pair off' not supported} :dt
See the newton command. This is a restriction to use the QEQ fixes. :dd
{R0 < 0 for fix spring command} :dt
Equilibrium spring length is invalid. :dd
{RATTLE coordinate constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{RATTLE determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix rattle command is numerically invalid. :dd
{RATTLE failed} :dt
Certain constraints were not satisfied. :dd
{RATTLE velocity constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{Read data add offset is too big} :dt
It cannot be larger than the size of atom IDs, e.g. the maximum 32-bit
integer. :dd
{Read dump of atom property that isn't allocated} :dt
Self-explanatory. :dd
{Read rerun dump file timestep > specified stop} :dt
Self-explanatory. :dd
{Read restart MPI-IO input not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Read_data shrink wrap did not assign all atoms correctly} :dt
This is typically because the box-size specified in the data file is
large compared to the actual extent of atoms in a shrink-wrapped
dimension. When LAMMPS shrink-wraps the box, atoms will be lost if the
processor they are re-assigned to is too far away. Choose a box
size closer to the actual extent of the atoms. :dd
{Read_dump command before simulation box is defined} :dt
The read_dump command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Read_dump field not found in dump file} :dt
Self-explanatory. :dd
{Read_dump triclinic status does not match simulation} :dt
Both the dump snapshot and the current LAMMPS simulation must
be using either an orthogonal or triclinic box. :dd
{Read_dump xyz fields do not have consistent scaling/wrapping} :dt
Self-explanatory. :dd
{Reading from MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Reax_defs.h setting for NATDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Reax_defs.h setting for NNEIGHMAXDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Receiving partition in processors part command is already a receiver} :dt
Cannot specify a partition to be a receiver twice. :dd
{Region ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Region ID for compute reduce/region does not exist} :dt
Self-explanatory. :dd
{Region ID for compute temp/region does not exist} :dt
Self-explanatory. :dd
{Region ID for dump custom does not exist} :dt
Self-explanatory. :dd
{Region ID for fix addforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix atom/swap does not exist} :dt
Self-explanatory. :dd
{Region ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Region ID for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Region ID for fix efield does not exist} :dt
Self-explanatory. :dd
{Region ID for fix evaporate does not exist} :dt
Self-explanatory. :dd
{Region ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Region ID for fix heat does not exist} :dt
Self-explanatory. :dd
{Region ID for fix setforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix wall/region does not exist} :dt
Self-explanatory. :dd
{Region ID for group dynamic does not exist} :dt
Self-explanatory. :dd
{Region ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Region cannot have 0 length rotation vector} :dt
Self-explanatory. :dd
{Region for fix oneway does not exist} :dt
Self-explanatory. :dd
{Region intersect region ID does not exist} :dt
Self-explanatory. :dd
{Region union or intersect cannot be dynamic} :dt
The sub-regions can be dynamic, but not the combined region. :dd
{Region union region ID does not exist} :dt
One or more of the region IDs specified by the region union command
does not exist. :dd
{Replacing a fix, but new style != old style} :dt
A fix ID can be used a 2nd time, but only if the style matches the
previous fix. In this case it is assumed you wish to reset a fix's
parameters. This error may mean you are mistakenly re-using a fix ID
when you do not intend to. :dd
{Replicate command before simulation box is defined} :dt
The replicate command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Replicate did not assign all atoms correctly} :dt
Atoms replicated by the replicate command were not assigned correctly
to processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Replicated system atom IDs are too big} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Replicated system is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Required border comm not yet implemented with Kokkos} :dt
There are various limitations in the communication options supported
by Kokkos. :dd
{Rerun command before simulation box is defined} :dt
The rerun command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Rerun dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Resetting timestep size is not allowed with fix move} :dt
This is because fix move is moving atoms based on elapsed time. :dd
{Respa inner cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Respa levels must be >= 1} :dt
Self-explanatory. :dd
{Respa middle cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Restart file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Restart file byte ordering is not recognized} :dt
The file does not appear to be a LAMMPS restart file since it doesn't
contain a recognized byte-ordering flag at the beginning. :dd
{Restart file byte ordering is swapped} :dt
The file was written on a machine with different byte-ordering than
the machine you are reading it on. Convert it to a text data file
instead, on the machine you wrote it on. :dd
{Restart file incompatible with current version} :dt
This is probably because you are trying to read a file created with a
version of LAMMPS that is too old compared to the current version.
Use your older version of LAMMPS and convert the restart file
to a data file. :dd
{Restart file is a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Restrain atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a restrain dihedral specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a restrain angle specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a restrain bond specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Reuse of compute ID} :dt
A compute ID cannot be used twice. :dd
{Reuse of dump ID} :dt
A dump ID cannot be used twice. :dd
{Reuse of molecule template ID} :dt
The template IDs must be unique. :dd
{Reuse of region ID} :dt
A region ID cannot be used twice. :dd
{Rigid body atoms %d %d missing on proc %d at step %ld} :dt
This means that an atom cannot find the atom that owns the rigid body
it is part of, or vice versa. The solution is to use the communicate
cutoff command to ensure ghost atoms are acquired from far enough away
to encompass the max distance printed when the fix rigid/small command
was invoked. :dd
{Rigid body has degenerate moment of inertia} :dt
Fix poems will only work with bodies (collections of atoms) that have
non-zero principal moments of inertia. This means each body must
consist of 3 or more non-collinear atoms, even with joint atoms removed. :dd
{Rigid fix must come before NPT/NPH fix} :dt
NPT/NPH fix must be defined in input script after all rigid fixes,
else the rigid fix contribution to the pressure virial is
incorrect. :dd
{Rmask function in equal-style variable formula} :dt
Rmask is a per-atom operation. :dd
{Run command before simulation box is defined} :dt
The run command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Run command start value is after start of run} :dt
Self-explanatory. :dd
{Run command stop value is before end of run} :dt
Self-explanatory. :dd
{Run_style command before simulation box is defined} :dt
The run_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Same dimension twice in fix ave/spatial} :dt
Self-explanatory. :dd
{Sending partition in processors part command is already a sender} :dt
Cannot specify a partition to be a sender twice. :dd
{Set command before simulation box is defined} :dt
The set command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Set command floating point vector does not exist} :dt
Self-explanatory. :dd
{Set command integer vector does not exist} :dt
Self-explanatory. :dd
{Set command with no atoms existing} :dt
No atoms are yet defined so the set command cannot be used. :dd
{Set region ID does not exist} :dt
Region ID specified in set command does not exist. :dd
{Shake angles have different bond types} :dt
All 3-atom angle-constrained SHAKE clusters specified by the fix shake
command that are the same angle type, must also have the same bond
types for the 2 bonds in the angle. :dd
{Shake atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake cluster of more than 4 atoms} :dt
A single cluster specified by the fix shake command can have no more
than 4 atoms. :dd
{Shake clusters are connected} :dt
A single cluster specified by the fix shake command must have a single
central atom with up to 3 other atoms bonded to it. :dd
{Shake determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix shake command is numerically invalid. :dd
{Shake fix must come before NPT/NPH fix} :dt
NPT fix must be defined in input script after SHAKE fix, else the
SHAKE fix contribution to the pressure virial is incorrect. :dd
{Shear history overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
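For example (the values are only illustrative), both limits can be
raised together; the page setting must stay at least 10x the one setting:

neigh_modify one 10000 page 100000 :pre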
{Small to big integers are not sized correctly} :dt
This error occurs when the sizes of smallint, imageint, tagint, bigint,
as defined in src/lmptype.h are not what is expected. Contact
the developers if this occurs. :dd
{Smallint setting in lmptype.h is invalid} :dt
It has to be the size of an integer. :dd
{Smallint setting in lmptype.h is not compatible} :dt
Smallint stored in restart file is not consistent with LAMMPS version
you are running. :dd
{Special list size exceeded in fix bond/create} :dt
See the read_data command for info on setting the "extra special per
atom" header value to allow for additional special values to be
stored. :dd
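As an illustrative sketch (file name and value are placeholders),
extra special-neighbor storage can be reserved when the data file is
read:

read_data data.polymer extra/special/per/atom 2 :pre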
{Specified processors != physical processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Specified target stress must be uniaxial or hydrostatic} :dt
Self-explanatory. :dd
{Sqrt of negative value in variable formula} :dt
Self-explanatory. :dd
{Subsequent read data induced too many angles per atom} :dt
See the create_box extra/angle/per/atom or read_data "extra angle per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many bonds per atom} :dt
See the create_box extra/bond/per/atom or read_data "extra bond per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many dihedrals per atom} :dt
See the create_box extra/dihedral/per/atom or read_data "extra
dihedral per atom" header value to set this limit larger. :dd
{Subsequent read data induced too many impropers per atom} :dt
See the create_box extra/improper/per/atom or read_data "extra
improper per atom" header value to set this limit larger. :dd
{Substitution for illegal variable} :dt
Input script line contained a variable that could not be substituted
for. :dd
{Support for writing images in JPEG format not included} :dt
LAMMPS was not built with the -DLAMMPS_JPEG switch in the Makefile. :dd
{Support for writing images in PNG format not included} :dt
LAMMPS was not built with the -DLAMMPS_PNG switch in the Makefile. :dd
{Support for writing movies not included} :dt
LAMMPS was not built with the -DLAMMPS_FFMPEG switch in the Makefile. :dd
{System in data file is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms in the system is not 0.0.
For some KSpace solvers this is an error. :dd
{TAD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{TIP4P hydrogen has incorrect atom type} :dt
The TIP4P pairwise computation found an H atom whose type does not
agree with the specified H type. :dd
{TIP4P hydrogen is missing} :dt
The TIP4P pairwise computation failed to find the correct H atom
within a water molecule. :dd
{TMD target file did not list all group atoms} :dt
The target file for the fix tmd command did not list all atoms in the
fix group. :dd
{Tad command before simulation box is defined} :dt
Self-explanatory. :dd
{Tagint setting in lmptype.h is invalid} :dt
Tagint must be as large or larger than smallint. :dd
{Tagint setting in lmptype.h is not compatible} :dt
Format of tagint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Target pressure for fix rigid/nph cannot be < 0.0} :dt
Self-explanatory. :dd
{Target pressure for fix rigid/npt/small cannot be < 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix nvt/npt/nph cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Temper command before simulation box is defined} :dt
The temper command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Temperature ID for fix bond/swap does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix box/relax does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix nvt/npt does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix press/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix rigid nvt/npt/nph does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Temperature compute degrees of freedom < 0} :dt
This should not happen if you are calculating the temperature
on a valid set of atoms. :dd
{Temperature control can not be used with fix nph} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/body} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nphug} :dt
The temp keyword must be provided. :dd
{Temperature control must be used with fix npt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Temperature control must not be used with fix nph/small} :dt
Self-explanatory. :dd
{Temperature for fix nvt/sllod does not have a bias} :dt
The specified compute must compute temperature with a bias. :dd
{Tempering could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Tempering fix ID is not defined} :dt
The fix ID specified by the temper command does not exist. :dd
{Tempering temperature fix is not valid} :dt
The fix specified by the temper command is not one that controls
temperature (nvt or langevin). :dd
{Test_descriptor_string already allocated} :dt
This is an internal error. Contact the developers. :dd
{The package gpu command is required for gpu styles} :dt
Self-explanatory. :dd
{Thermo and fix not computed at compatible times} :dt
Fixes generate values on specific timesteps. The thermo output
does not match these timesteps. :dd
{Thermo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo compute does not compute array} :dt
Self-explanatory. :dd
{Thermo compute does not compute scalar} :dt
Self-explanatory. :dd
{Thermo compute does not compute vector} :dt
Self-explanatory. :dd
{Thermo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo custom variable cannot be indexed} :dt
Self-explanatory. :dd
{Thermo custom variable is not equal-style variable} :dt
Only equal-style variables can be output with thermodynamics, not
atom-style variables. :dd
{Thermo every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Thermo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo fix does not compute array} :dt
Self-explanatory. :dd
{Thermo fix does not compute scalar} :dt
Self-explanatory. :dd
{Thermo fix does not compute vector} :dt
Self-explanatory. :dd
{Thermo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo keyword in variable requires thermo to use/init pe} :dt
You are using a thermo keyword in a variable that requires
potential energy to be calculated, but your thermo output
does not use it. Add it to your thermo output. :dd
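For example, adding the pe keyword to the thermo output avoids this
error:

thermo_style custom step temp press pe :pre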
{Thermo keyword in variable requires thermo to use/init press} :dt
You are using a thermo keyword in a variable that requires pressure to
be calculated, but your thermo output does not use it. Add it to your
thermo output. :dd
{Thermo keyword in variable requires thermo to use/init temp} :dt
You are using a thermo keyword in a variable that requires temperature
to be calculated, but your thermo output does not use it. Add it to
your thermo output. :dd
{Thermo style does not use press} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo style does not use temp} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo_modify every variable returned a bad timestep} :dt
The returned timestep is less than or equal to the current timestep. :dd
{Thermo_modify int format does not contain d character} :dt
Self-explanatory. :dd
{Thermo_modify pressure ID does not compute pressure} :dt
The specified compute ID does not compute pressure. :dd
{Thermo_modify temperature ID does not compute temperature} :dt
The specified compute ID does not compute temperature. :dd
{Thermo_style command before simulation box is defined} :dt
The thermo_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{This variable thermo keyword cannot be used between runs} :dt
Keywords that refer to time (such as cpu, elapsed) do not
make sense in between runs. :dd
{Threshhold for an atom property that isn't allocated} :dt
A dump threshold has been requested on a quantity that is
not defined by the atom style used in this simulation. :dd
{Timestep must be >= 0} :dt
Specified timestep is invalid. :dd
{Too big a problem to use velocity create loop all} :dt
The system size must fit in a 32-bit integer to use this option. :dd
{Too big a timestep for dump dcd} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too big a timestep for dump xtc} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too few bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too few lines in %s section of data file} :dt
Self-explanatory. :dd
{Too few values in body lines in data file} :dt
Self-explanatory. :dd
{Too few values in body section of molecule file} :dt
Self-explanatory. :dd
{Too many -pk arguments in command line} :dt
The string formed by concatenating the arguments is too long. Use a
package command in the input script instead. :dd
{Too many MSM grid levels} :dt
The max number of MSM grid levels is hardwired to 10. :dd
{Too many args in variable function} :dt
More args are used than any variable function allows. :dd
{Too many atom pairs for pair bop} :dt
The number of atomic pairs exceeds the expected number. Check your
atomic structure to ensure that it is realistic. :dd
{Too many atom sorting bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many atom triplets for pair bop} :dt
The number of three atom groups for angle determinations exceeds the
expected number. Check your atomic structure to ensure that it is
realistic. :dd
{Too many atoms for dump dcd} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms for dump xtc} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms to dump sort} :dt
Cannot sort when running with more than 2^31 atoms. :dd
{Too many exponent bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many groups} :dt
The maximum number of atom groups (including the "all" group) is
given by MAX_GROUP in group.cpp and is 32. :dd
{Too many iterations} :dt
The number of iterations used for minimization must fit in a
32-bit integer. :dd
{Too many lines in one body in data file - boost MAXBODY} :dt
MAXBODY is a setting at the top of the src/read_data.cpp file.
Set it larger and re-compile the code. :dd
{Too many local+ghost atoms for neighbor list} :dt
The number of nlocal + nghost atoms on a processor
is limited by the size of a 32-bit integer with 2 bits
removed for masking 1-2, 1-3, 1-4 neighbors. :dd
{Too many mantissa bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many masses for fix shake} :dt
The fix shake command cannot list more masses than there are atom
types. :dd
{Too many molecules for fix poems} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many molecules for fix rigid} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many neighbor bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many timesteps} :dt
The cumulative timesteps must fit in a 64-bit integer. :dd
{Too many timesteps for NEB} :dt
The number of timesteps used for NEB must fit in a 32-bit integer. :dd
{Too many total atoms} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Too many total bits for bitmapped lookup table} :dt
Table size specified via pair_modify command is too large. Note that
a value of N generates a 2^N size table. :dd
{Too many values in body lines in data file} :dt
Self-explanatory. :dd
{Too many values in body section of molecule file} :dt
Self-explanatory. :dd
{Too much buffered per-proc info for dump} :dt
The size of the buffered string must fit in a 32-bit integer for a
dump. :dd
{Too much per-proc info for dump} :dt
Number of local atoms times number of columns must fit in a 32-bit
integer for dump. :dd
{Tree structure in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a tree structure. :dd
{Triclinic box skew is too large} :dt
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command. :dd
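As noted above, the constraint can be relaxed with the box command,
issued before the simulation box is defined:

box tilt large :pre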
{Tried to convert a double to int, but input_double > INT_MAX} :dt
Self-explanatory. :dd
{Trying to build an occasional neighbor list before initialization completed} :dt
This is not allowed. Source code caller needs to be modified. :dd
{Two fix ave commands using same compute chunk/atom command in incompatible ways} :dt
They are both attempting to "lock" the chunk/atom command so that the
chunk assignments persist for some number of timesteps, but are doing
it in different ways. :dd
{Two groups cannot be the same in fix spring couple} :dt
Self-explanatory. :dd
{USER-CUDA mode requires CUDA variant of min style} :dt
CUDA mode is enabled, so the min style must include a cuda suffix. :dd
{USER-CUDA mode requires CUDA variant of run style} :dt
CUDA mode is enabled, so the run style must include a cuda suffix. :dd
{USER-CUDA package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{USER-CUDA package requires a cuda enabled atom_style} :dt
Self-explanatory. :dd
{Unable to initialize accelerator for use} :dt
There was a problem initializing an accelerator for the GPU package. :dd
{Unbalanced quotes in input line} :dt
No matching end double quote was found following a leading double
quote. :dd
{Unexpected end of -reorder file} :dt
Self-explanatory. :dd
{Unexpected end of AngleCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of BondCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of DihedralCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of ImproperCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of PairCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of custom file} :dt
Self-explanatory. :dd
{Unexpected end of data file} :dt
LAMMPS hit the end of the data file while attempting to read a
section. Something is wrong with the format of the data file. :dd
{Unexpected end of dump file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid/small file} :dt
A read operation from the file failed. :dd
{Unexpected end of molecule file} :dt
Self-explanatory. :dd
{Unexpected end of neb file} :dt
A read operation from the file failed. :dd
{Units command after simulation box is defined} :dt
The units command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Universe/uloop variable count < # of partitions} :dt
A universe- or uloop-style variable must specify at least as many
values as there are processor partitions. :dd
{Unknown angle style} :dt
The choice of angle style is unknown. :dd
{Unknown atom style} :dt
The choice of atom style is unknown. :dd
{Unknown body style} :dt
The choice of body style is unknown. :dd
{Unknown bond style} :dt
The choice of bond style is unknown. :dd
{Unknown category for info is_active()} :dt
Self-explanatory. :dd
{Unknown category for info is_available()} :dt
Self-explanatory. :dd
{Unknown category for info is_defined()} :dt
Self-explanatory. :dd
{Unknown command: %s} :dt
The command is not known to LAMMPS. Check the input script. :dd
{Unknown compute style} :dt
The choice of compute style is unknown. :dd
{Unknown dihedral style} :dt
The choice of dihedral style is unknown. :dd
{Unknown dump reader style} :dt
The choice of dump reader style via the format keyword is unknown. :dd
{Unknown dump style} :dt
The choice of dump style is unknown. :dd
{Unknown error in GPU library} :dt
Self-explanatory. :dd
{Unknown fix style} :dt
The choice of fix style is unknown. :dd
{Unknown identifier in data file: %s} :dt
A section of the data file cannot be read by LAMMPS. :dd
{Unknown improper style} :dt
The choice of improper style is unknown. :dd
{Unknown keyword in thermo_style custom command} :dt
One or more specified keywords are not recognized. :dd
{Unknown kspace style} :dt
The choice of kspace style is unknown. :dd
{Unknown name for info newton category} :dt
Self-explanatory. :dd
{Unknown name for info package category} :dt
Self-explanatory. :dd
{Unknown name for info pair category} :dt
Self-explanatory. :dd
{Unknown pair style} :dt
The choice of pair style is unknown. :dd
{Unknown pair_modify hybrid sub-style} :dt
The choice of sub-style is unknown. :dd
{Unknown region style} :dt
The choice of region style is unknown. :dd
{Unknown section in molecule file} :dt
Self-explanatory. :dd
{Unknown table style in angle style table} :dt
Self-explanatory. :dd
{Unknown table style in bond style table} :dt
Self-explanatory. :dd
{Unknown table style in pair_style command} :dt
Style of table is invalid for use with pair_style table command. :dd
{Unknown unit_style} :dt
Self-explanatory. Check the input script or data file. :dd
{Unrecognized lattice type in MEAM file 1} :dt
The lattice type in an entry of the MEAM library file is not
valid. :dd
{Unrecognized lattice type in MEAM file 2} :dt
The lattice type in an entry of the MEAM parameter file is not
valid. :dd
{Unrecognized pair style in compute pair command} :dt
Self-explanatory. :dd
{Unrecognized virial argument in pair_style command} :dt
Only two options are supported: LAMMPSvirial and KIMvirial. :dd
{Unsupported mixing rule in kspace_style ewald/disp} :dt
Only geometric mixing is supported. :dd
{Unsupported order in kspace_style ewald/disp} :dt
Only 1/r^6 dispersion or dipole terms are supported. :dd
{Unsupported order in kspace_style pppm/disp, pair_style %s} :dt
Only pair styles with 1/r and 1/r^6 dependence are currently supported. :dd
{Use cutoff keyword to set cutoff in single mode} :dt
Mode is single so cutoff/multi keyword cannot be used. :dd
{Use cutoff/multi keyword to set cutoff in multi mode} :dt
Mode is multi so cutoff keyword cannot be used. :dd
{Using fix nvt/sllod with inconsistent fix deform remap option} :dt
Fix nvt/sllod requires that deforming atoms have a velocity profile
provided by "remap v" as a fix deform option. :dd
{Using fix nvt/sllod with no fix deform defined} :dt
Self-explanatory. :dd
{Using fix srd with inconsistent fix deform remap option} :dt
When shearing the box in an SRD simulation, the remap v option for fix
deform needs to be used. :dd
{Using pair lubricate with inconsistent fix deform remap option} :dt
Must use remap v option with fix deform with this pair style. :dd
{Using pair lubricate/poly with inconsistent fix deform remap option} :dt
If fix deform is used, the remap v option is required. :dd
{Using suffix cuda without USER-CUDA package enabled} :dt
Self-explanatory. :dd
{Using suffix gpu without GPU package installed} :dt
Self-explanatory. :dd
{Using suffix intel without USER-INTEL package installed} :dt
Self-explanatory. :dd
{Using suffix kk without KOKKOS package enabled} :dt
Self-explanatory. :dd
{Using suffix omp without USER-OMP package installed} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom attribute mu} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom style sphere} :dt
Self-explanatory. :dd
{Variable ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Variable atom ID is too large} :dt
Specified ID is larger than the maximum allowed atom ID. :dd
{Variable evaluation before simulation box is defined} :dt
Cannot evaluate a compute or fix or atom-based value in a variable
before the simulation has been setup. :dd
{Variable evaluation in fix wall gave bad value} :dt
The returned value for epsilon or sigma < 0.0. :dd
{Variable evaluation in region gave bad value} :dt
Variable returned a radius < 0.0. :dd
{Variable for compute ti is invalid style} :dt
Self-explanatory. :dd
{Variable for create_atoms is invalid style} :dt
The variables must be equal-style variables. :dd
{Variable for displace_atoms is invalid style} :dt
It must be an equal-style or atom-style variable. :dd
{Variable for dump every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for dump image center is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image persp is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image phi is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image theta is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image zoom is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for fix adapt is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix addforce is invalid style} :dt
Self-explanatory. :dd
{Variable for fix aveforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix deform is invalid style} :dt
The variable must be an equal-style variable. :dd
{Variable for fix efield is invalid style} :dt
The variable must be an equal- or atom-style variable. :dd
{Variable for fix gravity is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix heat is invalid style} :dt
Only equal-style or atom-style variables can be used. :dd
{Variable for fix indent is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix indent is not equal style} :dt
Only equal-style variables can be used. :dd
{Variable for fix langevin is invalid style} :dt
It must be an equal-style variable. :dd
{Variable for fix move is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix setforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/berendsen is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csld is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csvr is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/rescale is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/reflect is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/srd is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for group dynamic is invalid style} :dt
The variable must be an atom-style variable. :dd
{Variable for group is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for region cylinder is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for region is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for region is not equal style} :dt
Self-explanatory. :dd
{Variable for region sphere is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for restart is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for set command is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for thermo every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for velocity set is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for voronoi radius is not atom style} :dt
Self-explanatory. :dd
{Variable formula compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable has circular dependency} :dt
A circular dependency is when variable "a" is used by variable "b" and
variable "b" is also used by variable "a". Circular dependencies with
longer chains of dependence are also not allowed. :dd
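A minimal illustration of such a disallowed cycle (the variable names
are arbitrary):

variable a equal v_b
variable b equal v_a :pre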
{Variable name between brackets must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable name for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for compute reduce does not exist} :dt
Self-explanatory. :dd
{Variable name for compute ti does not exist} :dt
Self-explanatory. :dd
{Variable name for create_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for displace_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for dump every does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image center does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image persp does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image phi does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image theta does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image zoom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix adapt does not exist} :dt
Self-explanatory. :dd
{Variable name for fix addforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Variable name for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix deform does not exist} :dt
Self-explanatory. :dd
{Variable name for fix efield does not exist} :dt
Self-explanatory. :dd
{Variable name for fix gravity does not exist} :dt
Self-explanatory. :dd
{Variable name for fix heat does not exist} :dt
Self-explanatory. :dd
{Variable name for fix indent does not exist} :dt
Self-explanatory. :dd
{Variable name for fix langevin does not exist} :dt
Self-explanatory. :dd
{Variable name for fix move does not exist} :dt
Self-explanatory. :dd
{Variable name for fix setforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix store/state does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Variable name for fix vector does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/reflect does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/srd does not exist} :dt
Self-explanatory. :dd
{Variable name for group does not exist} :dt
Self-explanatory. :dd
{Variable name for group dynamic does not exist} :dt
Self-explanatory. :dd
{Variable name for region cylinder does not exist} :dt
Self-explanatory. :dd
{Variable name for region does not exist} :dt
Self-explanatory. :dd
{Variable name for region sphere does not exist} :dt
Self-explanatory. :dd
{Variable name for restart does not exist} :dt
Self-explanatory. :dd
{Variable name for set command does not exist} :dt
Self-explanatory. :dd
{Variable name for thermo every does not exist} :dt
Self-explanatory. :dd
{Variable name for velocity set does not exist} :dt
Self-explanatory. :dd
{Variable name for voronoi radius does not exist} :dt
Self-explanatory. :dd
{Variable name must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable uses atom property that isn't allocated} :dt
Self-explanatory. :dd
{Velocity command before simulation box is defined} :dt
The velocity command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Velocity command with no atoms existing} :dt
A velocity command has been used, but no atoms yet exist. :dd
{Velocity ramp in z for a 2d problem} :dt
Self-explanatory. :dd
{Velocity rigid used with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Velocity temperature ID does calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Velocity temperature ID does not compute temperature} :dt
The compute ID given to the velocity command must compute
temperature. :dd
{Verlet/split can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Verlet/split does not yet support TIP4P} :dt
This is a current limitation. :dd
{Verlet/split requires 2 partitions} :dt
See the -partition command-line switch. :dd
{Verlet/split requires Rspace partition layout be multiple of Kspace partition layout in each dim} :dt
This is controlled by the processors command. :dd
{Verlet/split requires Rspace partition size be multiple of Kspace partition size} :dt
This is so there is an equal number of Rspace processors for every
Kspace processor. :dd
{Virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Voro++ error: narea and neigh have a different size} :dt
This error is returned by the Voro++ library. :dd
{Wall defined twice in fix wall command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/reflect command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/srd command} :dt
Self-explanatory. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/cut} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/long/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{World variable count doesn't match # of partitions} :dt
A world-style variable must specify a number of values equal to the
number of processor partitions. :dd
{Write_data command before simulation box is defined} :dt
Self-explanatory. :dd
{Write_restart command before simulation box is defined} :dt
The write_restart command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Writing to MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Zero length rotation vector with displace_atoms} :dt
Self-explanatory. :dd
{Zero length rotation vector with fix move} :dt
Self-explanatory. :dd
{Zero-length lattice orient vector} :dt
Self-explanatory. :dd
:dle
Warnings: :h4,link(warn)
:dlb
{Adjusting Coulombic cutoff for MSM, new cutoff = %g} :dt
The adjust/cutoff command is turned on and the Coulombic cutoff has been
adjusted to match the user-specified accuracy. :dd
{Angle atoms missing at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle style in data file differs from currently defined angle style} :dt
Self-explanatory. :dd
{Atom style in data file differs from currently defined atom style} :dt
Self-explanatory. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atoms missing at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond style in data file differs from currently defined bond style} :dt
Self-explanatory. :dd
{Bond/angle/dihedral extent > half of periodic box length} :dt
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch. :dd
{Both groups in compute group/group have a net charge; the Kspace boundary correction to energy will be non-zero} :dt
Self-explanatory. :dd
{Calling write_dump before a full system init.} :dt
The write_dump command is used before the system has been fully
initialized as part of a 'run' or 'minimize' command. Not all dump
styles and features are fully supported at this point and thus the
command may fail or produce incomplete or incorrect output. Insert
a "run 0" command if a full system init is required; a brief sketch
appears at the end of this list. :dd
{Cannot count rigid body degrees-of-freedom before bodies are fully initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot count rigid body degrees-of-freedom before bodies are initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1} :dt
Self-explanatory. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1.} :dt
Self-explanatory. :dd
{Charges are set, but coulombic solver is not used} :dt
Self-explanatory. :dd
{Charges did not converge at step %ld: %lg} :dt
Self-explanatory. :dd
{Communication cutoff is too small for SNAP micro load balancing, increased to %lf} :dt
Self-explanatory. :dd
{Compute cna/atom cutoff may be too large to find ghost atom neighbors} :dt
The neighbor cutoff used may not encompass enough ghost atoms
to perform this operation correctly. :dd
{Computing temperature of portions of rigid bodies} :dt
The group defined by the temperature compute does not encompass all
the atoms in one or more rigid bodies, so the change in
degrees-of-freedom for the atoms in those partial rigid bodies will
not be accounted for. :dd
{Create_bonds max distance > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus no bond can be created between them. :dd
{Delete_atoms cutoff > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus an atom in that pair cannot be deleted. :dd
{Dihedral atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral problem} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral style in data file differs from currently defined dihedral style} :dt
Self-explanatory. :dd
{Dump dcd/xtc timestamp may be wrong with fix dt/reset} :dt
If the fix changes the timestep, the dump dcd file will not
reflect the change. :dd
+{Energy due to X extra global DOFs will be included in minimizer energies} :dt
+
+When using fixes like box/relax, the potential energy used by the minimizer
+is augmented by an additional energy provided by the fix. Thus the printed
+converged energy may be different from the total potential energy. :dd
+
{Energy tally does not account for 'zero yes'} :dt
The energy removed by using the 'zero yes' flag is not accounted
for in the energy tally and thus energy conservation cannot be
monitored in this case. :dd
{Estimated error in splitting of dispersion coeffs is %g} :dt
Error is greater than 0.0001 percent. :dd
{Ewald/disp Newton solver failed, using old method to estimate g_ewald} :dt
Self-explanatory. Choosing a different cutoff value may help. :dd
{FENE bond too long} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %d %d %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{Fix SRD walls overlap but fix srd overlap not set} :dt
You likely want to set this in your input script. :dd
{Fix bond/swap will ignore defined angles} :dt
See the doc page for fix bond/swap for more info on this
restriction. :dd
{Fix deposit near setting < possible overlap separation %g} :dt
This test is performed for finite size particles with a diameter, not
for point particles. The near setting is smaller than the particle
diameter which can lead to overlaps. :dd
{Fix evaporate may delete atom with non-zero molecule ID} :dt
This is probably an error, since you should not delete only one atom
of a molecule. :dd
{Fix gcmc using full_energy option} :dt
Fix gcmc has automatically turned on the full_energy option since it
is required for systems like the one specified by the user. User input
included one or more of the following: kspace, triclinic, a hybrid
pair style, an eam pair style, or no "single" function for the pair
style. :dd
{Fix property/atom mol or charge w/out ghost communication} :dt
A model typically needs these properties defined for ghost atoms. :dd
{Fix qeq CG convergence failed (%g) after %d iterations at %ld step} :dt
Self-explanatory. :dd
{Fix qeq has non-zero lower Taper radius cutoff} :dt
Absolute value must be <= 0.01. :dd
{Fix qeq has very low Taper radius cutoff} :dt
Value should typically be >= 5.0. :dd
{Fix qeq/dynamic tolerance may be too small for damped dynamics} :dt
Self-explanatory. :dd
{Fix qeq/fire tolerance may be too small for damped fires} :dt
Self-explanatory. :dd
{Fix rattle should come after all other integration fixes} :dt
This fix is designed to work after all other integration fixes change
atom positions. Thus it should be the last integration fix specified.
If not, it will not satisfy the desired constraints as well as it
otherwise would. :dd
{Fix recenter should come after all other integration fixes} :dt
Other fixes may change the position of the center-of-mass, so
fix recenter should come last. :dd
{Fix srd SRD moves may trigger frequent reneighboring} :dt
This is because the SRD particles may move long distances. :dd
{Fix srd grid size > 1/4 of big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd particle moved outside valid domain} :dt
This may indicate a problem with your simulation parameters. :dd
{Fix srd particles may move > big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd viscosity < 0.0 due to low SRD density} :dt
This may cause accuracy problems. :dd
{Fix thermal/conductivity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that fix
thermal/conductivity comes first. If you are using fix ave/spatial to
measure the temperature profile induced by fix thermal/conductivity, then this
may cause a glitch in the profile since you are averaging immediately
after swaps have occurred. Flipping the order of the 2 fixes
typically helps. :dd
{Fix viscosity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that
fix viscosity comes first. If you are using fix ave/spatial
to measure the velocity profile induced by fix viscosity, then
this may cause a glitch in the profile since you are averaging
immediately after swaps have occurred. Flipping the order
of the 2 fixes typically helps. :dd
{Fixes cannot send data in Kokkos communication, switching to classic communication} :dt
This is current restriction with Kokkos. :dd
{For better accuracy use 'pair_modify table 0'} :dt
The user-specified force accuracy cannot be achieved unless the table
feature is disabled by using 'pair_modify table 0'. :dd
{Geometric mixing assumed for 1/r^6 coefficients} :dt
Self-explanatory. :dd
{Group for fix_modify temp != fix group} :dt
The fix_modify command is specifying a temperature computation that
computes a temperature on a different group of atoms than the fix
itself operates on. This is probably not what you want to do. :dd
{H matrix size has been exceeded: m_fill=%d H.m=%d\n} :dt
This is the size of the matrix. :dd
{Ignoring unknown or incorrect info command flag} :dt
Self-explanatory. An unknown argument was given to the info command.
Compare your input with the documentation. :dd
{Improper atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed improper atoms is extreme; you may want
to check your simulation geometry. :dd
{Improper style in data file differs from currently defined improper style} :dt
Self-explanatory. :dd
{Inconsistent image flags} :dt
The image flags for a pair on bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands. Note that if you
have an infinite periodic crystal with bonds then it is impossible to
have fully consistent image flags, since some bonds will cross
periodic boundaries and connect two atoms with the same image
flag. :dd
{KIM Model does not provide 'energy'; Potential energy will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'forces'; Forces will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleEnergy'; energy per atom will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleVirial'; virial per atom will be zero} :dt
Self-explanatory. :dd
{Kspace_modify slab param < 2.0 may cause unphysical behavior} :dt
The kspace_modify slab parameter should be larger to insure periodic
grids padded with empty space do not overlap. :dd
{Less insertions than requested} :dt
The fix pour command was unsuccessful at finding open space
for as many particles as it tried to insert. :dd
{Library error in lammps_gather_atoms} :dt
This library function cannot be used if atom IDs are not defined
or are not consecutively numbered. :dd
{Library error in lammps_scatter_atoms} :dt
This library function cannot be used if atom IDs are not defined or
are not consecutively numbered, or if no atom map is defined. See the
atom_modify command for details about atom maps. :dd
{Lost atoms via change_box: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms via displace_atoms: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MSM mesh too small, increasing to 2 points in each direction} :dt
Self-explanatory. :dd
{Mismatch between velocity and compute groups} :dt
The temperature computation used by the velocity command will not be
on the same group of atoms that velocities are being set for. :dd
{Mixing forced for lj coefficients} :dt
Self-explanatory. :dd
{Molecule attributes do not match system attributes} :dt
An attribute is specified (e.g. diameter, charge) that is
not defined for the specified atom style. :dd
{Molecule has bond topology but no special bond settings} :dt
This means the bonded atoms will not be excluded in pair-wise
interactions. :dd
{Molecule template for create_atoms has multiple molecules} :dt
The create_atoms command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix gcmc has multiple molecules} :dt
The fix gcmc command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix shake has multiple molecules} :dt
The fix shake command will only recognize molecules of a single
type, i.e. the first molecule in the template. :dd
{More than one compute centro/atom} :dt
It is not efficient to use compute centro/atom more than once. :dd
{More than one compute cluster/atom} :dt
It is not efficient to use compute cluster/atom more than once. :dd
{More than one compute cna/atom defined} :dt
It is not efficient to use compute cna/atom more than once. :dd
{More than one compute contact/atom} :dt
It is not efficient to use compute contact/atom more than once. :dd
{More than one compute coord/atom} :dt
It is not efficient to use compute coord/atom more than once. :dd
{More than one compute damage/atom} :dt
It is not efficient to use compute damage/atom more than once. :dd
{More than one compute dilatation/atom} :dt
Self-explanatory. :dd
{More than one compute erotate/sphere/atom} :dt
It is not efficient to use compute erotate/sphere/atom more than once. :dd
{More than one compute hexorder/atom} :dt
It is not efficient to use compute hexorder/atom more than once. :dd
{More than one compute ke/atom} :dt
It is not efficient to use compute ke/atom more than once. :dd
{More than one compute orientorder/atom} :dt
It is not efficient to use compute orientorder/atom more than once. :dd
{More than one compute plasticity/atom} :dt
Self-explanatory. :dd
{More than one compute sna/atom} :dt
Self-explanatory. :dd
{More than one compute snad/atom} :dt
Self-explanatory. :dd
{More than one compute snav/atom} :dt
Self-explanatory. :dd
{More than one fix poems} :dt
It is not efficient to use fix poems more than once. :dd
{More than one fix rigid} :dt
It is not efficient to use fix rigid more than once. :dd
{Neighbor exclusions used with KSpace solver may give inconsistent Coulombic energies} :dt
This is because excluding specific pair interactions also excludes
them from long-range interactions which may not be the desired effect.
The special_bonds command handles this by ensuring that excluded (or
weighted) 1-2, 1-3, 1-4 interactions are treated
consistently by both the short-range pair style and the long-range
solver. This is not done for exclusions of charged atom pairs via the
neigh_modify exclude command. :dd
{New thermo_style command, previous thermo_modify settings will be lost} :dt
If a thermo_style command is used after a thermo_modify command, the
settings changed by the thermo_modify command will be reset to their
default values. This is because the thermo_modify command acts on
the currently defined thermo style, and a thermo_style command creates
a new style. :dd
{No Kspace calculation with verlet/split} :dt
The 2nd partition performs a kspace calculation so the kspace_style
command must be used. :dd
{No automatic unit conversion to XTC file format conventions possible for units lj} :dt
This means no scaling will be performed. :dd
{No fixes defined, atoms won't move} :dt
If you are not using a fix like nve, nvt, npt then atom velocities and
coordinates will not be updated during timestepping. :dd
{No joints between rigid bodies, use fix rigid instead} :dt
The bodies defined by fix poems are not connected by joints. POEMS
will integrate the body motion, but it would be more efficient to use
fix rigid. :dd
{Not using real units with pair reax} :dt
This is most likely an error, unless you have created your own ReaxFF
parameter file in a different set of units. :dd
{Number of MSM mesh points changed to be a multiple of 2} :dt
MSM requires that the number of grid points in each direction be a multiple
of two and the number of grid points in one or more directions have been
adjusted to meet this requirement. :dd
{OMP_NUM_THREADS environment is not set.} :dt
This environment variable must be set appropriately to use the
USER-OMP package. :dd
{One or more atoms are time integrated more than once} :dt
This is probably an error since you typically do not want to
advance the positions or velocities of an atom more than once
per timestep. :dd
{One or more chunks do not contain all atoms in molecule} :dt
This may not be what you intended. :dd
{One or more dynamic groups may not be updated at correct point in timestep} :dt
If there are other fixes that act immediately after the initial stage
of time integration within a timestep (i.e. after atoms move), then
the command that sets up the dynamic group should appear after those
fixes. This will insure that dynamic group assignments are made
after all atoms have moved. :dd
{One or more respa levels compute no forces} :dt
This is computationally inefficient. :dd
{Pair COMB charge %.10f with force %.10f hit max barrier} :dt
Something is possibly wrong with your model. :dd
{Pair COMB charge %.10f with force %.10f hit min barrier} :dt
Something is possibly wrong with your model. :dd
{Pair brownian needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dpd needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dsmc: num_of_collisions > number_of_A} :dt
Collision model in DSMC is breaking down. :dd
{Pair dsmc: num_of_collisions > number_of_B} :dt
Collision model in DSMC is breaking down. :dd
{Pair style in data file differs from currently defined pair style} :dt
Self-explanatory. :dd
{Particle deposition was unsuccessful} :dt
The fix deposit command was not able to insert as many atoms as
needed. The requested volume fraction may be too high, or other atoms
may be in the insertion region. :dd
{Proc sub-domain size < neighbor skin, could lead to lost atoms} :dt
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered. :dd
{Reducing PPPM order b/c stencil extends beyond nearest neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp Coulomb order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp dispersion order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Replacing a fix, but new group != old group} :dt
The ID and style of a fix match for a fix you are changing with a fix
command, but the new group you are specifying does not match the old
group. :dd
{Replicating in a non-periodic dimension} :dt
The parameters for a replicate command will cause a non-periodic
dimension to be replicated; this may cause unwanted behavior. :dd
{Resetting reneighboring criteria during PRD} :dt
A PRD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the PRD simulation. :dd
{Resetting reneighboring criteria during TAD} :dt
A TAD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the TAD simulation. :dd
{Resetting reneighboring criteria during minimization} :dt
Minimization requires that neigh_modify settings be delay = 0, every =
1, check = yes. Since these settings were not in place, LAMMPS
changed them and will restore them to their original values after the
minimization. :dd
{Restart file used different # of processors} :dt
The restart file was written out by a LAMMPS simulation running on a
different number of processors. Due to round-off, the trajectories of
your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different 3d processor grid} :dt
The restart file was written out by a LAMMPS simulation running on a
different 3d grid of processors. Due to round-off, the trajectories
of your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different boundary settings, using restart file values} :dt
Your input script cannot change these restart file settings. :dd
{Restart file used different newton bond setting, using restart file value} :dt
The restart file value will override the setting in the input script. :dd
{Restart file used different newton pair setting, using input script value} :dt
The input script value will override the setting in the restart file. :dd
{Restrain problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Running PRD with only one replica} :dt
This is allowed, but you will get no parallel speed-up. :dd
{SRD bin shifting turned on due to small lamda} :dt
This is done to try to preserve accuracy. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Shake determinant < 0.0} :dt
The determinant of the quadratic equation being solved for a single
cluster specified by the fix shake command is numerically suspect. LAMMPS
will set it to 0.0 and continue. :dd
{Shell command '%s' failed with error '%s'} :dt
Self-explanatory. :dd
{Shell command returned with non-zero status} :dt
This may indicate the shell command did not operate as expected. :dd
{Should not allow rigid bodies to bounce off relecting walls} :dt
LAMMPS allows this, but their dynamics are not computed correctly. :dd
{Should not use fix nve/limit with fix shake or fix rattle} :dt
This will lead to invalid constraint forces in the SHAKE/RATTLE
computation. :dd
{Simulations might be very slow because of large number of structure factors} :dt
Self-explanatory. :dd
{Slab correction not needed for MSM} :dt
Slab correction is intended to be used with Ewald or PPPM and is not needed by MSM. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms on the system is not 0.0.
For some KSpace solvers this is only a warning. :dd
{Table inner cutoff >= outer cutoff} :dt
You specified an inner cutoff for a Coulombic table that is longer
than the global cutoff. Probably not what you wanted. :dd
{Temperature for MSST is not for group all} :dt
User-assigned temperature to MSST fix does not compute temperature for
all atoms. Since MSST computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by MSST could be inaccurate. :dd
{Temperature for NPT is not for group all} :dt
User-assigned temperature to NPT fix does not compute temperature for
all atoms. Since NPT computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by NPT could be inaccurate. :dd
{Temperature for fix modify is not for group all} :dt
The temperature compute is being used with a pressure calculation
which does operate on group all, so this may be inconsistent. :dd
{Temperature for thermo pressure is not for group all} :dt
User-assigned temperature to thermo via the thermo_modify command does
not compute temperature for all atoms. Since thermo computes a global
pressure, the kinetic energy contribution from the temperature is
assumed to also be for all atoms. Thus the pressure printed by thermo
could be inaccurate. :dd
{The fix ave/spatial command has been replaced by the more flexible fix ave/chunk and compute chunk/atom commands -- fix ave/spatial will be removed in the summer of 2015} :dt
Self-explanatory. :dd
{The minimizer does not re-orient dipoles when using fix efield} :dt
This means that only the atom coordinates will be minimized,
not the orientation of the dipoles. :dd
{Too many common neighbors in CNA %d times} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Too many inner timesteps in fix ttm} :dt
Self-explanatory. :dd
{Too many neighbors in CNA for %d atoms} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Triclinic box skew is large} :dt
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result. :dd
{Use special bonds = 0,1,1 with bond style fene} :dt
Most FENE models need this setting for the special_bonds command. :dd
{Use special bonds = 0,1,1 with bond style fene/expand} :dt
Most FENE models need this setting for the special_bonds command. :dd
{Using a manybody potential with bonds/angles/dihedrals and special_bond exclusions} :dt
This is likely not what you want to do. The exclusion settings will
eliminate neighbors in the neighbor list, which the manybody potential
needs to calculate its terms correctly. :dd
{Using compute temp/deform with inconsistent fix deform remap option} :dt
Fix nvt/sllod assumes deforming atoms have a velocity profile provided
by "remap v" or "remap none" as a fix deform option. :dd
{Using compute temp/deform with no fix deform defined} :dt
This is probably an error, since it makes little sense to use
compute temp/deform in this case. :dd
{Using fix srd with box deformation but no SRD thermostat} :dt
The deformation will heat the SRD particles so this can
be dangerous. :dd
{Using kspace solver on system with no charge} :dt
Self-explanatory. :dd
{Using largest cut-off for lj/long/dipole/long long long} :dt
Self-explanatory. :dd
{Using largest cutoff for buck/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for lj/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for pair_style lj/long/tip4p/long} :dt
Self-explanatory. :dd
{Using package gpu without any pair style defined} :dt
Self-explanatory. :dd
{Using pair potential shift with pair_modify compute no} :dt
The shift effects will thus not be computed. :dd
{Using pair tail corrections with nonperiodic system} :dt
This is probably a bogus thing to do, since tail corrections are
computed by integrating the density of a periodic system out to
infinity. :dd
{Using pair tail corrections with pair_modify compute no} :dt
The tail corrections will thus not be computed. :dd
{pair style reax is now deprecated and will soon be retired. Users should switch to pair_style reax/c} :dt
Self-explanatory. :dd
:dle
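As a minimal sketch of the "run 0" workaround mentioned in the
{Calling write_dump before a full system init.} entry above (the dump
file name is an arbitrary placeholder):

run 0
write_dump all atom dump.initial :pre

The "run 0" command forces a full system init without advancing any
timesteps, so the subsequent write_dump has complete information
available.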
diff --git a/doc/src/Section_intro.txt b/doc/src/Section_intro.txt
index 33c3cf395..bfb6ef390 100644
--- a/doc/src/Section_intro.txt
+++ b/doc/src/Section_intro.txt
@@ -1,540 +1,544 @@
"Previous Section"_Manual.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_start.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
1. Introduction :h3
This section provides an overview of what LAMMPS can and can't do,
describes what it means for LAMMPS to be an open-source code, and
acknowledges the funding and people who have contributed to LAMMPS
over the years.
1.1 "What is LAMMPS"_#intro_1
1.2 "LAMMPS features"_#intro_2
1.3 "LAMMPS non-features"_#intro_3
1.4 "Open source distribution"_#intro_4
1.5 "Acknowledgments and citations"_#intro_5 :all(b)
:line
:line
1.1 What is LAMMPS :link(intro_1),h4
LAMMPS is a classical molecular dynamics code that models an ensemble
of particles in a liquid, solid, or gaseous state. It can model
atomic, polymeric, biological, metallic, granular, and coarse-grained
systems using a variety of force fields and boundary conditions.
For examples of LAMMPS simulations, see the Publications page of the
"LAMMPS WWW Site"_lws.
LAMMPS runs efficiently on single-processor desktop or laptop
machines, but is designed for parallel computers. It will run on any
parallel machine that compiles C++ and supports the "MPI"_mpi
message-passing library. This includes distributed- or shared-memory
parallel machines and Beowulf-style clusters.
:link(mpi,http://www-unix.mcs.anl.gov/mpi)
LAMMPS can model systems with only a few particles up to millions or
billions. See "Section 8"_Section_perf.html for information on
LAMMPS performance and scalability, or the Benchmarks section of the
"LAMMPS WWW Site"_lws.
LAMMPS is a freely-available open-source code, distributed under the
terms of the "GNU Public License"_gnu, which means you can use or
modify the code however you wish. See "this section"_#intro_4 for a
brief discussion of the open-source philosophy.
:link(gnu,http://www.gnu.org/copyleft/gpl.html)
LAMMPS is designed to be easy to modify or extend with new
capabilities, such as new force fields, atom types, boundary
conditions, or diagnostics. See "Section 10"_Section_modify.html
for more details.
The current version of LAMMPS is written in C++. Earlier versions
were written in F77 and F90. See
"Section 13"_Section_history.html for more information on
different versions. All versions can be downloaded from the "LAMMPS
WWW Site"_lws.
LAMMPS was originally developed under a US Department of Energy CRADA
(Cooperative Research and Development Agreement) between two DOE labs
and 3 companies. It is distributed by "Sandia National Labs"_snl.
See "this section"_#intro_5 for more information on LAMMPS funding and
individuals who have contributed to LAMMPS.
:link(snl,http://www.sandia.gov)
In the most general sense, LAMMPS integrates Newton's equations of
motion for collections of atoms, molecules, or macroscopic particles
that interact via short- or long-range forces with a variety of
initial and/or boundary conditions. For computational efficiency
LAMMPS uses neighbor lists to keep track of nearby particles. The
lists are optimized for systems with particles that are repulsive at
short distances, so that the local density of particles never becomes
too large. On parallel machines, LAMMPS uses spatial-decomposition
techniques to partition the simulation domain into small 3d
sub-domains, one of which is assigned to each processor. Processors
communicate and store "ghost" atom information for atoms that border
their sub-domain. LAMMPS is most efficient (in a parallel sense) for
systems whose particles fill a 3d rectangular box with roughly uniform
density. Papers with technical details of the algorithms used in
LAMMPS are listed in "this section"_#intro_5.
:line
1.2 LAMMPS features :link(intro_2),h4
This section highlights LAMMPS features, with pointers to specific
commands which give more details. If LAMMPS doesn't have your
favorite interatomic potential, boundary condition, or atom type, see
"Section 10"_Section_modify.html, which describes how you can add
it to LAMMPS.
General features :h5
runs on a single processor or in parallel
distributed-memory message-passing parallelism (MPI)
spatial-decomposition of simulation domain for parallelism
open-source distribution
highly portable C++
optional libraries used: MPI and single-processor FFT
GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features
easy to extend with new features and functionality
runs from an input script
syntax for defining and using variables and formulas
syntax for looping over runs and breaking out of loops
run one or multiple simulations simultaneously (in parallel) from one script
build as library, invoke LAMMPS thru library interface or provided Python wrapper
couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both :ul
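As a small illustration of the input-script variable and looping
syntax listed above (a sketch only; the variable names and values are
arbitrary):

variable i loop 3
label start
variable t equal 100.0*v_i
print "Iteration $i, temperature $t"
next i
jump SELF start :pre

Each pass through such a loop could, for example, reset a thermostat
temperature and issue another run command.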
Particle and model types :h5
("atom style"_atom_style.html command)
atoms
coarse-grained particles (e.g. bead-spring polymers)
united-atom polymers or organic molecules
all-atom polymers, organic molecules, proteins, DNA
metals
granular materials
coarse-grained mesoscale models
finite-size spherical and ellipsoidal particles
finite-size line segment (2d) and triangle (3d) particles
point dipole particles
rigid collections of particles
hybrid combinations of these :ul
Force fields :h5
("pair style"_pair_style.html, "bond style"_bond_style.html,
"angle style"_angle_style.html, "dihedral style"_dihedral_style.html,
"improper style"_improper_style.html, "kspace style"_kspace_style.html
commands)
pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, \
Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
charged pairwise potentials: Coulombic, point-dipole
manybody potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), \
embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, \
REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
long-range interactions for charge, point-dipoles, and LJ dispersion: \
Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
polarization models: "QEq"_fix_qeq.html, \
"core/shell model"_Section_howto.html#howto_26, \
"Drude dipole model"_Section_howto.html#howto_27
charge equilibration (QEq via dynamic, point, shielded, Slater methods)
coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
mesoscopic potentials: granular, Peridynamics, SPH
electron force field (eFF, AWPMD)
bond potentials: harmonic, FENE, Morse, nonlinear, class 2, \
quartic (breakable)
angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, \
class 2 (COMPASS)
dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, \
class 2 (COMPASS), OPLS
improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
polymer potentials: all-atom, united-atom, bead-spring, breakable
water potentials: TIP3P, TIP4P, SPC
implicit solvent potentials: hydrodynamic lubrication, Debye
force-field compatibility with common CHARMM, AMBER, DREIDING, \
OPLS, GROMACS, COMPASS options
access to "KIM archive"_http://openkim.org of potentials via \
"pair kim"_pair_kim.html
hybrid potentials: multiple pair, bond, angle, dihedral, improper \
potentials can be used in one simulation
overlaid potentials: superposition of multiple pair potentials :ul
Atom creation :h5
("read_data"_read_data.html, "lattice"_lattice.html,
"create_atoms"_create_atoms.html, "delete_atoms"_delete_atoms.html,
"displace_atoms"_displace_atoms.html, "replicate"_replicate.html commands)
read in atom coords from files
create atoms on one or more lattices (e.g. grain boundaries)
delete geometric or logical groups of atoms (e.g. voids)
replicate existing atoms multiple times
displace atoms :ul
Ensembles, constraints, and boundary conditions :h5
("fix"_fix.html command)
2d or 3d systems
orthogonal or non-orthogonal (triclinic symmetry) simulation domains
constant NVE, NVT, NPT, NPH, Parrinello/Rahman integrators
thermostatting options for groups and geometric regions of atoms
pressure control via Nose/Hoover or Berendsen barostatting in 1 to 3 dimensions
simulation box deformation (tensile and shear)
harmonic (umbrella) constraint forces
rigid body constraints
SHAKE bond and angle constraints
Monte Carlo bond breaking, formation, swapping
atom/molecule insertion and deletion
walls of various kinds
non-equilibrium molecular dynamics (NEMD)
variety of additional boundary conditions and constraints :ul
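As a hedged sketch of how a few of the options above appear in an
input script (this assumes a molecular system with bond type 1 and
angle type 1 is already defined; the numeric values are placeholders):

fix 1 all npt temp 300.0 300.0 100.0 iso 1.0 1.0 1000.0
fix 2 all shake 0.0001 20 0 b 1 a 1 :pre

The first fix applies Nose/Hoover thermostatting and barostatting; the
second constrains the specified bonds and angles with SHAKE.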
Integrators :h5
("run"_run.html, "run_style"_run_style.html, "minimize"_minimize.html commands)
velocity-Verlet integrator
Brownian dynamics
rigid body integration
energy minimization via conjugate gradient or steepest descent relaxation
rRESPA hierarchical timestepping
rerun command for post-processing of dump files :ul
Diagnostics :h5
see the various flavors of the "fix"_fix.html and "compute"_compute.html commands :ul
Output :h5
("dump"_dump.html, "restart"_restart.html commands)
log file of thermodynamic info
text dump files of atom coords, velocities, other per-atom quantities
binary restart files
parallel I/O of dump and restart files
per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
user-defined system-wide (log file) or per-atom (dump file) calculations
spatial and time averaging of per-atom quantities
time averaging of system-wide quantities
atom snapshots in native, XYZ, XTC, DCD, CFG formats :ul
Multi-replica models :h5
"nudged elastic band"_neb.html
"parallel replica dynamics"_prd.html
"temperature accelerated dynamics"_tad.html
"parallel tempering"_temper.html
Pre- and post-processing :h5
Various pre- and post-processing serial tools are packaged
with LAMMPS; see these "doc pages"_Section_tools.html. :ulb,l
Our group has also written and released a separate toolkit called
"Pizza.py"_pizza which provides tools for doing setup, analysis,
plotting, and visualization for LAMMPS simulations. Pizza.py is
written in "Python"_python and is available for download from "the
Pizza.py WWW site"_pizza. :l
:ule
:link(pizza,http://www.sandia.gov/~sjplimp/pizza.html)
:link(python,http://www.python.org)
Specialized features :h5
-These are LAMMPS capabilities which you may not think of as typical
-molecular dynamics options:
+LAMMPS can be built with optional packages which implement a variety
+of additional capabilities. An overview of all the packages is "given
+here"_Section_packages.html.
+
+These are some LAMMPS capabilities which you may not think of as
+typical classical molecular dynamics options:
"static"_balance.html and "dynamic load-balancing"_fix_balance.html
"generalized aspherical particles"_body.html
"stochastic rotation dynamics (SRD)"_fix_srd.html
"real-time visualization and interactive MD"_fix_imd.html
calculate "virtual diffraction patterns"_compute_xrd.html
"atom-to-continuum coupling"_fix_atc.html with finite elements
coupled rigid body integration via the "POEMS"_fix_poems.html library
"QM/MM coupling"_fix_qmmm.html
"path-integral molecular dynamics (PIMD)"_fix_ipi.html and "this as well"_fix_pimd.html
Monte Carlo via "GCMC"_fix_gcmc.html and "tfMC"_fix_tfmc.html "atom swapping"_fix_atom_swap.html and "bond swapping"_fix_bond_swap.html
"Direct Simulation Monte Carlo"_pair_dsmc.html for low-density fluids
"Peridynamics mesoscale modeling"_pair_peri.html
"Lattice Boltzmann fluid"_fix_lb_fluid.html
"targeted"_fix_tmd.html and "steered"_fix_smd.html molecular dynamics
"two-temperature electron model"_fix_ttm.html :ul
:line
1.3 LAMMPS non-features :link(intro_3),h4
LAMMPS is designed to efficiently compute Newton's equations of motion
for a system of interacting particles. Many of the tools needed to
pre- and post-process the data for such simulations are not included
in the LAMMPS kernel for several reasons:
the desire to keep LAMMPS simple
they are not parallel operations
other codes already do them
limited development resources :ul
Specifically, LAMMPS itself does not:
run thru a GUI
build molecular systems
assign force-field coefficients automagically
perform sophisticated analyses of your MD simulation
visualize your MD simulation
plot your output data :ul
A few tools for pre- and post-processing tasks are provided as part of
the LAMMPS package; they are described in "this
section"_Section_tools.html. However, many people use other codes or
write their own tools for these tasks.
As noted above, our group has also written and released a separate
toolkit called "Pizza.py"_pizza which addresses some of the listed
bullets. It provides tools for doing setup, analysis, plotting, and
visualization for LAMMPS simulations. Pizza.py is written in
"Python"_python and is available for download from "the Pizza.py WWW
site"_pizza.
LAMMPS requires as input a list of initial atom coordinates and types,
molecular topology information, and force-field coefficients assigned
to all atoms and bonds. LAMMPS will not build molecular systems and
assign force-field parameters for you.
For atomic systems LAMMPS provides a "create_atoms"_create_atoms.html
command which places atoms on solid-state lattices (fcc, bcc,
user-defined, etc). Assigning small numbers of force field
coefficients can be done via the "pair coeff"_pair_coeff.html, "bond
coeff"_bond_coeff.html, "angle coeff"_angle_coeff.html, etc commands.
For molecular systems or more complicated simulation geometries, users
typically use another code as a builder and convert its output to
LAMMPS input format, or write their own code to generate atom
coordinates and molecular topology for LAMMPS to read in.
For complicated molecular systems (e.g. a protein), a multitude of
topology information and hundreds of force-field coefficients must
typically be specified. We suggest you use a program like
"CHARMM"_charmm or "AMBER"_amber or other molecular builders to setup
such problems and dump its information to a file. You can then
reformat the file as LAMMPS input. Some of the tools in "this
section"_Section_tools.html can assist in this process.
Similarly, LAMMPS creates output files in a simple format. Most users
post-process these files with their own analysis tools or re-format
them for input into other programs, including visualization packages.
If you are convinced you need to compute something on-the-fly as
LAMMPS runs, see "Section 10"_Section_modify.html for a discussion
of how you can use the "dump"_dump.html and "compute"_compute.html and
"fix"_fix.html commands to print out data of your choosing. Keep in
mind that complicated computations can slow down the molecular
dynamics timestepping, particularly if the computations are not
parallel, so it is often better to leave such analysis to
post-processing codes.
For high-quality visualization we recommend the
following packages:
"VMD"_http://www.ks.uiuc.edu/Research/vmd
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A
"OVITO"_http://www.ovito.org/
"ParaView"_http://www.paraview.org/
"PyMol"_http://www.pymol.org
"Raster3d"_http://www.bmsc.washington.edu/raster3d/raster3d.html
"RasMol"_http://www.openrasmol.org :ul
Other features that LAMMPS does not yet (and may never) support are
discussed in "Section 13"_Section_history.html.
Finally, these are freely-available molecular dynamics codes, most of
them parallel, which may be well-suited to the problems you want to
model. They can also be used in conjunction with LAMMPS to perform
complementary modeling tasks.
"CHARMM"_charmm
"AMBER"_amber
"NAMD"_namd
"NWCHEM"_nwchem
"DL_POLY"_dlpoly
"Tinker"_tinker :ul
:link(charmm,http://www.charmm.org)
:link(amber,http://ambermd.org)
:link(namd,http://www.ks.uiuc.edu/Research/namd/)
:link(nwchem,http://www.emsl.pnl.gov/docs/nwchem/nwchem.html)
:link(dlpoly,http://www.ccp5.ac.uk/DL_POLY_CLASSIC)
:link(tinker,http://dasher.wustl.edu/tinker)
CHARMM, AMBER, NAMD, NWCHEM, and Tinker are designed primarily for
modeling biological molecules. CHARMM and AMBER use
atom-decomposition (replicated-data) strategies for parallelism; NAMD
and NWCHEM use spatial-decomposition approaches, similar to LAMMPS.
Tinker is a serial code. DL_POLY includes potentials for a variety of
biological and non-biological materials; both a replicated-data and
spatial-decomposition version exist.
:line
1.4 Open source distribution :link(intro_4),h4
LAMMPS comes with no warranty of any kind. As each source file states
in its header, it is a copyrighted code that is distributed free-of-
charge, under the terms of the "GNU Public License"_gnu (GPL). This
is often referred to as open-source distribution - see
"www.gnu.org"_gnuorg or "www.opensource.org"_opensource for more
details. The legal text of the GPL is in the LICENSE file that is
included in the LAMMPS distribution.
:link(gnuorg,http://www.gnu.org)
:link(opensource,http://www.opensource.org)
Here is a summary of what the GPL means for LAMMPS users:
(1) Anyone is free to use, modify, or extend LAMMPS in any way they
choose, including for commercial purposes.
(2) If you distribute a modified version of LAMMPS, it must remain
open-source, meaning you distribute it under the terms of the GPL.
You should clearly annotate such a code as a derivative version of
LAMMPS.
(3) If you release any code that includes LAMMPS source code, then it
must also be open-sourced, meaning you distribute it under the terms
of the GPL.
(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
source file headers (including the copyright and GPL notices) should
remain part of the code.
In the spirit of an open-source code, these are various ways you can
contribute to making LAMMPS better. You can send email to the
"developers"_http://lammps.sandia.gov/authors.html on any of these
items.
Point prospective users to the "LAMMPS WWW Site"_lws. Mention it in
talks or link to it from your WWW site. :ulb,l
If you find an error or omission in this manual or on the "LAMMPS WWW
Site"_lws, or have a suggestion for something to clarify or include,
send an email to the
"developers"_http://lammps.sandia.gov/authors.html. :l
If you find a bug, "Section 12.2"_Section_errors.html#err_2
describes how to report it. :l
If you publish a paper using LAMMPS results, send the citation (and
any cool pictures or movies if you like) to add to the Publications,
Pictures, and Movies pages of the "LAMMPS WWW Site"_lws, with links
and attributions back to you. :l
Create a new Makefile.machine that can be added to the src/MAKE
directory. :l
The tools sub-directory of the LAMMPS distribution has various
stand-alone codes for pre- and post-processing of LAMMPS data. More
details are given in "Section 9"_Section_tools.html. If you write
a new tool that users will find useful, it can be added to the LAMMPS
distribution. :l
LAMMPS is designed to be easy to extend with new code for features
like potentials, boundary conditions, diagnostic computations, etc.
"This section"_Section_modify.html gives details. If you add a
feature of general interest, it can be added to the LAMMPS
distribution. :l
The Benchmark page of the "LAMMPS WWW Site"_lws lists LAMMPS
performance on various platforms. The files needed to run the
benchmarks are part of the LAMMPS distribution. If your machine is
sufficiently different from those listed, your timing data can be
added to the page. :l
You can send feedback for the User Comments page of the "LAMMPS WWW
Site"_lws. It might be added to the page. No promises. :l
Cash. Small denominations, unmarked bills preferred. Paper sack OK.
Leave on desk. VISA also accepted. Chocolate chip cookies
encouraged. :l
:ule
:line
1.5 Acknowledgments and citations :h4,link(intro_5)
LAMMPS development has been funded by the "US Department of
Energy"_doe (DOE), through its CRADA, LDRD, ASCI, and Genomes-to-Life
programs and its "OASCR"_oascr and "OBER"_ober offices.
Specifically, work on the latest version was funded in part by the US
Department of Energy's Genomics:GTL program
("www.doegenomestolife.org"_gtl) under the "project"_ourgtl, "Carbon
Sequestration in Synechococcus Sp.: From Molecular Machines to
Hierarchical Modeling".
:link(doe,http://www.doe.gov)
:link(gtl,http://www.doegenomestolife.org)
:link(ourgtl,http://www.genomes2life.org)
:link(oascr,http://www.sc.doe.gov/ascr/home.html)
:link(ober,http://www.er.doe.gov/production/ober/ober_top.html)
The following paper describes the basic parallel algorithms used in
LAMMPS. If you use LAMMPS results in your published work, please cite
this paper and include a pointer to the "LAMMPS WWW Site"_lws
(http://lammps.sandia.gov):
S. Plimpton, [Fast Parallel Algorithms for Short-Range Molecular
Dynamics], J Comp Phys, 117, 1-19 (1995).
Other papers describing specific algorithms used in LAMMPS are listed
under the "Citing LAMMPS link"_http://lammps.sandia.gov/cite.html of
the LAMMPS WWW page.
The "Publications link"_http://lammps.sandia.gov/papers.html on the
LAMMPS WWW page lists papers that have cited LAMMPS. If your paper is
not listed there for some reason, feel free to send us the info. If
the simulations in your paper produced cool pictures or animations,
we'll be pleased to add them to the
"Pictures"_http://lammps.sandia.gov/pictures.html or
"Movies"_http://lammps.sandia.gov/movies.html pages of the LAMMPS WWW
site.
The core group of LAMMPS developers is at Sandia National Labs:
Steve Plimpton, sjplimp at sandia.gov
Aidan Thompson, athomps at sandia.gov
Paul Crozier, pscrozi at sandia.gov :ul
The following folks are responsible for significant contributions to
the code, or other aspects of the LAMMPS development effort. Many of
the packages they have written are somewhat unique to LAMMPS and the
code would not be as general-purpose as it is without their expertise
and efforts.
-Axel Kohlmeyer (Temple U), akohlmey at gmail.com, SVN and Git repositories, indefatigable mail list responder, USER-CG-CMM and USER-OMP packages
+Axel Kohlmeyer (Temple U), akohlmey at gmail.com, SVN and Git repositories, indefatigable mail list responder, USER-CGSDK and USER-OMP packages
Roy Pollock (LLNL), Ewald and PPPM solvers
Mike Brown (ORNL), brownw at ornl.gov, GPU package
Greg Wagner (Sandia), gjwagne at sandia.gov, MEAM package for MEAM potential
Mike Parks (Sandia), mlparks at sandia.gov, PERI package for Peridynamics
Rudra Mukherjee (JPL), Rudranarayan.M.Mukherjee at jpl.nasa.gov, POEMS package for articulated rigid body motion
Reese Jones (Sandia) and collaborators, rjones at sandia.gov, USER-ATC package for atom/continuum coupling
Ilya Valuev (JIHT), valuev at physik.hu-berlin.de, USER-AWPMD package for wave-packet MD
Christian Trott (U Tech Ilmenau), christian.trott at tu-ilmenau.de, USER-CUDA package
Andres Jaramillo-Botero (Caltech), ajaramil at wag.caltech.edu, USER-EFF package for electron force field
Christoph Kloss (JKU), Christoph.Kloss at jku.at, USER-LIGGGHTS package for granular models and granular/fluid coupling
Metin Aktulga (LBL), hmaktulga at lbl.gov, USER-REAXC package for C version of ReaxFF
Georg Gunzenmuller (EMI), georg.ganzenmueller at emi.fhg.de, USER-SPH package :ul
As discussed in "Section 13"_Section_history.html, LAMMPS
originated as a cooperative project between DOE labs and industrial
partners. Folks involved in the design and testing of the original
version of LAMMPS were the following:
John Carpenter (Mayo Clinic, formerly at Cray Research)
Terry Stouch (Lexicon Pharmaceuticals, formerly at Bristol Myers Squibb)
Steve Lustig (Dupont)
Jim Belak (LLNL) :ul
diff --git a/doc/src/Section_packages.txt b/doc/src/Section_packages.txt
index b327b7b1c..2a0a8386e 100644
--- a/doc/src/Section_packages.txt
+++ b/doc/src/Section_packages.txt
@@ -1,1904 +1,2601 @@
"Previous Section"_Section_commands.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_accelerate.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
4. Packages :h3
-This section gives an overview of the add-on optional packages that
-extend LAMMPS functionality. Packages are groups of files that enable
-a specific set of features. For example, force fields for molecular
-systems or granular systems are in packages. You can see the list of
-all packages by typing "make package" from within the src directory of
-the LAMMPS distribution.
-
-Here are links for two tables below, which list standard and user
-packages.
-
-4.1 "Standard packages"_#pkg_1
-4.2 "User packages"_#pkg_2 :all(b)
-
-"Section 2.3"_Section_start.html#start_3 of the manual describes
-the difference between standard packages and user packages. It also
-has general details on how to include/exclude specific packages as
-part of the LAMMPS build process, and on how to build auxiliary
-libraries or modify a machine Makefile if a package requires it.
-
-Following the two tables below, is a sub-section for each package. It
-has a summary of what the package contains. It has specific
-instructions on how to install it, build or obtain any auxiliary
-library it requires, and any Makefile.machine changes it requires. It
-also lists pointers to examples of its use or documentation provided
-in the LAMMPS distribution. If you want to know the complete list of
-commands that a package adds to LAMMPS, simply list the files in its
-directory, e.g. "ls src/GRANULAR". Source files with names that start
-with compute, fix, pair, bond, etc correspond to command styles with
-the same names.
-
-NOTE: The USER package sub-sections below are still being filled in,
-as of March 2016.
-
-Unless otherwise noted below, every package is independent of all the
-others. I.e. any package can be included or excluded in a LAMMPS
-build, independent of all other packages. However, note that some
-packages include commands derived from commands in other packages. If
-the other package is not installed, the derived command from the new
-package will also not be installed when you include the new one.
-E.g. the pair lj/cut/coul/long/omp command from the USER-OMP package
-will not be installed as part of the USER-OMP package if the KSPACE
-package is not also installed, since it contains the pair
-lj/cut/coul/long command. If you later install the KSPACE package and
-the USER-OMP package is already installed, both the pair
-lj/cut/coul/long and lj/cut/coul/long/omp commands will be installed.
-
-:line
-
-4.1 Standard packages :h4,link(pkg_1)
-
-The current list of standard packages is as follows. Each package
-name links to a sub-section below with more details.
-
-Package, Description, Author(s), Doc page, Example, Library
-"ASPHERE"_#ASPHERE, aspherical particles, -, "Section 6.6.14"_Section_howto.html#howto_14, ellipse, -
-"BODY"_#BODY, body-style particles, -, "body"_body.html, body, -
-"CLASS2"_#CLASS2, class 2 force fields, -, "pair_style lj/class2"_pair_class2.html, -, -
-"COLLOID"_#COLLOID, colloidal particles, Kumar (1), "atom_style colloid"_atom_style.html, colloid, -
-"COMPRESS"_#COMPRESS, I/O compression, Axel Kohlmeyer (Temple U), "dump */gz"_dump.html, -, -
-"CORESHELL"_#CORESHELL, adiabatic core/shell model, Hendrik Heenen (Technical U of Munich), "Section 6.6.25"_Section_howto.html#howto_25, coreshell, -
-"DIPOLE"_#DIPOLE, point dipole particles, -, "pair_style dipole/cut"_pair_dipole.html, dipole, -
-"GPU"_#GPU, GPU-enabled styles, Mike Brown (ORNL), "Section 5.3.1"_accelerate_gpu.html, gpu, lib/gpu
-"GRANULAR"_#GRANULAR, granular systems, -, "Section 6.6.6"_Section_howto.html#howto_6, pour, -
-"KIM"_#KIM, openKIM potentials, Smirichinski & Elliot & Tadmor (3), "pair_style kim"_pair_kim.html, kim, KIM
-"KOKKOS"_#KOKKOS, Kokkos-enabled styles, Trott & Moore (4), "Section 5.3.3"_accelerate_kokkos.html, kokkos, lib/kokkos
-"KSPACE"_#KSPACE, long-range Coulombic solvers, -, "kspace_style"_kspace_style.html, peptide, -
-"MANYBODY"_#MANYBODY, many-body potentials, -, "pair_style tersoff"_pair_tersoff.html, shear, -
-"MEAM"_#MEAM, modified EAM potential, Greg Wagner (Sandia), "pair_style meam"_pair_meam.html, meam, lib/meam
-"MC"_#MC, Monte Carlo options, -, "fix gcmc"_fix_gcmc.html, -, -
-"MOLECULE"_#MOLECULE, molecular system force fields, -, "Section 6.6.3"_Section_howto.html#howto_3, peptide, -
-"OPT"_#OPT, optimized pair styles, Fischer & Richie & Natoli (2), "Section 5.3.5"_accelerate_opt.html, -, -
-"PERI"_#PERI, Peridynamics models, Mike Parks (Sandia), "pair_style peri"_pair_peri.html, peri, -
-"POEMS"_#POEMS, coupled rigid body motion, Rudra Mukherjee (JPL), "fix poems"_fix_poems.html, rigid, lib/poems
-"PYTHON"_#PYTHON, embed Python code in an input script, -, "python"_python.html, python, lib/python
-"REAX"_#REAX, ReaxFF potential, Aidan Thompson (Sandia), "pair_style reax"_pair_reax.html, reax, lib/reax
-"REPLICA"_#REPLICA, multi-replica methods, -, "Section 6.6.5"_Section_howto.html#howto_5, tad, -
-"RIGID"_#RIGID, rigid bodies, -, "fix rigid"_fix_rigid.html, rigid, -
-"SHOCK"_#SHOCK, shock loading methods, -, "fix msst"_fix_msst.html, -, -
-"SNAP"_#SNAP, quantum-fit potential, Aidan Thompson (Sandia), "pair snap"_pair_snap.html, snap, -
-"SRD"_#SRD, stochastic rotation dynamics, -, "fix srd"_fix_srd.html, srd, -
-"VORONOI"_#VORONOI, Voronoi tesselations, Daniel Schwen (LANL), "compute voronoi/atom"_compute_voronoi_atom.html, -, Voro++
-:tb(ea=c)
-
-The "Authors" column lists a name(s) if a specific person is
-responsible for creating and maintaining the package.
-
-(1) The COLLOID package includes Fast Lubrication Dynamics pair styles
-which were created by Amit Kumar and Michael Bybee from Jonathan
-Higdon's group at UIUC.
-
-(2) The OPT package was created by James Fischer (High Performance
-Technologies), David Richie, and Vincent Natoli (Stone Ridge
-Technolgy).
-
-(3) The KIM package was created by Valeriu Smirichinski, Ryan Elliott,
-and Ellad Tadmor (U Minn).
-
-(4) The KOKKOS package was created primarily by Christian Trott and
-Stan Moore (Sandia). It uses the Kokkos library which was developed
-by Carter Edwards, Christian Trott, and others at Sandia.
+This section gives an overview of the optional packages that extend
+LAMMPS functionality with instructions on how to build LAMMPS with
+each of them. Packages are groups of files that enable a specific set
+of features. For example, force fields for molecular systems or
+granular systems are in packages. You can see the list of all
+packages and "make" commands to manage them by typing "make package"
+from within the src directory of the LAMMPS distribution. "Section
+2.3"_Section_start.html#start_3 gives general info on how to install
+and un-install packages as part of the LAMMPS build process.
+
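+For example, from within the src directory you can list the available
+packages and (if your LAMMPS version supports it) report which ones
+are currently installed; the directory name below is a placeholder
+for wherever you unpacked the distribution:
+
+cd lammps/src
+make package          # list packages and package-related make commands
+make package-status   # report which packages are installed :pre
+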
+There are two kinds of packages in LAMMPS, standard and user packages:
+
+"Table of standard packages"_#table_standard
+"Table of user packages"_#table_user :ul
+
+Standard packages are supported by the LAMMPS developers and are
+written in a syntax and style consistent with the rest of LAMMPS.
+This means the developers will answer questions about them, debug and
+fix them if necessary, and keep them compatible with future changes to
+LAMMPS.
+
+User packages have been contributed by users, and begin with the
+"user" prefix. If they are a single command (single file), they are
+typically in the user-misc package. User packages don't necessarily
+meet the requirements of the standard packages. If you have problems
+using a feature provided in a user package, you may need to contact
+the contributor directly to get help. Information on how to submit
+additions you make to LAMMPS as single files or as a standard or user
+package is given in "this section"_Section_modify.html#mod_15 of the
+manual.
+
+Following the next two tables is a sub-section for each package. It
+lists authors (if applicable) and summarizes the package contents. It
+has specific instructions on how to install the package, including (if
+necessary) downloading or building any extra library it requires. It
+also gives links to documentation, example scripts, and
+pictures/movies (if available) that illustrate use of the package.
+
+NOTE: To see the complete list of commands a package adds to LAMMPS,
+just look at the files in its src directory, e.g. "ls src/GRANULAR".
+Files with names that start with fix, compute, atom, pair, bond,
+angle, etc correspond to commands with the same style names.
+
+In these two tables, the "Example" column is a sub-directory in the
+examples directory of the distribution which has an input script that
+uses the package. E.g. "peptide" refers to the examples/peptide
+directory; USER/atc refers to the examples/USER/atc directory. The
+"Library" column indicates whether an extra library is needed to build
+and use the package:
+
+dash = no library
+sys = system library: you likely have it on your machine
+int = internal library: provided with LAMMPS, but you may need to build it
+ext = external library: you will need to download and install it on your machine :ul
-The "Doc page" column links to either a sub-section of the
-"Section 6"_Section_howto.html of the manual, or an input script
-command implemented as part of the package, or to additional
-documentation provided within the package.
-
-The "Example" column is a sub-directory in the examples directory of
-the distribution which has an input script that uses the package.
-E.g. "peptide" refers to the examples/peptide directory.
+:line
+:line
-The "Library" column lists an external library which must be built
-first and which LAMMPS links to when it is built. If it is listed as
-lib/package, then the code for the library is under the lib directory
-of the LAMMPS distribution. See the lib/package/README file for info
-on how to build the library. If it is not listed as lib/package, then
-it is a third-party library not included in the LAMMPS distribution.
-See details on all of this below for individual packages.
+[Standard packages] :link(table_standard),p
+
+Package, Description, Doc page, Example, Library
+"ASPHERE"_#ASPHERE, aspherical particle models, "Section 6.6.14"_Section_howto.html#howto_14, ellipse, -
+"BODY"_#BODY, body-style particles, "body"_body.html, body, -
+"CLASS2"_#CLASS2, class 2 force fields, "pair_style lj/class2"_pair_class2.html, -, -
+"COLLOID"_#COLLOID, colloidal particles, "atom_style colloid"_atom_style.html, colloid, -
+"COMPRESS"_#COMPRESS, I/O compression, "dump */gz"_dump.html, -, sys
+"CORESHELL"_#CORESHELL, adiabatic core/shell model, "Section 6.6.25"_Section_howto.html#howto_25, coreshell, -
+"DIPOLE"_#DIPOLE, point dipole particles, "pair_style dipole/cut"_pair_dipole.html, dipole, -
+"GPU"_#GPU, GPU-enabled styles, "Section 5.3.1"_accelerate_gpu.html, WWW bench, int
+"GRANULAR"_#GRANULAR, granular systems, "Section 6.6.6"_Section_howto.html#howto_6, pour, -
+"KIM"_#KIM, openKIM wrapper, "pair_style kim"_pair_kim.html, kim, ext
+"KOKKOS"_#KOKKOS, Kokkos-enabled styles, "Section 5.3.3"_accelerate_kokkos.html, WWW bench, -
+"KSPACE"_#KSPACE, long-range Coulombic solvers, "kspace_style"_kspace_style.html, peptide, -
+"MANYBODY"_#MANYBODY, many-body potentials, "pair_style tersoff"_pair_tersoff.html, shear, -
+"MC"_#MC, Monte Carlo options, "fix gcmc"_fix_gcmc.html, -, -
+"MEAM"_#MEAM, modified EAM potential, "pair_style meam"_pair_meam.html, meam, int
+"MISC"_#MISC, miscellanous single-file commands, -, -, -
+"MOLECULE"_#MOLECULE, molecular system force fields, "Section 6.6.3"_Section_howto.html#howto_3, peptide, -
+"MPIIO"_#MPIIO, MPI parallel I/O dump and restart, "dump"_dump.html, -, -
+"MSCG"_#MSCG, multi-scale coarse-graining wrapper, "fix mscg"_fix_mscg.html, mscg, ext
+"OPT"_#OPT, optimized pair styles, "Section 5.3.5"_accelerate_opt.html, WWW bench, -
+"PERI"_#PERI, Peridynamics models, "pair_style peri"_pair_peri.html, peri, -
+"POEMS"_#POEMS, coupled rigid body motion, "fix poems"_fix_poems.html, rigid, int
+"PYTHON"_#PYTHON, embed Python code in an input script, "python"_python.html, python, sys
+"QEQ"_#QEQ, QEq charge equilibration, "fix qeq"_fix_qeq.html, qeq, -
+"REAX"_#REAX, ReaxFF potential (Fortran), "pair_style reax"_pair_reax.html, reax, int
+"REPLICA"_#REPLICA, multi-replica methods, "Section 6.6.5"_Section_howto.html#howto_5, tad, -
+"RIGID"_#RIGID, rigid bodies and constraints, "fix rigid"_fix_rigid.html, rigid, -
+"SHOCK"_#SHOCK, shock loading methods, "fix msst"_fix_msst.html, -, -
+"SNAP"_#SNAP, quantum-fitted potential, "pair snap"_pair_snap.html, snap, -
+"SRD"_#SRD, stochastic rotation dynamics, "fix srd"_fix_srd.html, srd, -
+"VORONOI"_#VORONOI, Voronoi tesselation, "compute voronoi/atom"_compute_voronoi_atom.html, -, ext
+:tb(ea=c,ca1=l)
+
+[USER packages] :link(table_user),p
+
+Package, Description, Doc page, Example, Library
+"USER-ATC"_#USER-ATC, atom-to-continuum coupling, "fix atc"_fix_atc.html, USER/atc, int
+"USER-AWPMD"_#USER-AWPMD, wave-packet MD, "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, int
+"USER-CGDNA"_#USER-CGDNA, coarse-grained DNA force fields, src/USER-CGDNA/README, USER/cgdna, -
+"USER-CGSDK"_#USER-CGSDK, SDK coarse-graining model, "pair_style lj/sdk"_pair_sdk.html, USER/cgsdk, -
+"USER-COLVARS"_#USER-COLVARS, collective variables library, "fix colvars"_fix_colvars.html, USER/colvars, int
+"USER-DIFFRACTION"_#USER-DIFFRACTION, virtual x-ray and electron diffraction,"compute xrd"_compute_xrd.html, USER/diffraction, -
+"USER-DPD"_#USER-DPD, reactive dissipative particle dynamics, src/USER-DPD/README, USER/dpd, -
+"USER-DRUDE"_#USER-DRUDE, Drude oscillators, "tutorial"_tutorial_drude.html, USER/drude, -
+"USER-EFF"_#USER-EFF, electron force field,"pair_style eff/cut"_pair_eff.html, USER/eff, -
+"USER-FEP"_#USER-FEP, free energy perturbation,"compute fep"_compute_fep.html, USER/fep, -
+"USER-H5MD"_#USER-H5MD, dump output via HDF5,"dump h5md"_dump_h5md.html, -, ext
+"USER-INTEL"_#USER-INTEL, optimized Intel CPU and KNL styles,"Section 5.3.2"_accelerate_intel.html, WWW bench, -
+"USER-LB"_#USER-LB, Lattice Boltzmann fluid,"fix lb/fluid"_fix_lb_fluid.html, USER/lb, -
+"USER-MANIFOLD"_#USER-MANIFOLD, motion on 2d surfaces,"fix manifoldforce"_fix_manifoldforce.html, USER/manifold, -
+"USER-MGPT"_#USER-MGPT, fast MGPT multi-ion potentials, "pair_style mgpt"_pair_mgpt.html, USER/mgpt, -
+"USER-MISC"_#USER-MISC, single-file contributions, USER-MISC/README, USER/misc, -
+"USER-MOLFILE"_#USER-MOLFILE, "VMD"_VMD molfile plug-ins,"dump molfile"_dump_molfile.html, -, ext
+"USER-NETCDF"_#USER-NETCDF, dump output via NetCDF,"dump netcdf"_dump_netcdf.html, -, ext
+"USER-OMP"_#USER-OMP, OpenMP-enabled styles,"Section 5.3.4"_accelerate_omp.html, WWW bench, -
+"USER-PHONON"_#USER-PHONON, phonon dynamical matrix,"fix phonon"_fix_phonon.html, USER/phonon, -
+"USER-QMMM"_#USER-QMMM, QM/MM coupling,"fix qmmm"_fix_qmmm.html, USER/qmmm, ext
+"USER-QTB"_#USER-QTB, quantum nuclear effects,"fix qtb"_fix_qtb.html "fix qbmsst"_fix_qbmsst.html, qtb, -
+"USER-QUIP"_#USER-QUIP, QUIP/libatoms interface,"pair_style quip"_pair_quip.html, USER/quip, ext
+"USER-REAXC"_#USER-REAXC, ReaxFF potential (C/C++) ,"pair_style reaxc"_pair_reaxc.html, reax, -
+"USER-SMD"_#USER-SMD, smoothed Mach dynamics,"SMD User Guide"_PDF/SMD_LAMMPS_userguide.pdf, USER/smd, ext
+"USER-SMTBQ"_#USER-SMTBQ, second moment tight binding QEq potential,"pair_style smtbq"_pair_smtbq.html, USER/smtbq, -
+"USER-SPH"_#USER-SPH, smoothed particle hydrodynamics,"SPH User Guide"_PDF/SPH_LAMMPS_userguide.pdf, USER/sph, -
+"USER-TALLY"_#USER-TALLY, pairwise tally computes,"compute XXX/tally"_compute_tally.html, USER/tally, -
+"USER-VTK"_#USER-VTK, dump output via VTK, "compute custom/vtk"_dump_custom_vtk.html, -, ext
+:tb(ea=c,ca1=l)
:line
+:line
+
+ASPHERE package :link(ASPHERE),h4
-ASPHERE package :link(ASPHERE),h5
+[Contents:]
-Contents: Several computes, time-integration fixes, and pair styles
-for aspherical particle models: ellipsoids, 2d lines, 3d triangles.
+Computes, time-integration fixes, and pair styles for aspherical
+particle models including ellipsoids, 2d lines, and 3d triangles.
-To install via make or Make.py:
+[Install or un-install:]
make yes-asphere
make machine :pre
-Make.py -p asphere -a machine :pre
-
-To un-install via make or Make.py:
-
make no-asphere
make machine :pre
-Make.py -p ^asphere -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.14"_Section_howto.html#howto_14,
-"pair_style gayberne"_pair_gayberne.html, "pair_style
-resquared"_pair_resquared.html,
-"doc/PDF/pair_gayberne_extra.pdf"_PDF/pair_gayberne_extra.pdf,
-"doc/PDF/pair_resquared_extra.pdf"_PDF/pair_resquared_extra.pdf,
-examples/ASPHERE, examples/ellipse
+src/ASPHERE: filenames -> commands
+"Section 6.14"_Section_howto.html#howto_14
+"pair_style gayberne"_pair_gayberne.html
+"pair_style resquared"_pair_resquared.html
+"doc/PDF/pair_gayberne_extra.pdf"_PDF/pair_gayberne_extra.pdf
+"doc/PDF/pair_resquared_extra.pdf"_PDF/pair_resquared_extra.pdf
+examples/ASPHERE
+examples/ellipse
+http://lammps.sandia.gov/movies.html#line
+http://lammps.sandia.gov/movies.html#tri :ul
:line
-BODY package :link(BODY),h5
+BODY package :link(BODY),h4
+
+[Contents:]
-Contents: Support for body-style particles. Computes,
+Body-style particles with internal structure. Computes,
time-integration fixes, pair styles, as well as the body styles
themselves. See the "body"_body.html doc page for an overview.
-To install via make or Make.py:
+[Install or un-install:]
make yes-body
make machine :pre
-Make.py -p body -a machine :pre
-
-To un-install via make or Make.py:
-
make no-body
make machine :pre
-Make.py -p ^body -a machine :pre
+[Supporting info:]
-Supporting info: "atom_style body"_atom_style.html, "body"_body.html,
-"pair_style body"_pair_body.html, examples/body
+src/BODY: filenames -> commands
+"body"_body.html
+"atom_style body"_atom_style.html
+"fix nve/body"_fix_nve_body.html
+"pair_style body"_pair_body.html
+examples/body :ul
:line
-CLASS2 package :link(CLASS2),h5
+CLASS2 package :link(CLASS2),h4
-Contents: Bond, angle, dihedral, improper, and pair styles for the
-COMPASS CLASS2 molecular force field.
+[Contents:]
-To install via make or Make.py:
+Bond, angle, dihedral, improper, and pair styles for the COMPASS
+CLASS2 molecular force field.
+
+[Install or un-install:]
make yes-class2
make machine :pre
-Make.py -p class2 -a machine :pre
-
-To un-install via make or Make.py:
-
make no-class2
make machine :pre
-Make.py -p ^class2 -a machine :pre
+[Supporting info:]
-Supporting info: "bond_style class2"_bond_class2.html, "angle_style
-class2"_angle_class2.html, "dihedral_style
-class2"_dihedral_class2.html, "improper_style
-class2"_improper_class2.html, "pair_style lj/class2"_pair_class2.html
+src/CLASS2: filenames -> commands
+"bond_style class2"_bond_class2.html
+"angle_style class2"_angle_class2.html
+"dihedral_style class2"_dihedral_class2.html
+"improper_style class2"_improper_class2.html
+"pair_style lj/class2"_pair_class2.html :ul
:line
-COLLOID package :link(COLLOID),h5
+COLLOID package :link(COLLOID),h4
-Contents: Support for coarse-grained colloidal particles. Wall fix
-and pair styles that implement colloidal interaction models for
-finite-size particles. This includes the Fast Lubrication Dynamics
-method for hydrodynamic interactions, which is a simplified
-approximation to Stokesian dynamics.
+[Contents:]
-To install via make or Make.py:
+Coarse-grained finite-size colloidal particles. Pair styles and fix
+wall styles for colloidal interactions. Includes the Fast Lubrication
+Dynamics (FLD) method for hydrodynamic interactions, which is a
+simplified approximation to Stokesian dynamics.
-make yes-colloid
-make machine :pre
+[Authors:] This package includes Fast Lubrication Dynamics pair styles
+which were created by Amit Kumar and Michael Bybee from Jonathan
+Higdon's group at UIUC.
-Make.py -p colloid -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-colloid
+make machine :pre
make no-colloid
make machine :pre
-Make.py -p ^colloid -a machine :pre
+[Supporting info:]
-Supporting info: "fix wall/colloid"_fix_wall.html, "pair_style
-colloid"_pair_colloid.html, "pair_style
-yukawa/colloid"_pair_yukawa_colloid.html, "pair_style
-brownian"_pair_brownian.html, "pair_style
-lubricate"_pair_lubricate.html, "pair_style
-lubricateU"_pair_lubricateU.html, examples/colloid, examples/srd
+src/COLLOID: filenames -> commands
+"fix wall/colloid"_fix_wall.html
+"pair_style colloid"_pair_colloid.html
+"pair_style yukawa/colloid"_pair_yukawa_colloid.html
+"pair_style brownian"_pair_brownian.html
+"pair_style lubricate"_pair_lubricate.html
+"pair_style lubricateU"_pair_lubricateU.html
+examples/colloid
+examples/srd :ul
:line
-COMPRESS package :link(COMPRESS),h5
+COMPRESS package :link(COMPRESS),h4
-Contents: Support for compressed output of dump files via the zlib
-compression library, using dump styles with a "gz" in their style
-name.
+[Contents:]
-Building with the COMPRESS package assumes you have the zlib
-compression library available on your system. The build uses the
-lib/compress/Makefile.lammps file in the compile/link process. You
-should only need to edit this file if the LAMMPS build cannot find the
-zlib info it specifies.
+Compressed output of dump files via the zlib compression library,
+using dump styles with a "gz" in their style name.
-To install via make or Make.py:
+To use this package you must have the zlib compression library
+available on your system.
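+
+A quick, system-dependent way to check for zlib on many Linux systems
+is to look for its header and shared library, e.g.:
+
+ls /usr/include/zlib.h
+ldconfig -p | grep libz :pre
+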
-make yes-compress
-make machine :pre
+[Author:] Axel Kohlmeyer (Temple U).
-Make.py -p compress -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+Note that building with this package assumes you have the zlib
+compression library available on your system. The LAMMPS build uses
+the settings in the lib/compress/Makefile.lammps file in the
+compile/link process. You should only need to edit this file if the
+LAMMPS build fails on your system.
+
+make yes-compress
+make machine :pre
make no-compress
make machine :pre
-Make.py -p ^compress -a machine :pre
+[Supporting info:]
-Supporting info: src/COMPRESS/README, lib/compress/README, "dump
-atom/gz"_dump.html, "dump cfg/gz"_dump.html, "dump
-custom/gz"_dump.html, "dump xyz/gz"_dump.html
+src/COMPRESS: filenames -> commands
+src/COMPRESS/README
+lib/compress/README
+"dump atom/gz"_dump.html
+"dump cfg/gz"_dump.html
+"dump custom/gz"_dump.html
+"dump xyz/gz"_dump.html :ul
:line
-CORESHELL package :link(CORESHELL),h5
+CORESHELL package :link(CORESHELL),h4
-Contents: Compute and pair styles that implement the adiabatic
-core/shell model for polarizability. The compute temp/cs command
-measures the temperature of a system with core/shell particles. The
-pair styles augment Born, Buckingham, and Lennard-Jones styles with
-core/shell capabilities. See "Section 6.26"_Section_howto.html#howto_26
-for an overview of how to use the package.
+[Contents:]
-To install via make or Make.py:
+Compute and pair styles that implement the adiabatic core/shell model
+for polarizability. The pair styles augment Born, Buckingham, and
+Lennard-Jones styles with core/shell capabilities. The "compute
+temp/cs"_compute_temp_cs.html command calculates the temperature of a
+system with core/shell particles. See "Section
+6.26"_Section_howto.html#howto_26 for an overview of how to use this
+package.
-make yes-coreshell
-make machine :pre
+[Author:] Hendrik Heenen (Technical U of Munich).
-Make.py -p coreshell -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-coreshell
+make machine :pre
make no-coreshell
make machine :pre
-Make.py -p ^coreshell -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.26"_Section_howto.html#howto_26,
-"compute temp/cs"_compute_temp_cs.html,
-"pair_style born/coul/long/cs"_pair_cs.html, "pair_style
-buck/coul/long/cs"_pair_cs.html, pair_style
-lj/cut/coul/long/cs"_pair_lj.html, examples/coreshell
+src/CORESHELL: filenames -> commands
+"Section 6.26"_Section_howto.html#howto_26
+"Section 6.25"_Section_howto.html#howto_25
+"compute temp/cs"_compute_temp_cs.html
+"pair_style born/coul/long/cs"_pair_cs.html
+"pair_style buck/coul/long/cs"_pair_cs.html
+"pair_style lj/cut/coul/long/cs"_pair_lj.html
+examples/coreshell :ul
:line
-DIPOLE package :link(DIPOLE),h5
+DIPOLE package :link(DIPOLE),h4
-Contents: An atom style and several pair styles to support point
-dipole models with short-range or long-range interactions.
+[Contents:]
-To install via make or Make.py:
+An atom style and several pair styles for point dipole models with
+short-range or long-range interactions.
+
+[Install or un-install:]
make yes-dipole
make machine :pre
-Make.py -p dipole -a machine :pre
-
-To un-install via make or Make.py:
-
make no-dipole
make machine :pre
-Make.py -p ^dipole -a machine :pre
+[Supporting info:]
-Supporting info: "atom_style dipole"_atom_style.html, "pair_style
-lj/cut/dipole/cut"_pair_dipole.html, "pair_style
-lj/cut/dipole/long"_pair_dipole.html, "pair_style
-lj/long/dipole/long"_pair_dipole.html, examples/dipole
+src/DIPOLE: filenames -> commands
+"atom_style dipole"_atom_style.html
+"pair_style lj/cut/dipole/cut"_pair_dipole.html
+"pair_style lj/cut/dipole/long"_pair_dipole.html
+"pair_style lj/long/dipole/long"_pair_dipole.html
+examples/dipole :ul
:line
-GPU package :link(GPU),h5
+GPU package :link(GPU),h4
+
+[Contents:]
-Contents: Dozens of pair styles and a version of the PPPM long-range
-Coulombic solver for NVIDIA GPUs. All of them have a "gpu" in their
-style name. "Section 5.3.1"_accelerate_gpu.html gives
+Dozens of pair styles and a version of the PPPM long-range Coulombic
+solver optimized for NVIDIA GPUs. All such styles have a "gpu" as a
+suffix in their style name. "Section 5.3.1"_accelerate_gpu.html gives
details of what hardware and Cuda software is required on your system,
-and how to build and use this package. See the KOKKOS package, which
-also has GPU-enabled styles.
-
-Building LAMMPS with the GPU package requires first building the GPU
-library itself, which is a set of C and Cuda files in lib/gpu.
-Details of how to do this are in lib/gpu/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/gpu which should create a lib/reax/libgpu.a file.
-Makefile.linux.* and Makefile.xk7 are examples for different
-platforms. There are 3 important settings in the Makefile.machine you
-use:
+and details on how to build and use this package. Its styles can be
+invoked at run time via the "-sf gpu" or "-suffix gpu" "command-line
+switches"_Section_start.html#start_7. See also the "KOKKOS"_#KOKKOS
+package, which has GPU-enabled styles.
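+
+For example, assuming an executable named lmp_machine was built with
+this package (the executable and input script names below are
+placeholders), a run could enable the GPU styles on 1 GPU per node via:
+
+mpirun -np 4 lmp_machine -sf gpu -pk gpu 1 -in in.lj :pre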
+
+[Authors:] Mike Brown (Intel) while at Sandia and ORNL and Trung Nguyen
+(Northwestern U) while at ORNL.
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the GPU
+library in lib/gpu from a set of provided C and Cuda files. You can
+do this manually if you prefer; follow the instructions in
+lib/gpu/README. You can also do it in one step from the lammps/src
+dir, using one of the commands below, each of which invokes the
+lib/gpu/Install.py script with the specified args:
+
+make lib-gpu # print help message
+make lib-gpu args="-m" # build GPU library with default Makefile.linux
+make lib-gpu args="-i xk7 -p single -o xk7.single" # create new Makefile.xk7.single, altered for single-precision
+make lib-gpu args="-i xk7 -p single -o xk7.single -m" # ditto, also build GPU library
+
+Note that this procedure starts with one of the existing
+Makefile.machine files in lib/gpu. It allows you to alter 4 important
+settings in that Makefile, via the -h, -a, -p, -e switches,
+and save the new Makefile, if desired:
CUDA_HOME = where NVIDIA Cuda software is installed on your system
-CUDA_ARCH = appropriate to your GPU hardware
-CUDA_PREC = precision (double, mixed, single) you desire :ul
-
-See example Makefile.machine files in lib/gpu for the syntax of these
-settings. See lib/gpu/Makefile.linux.double for ARCH settings for
-various NVIDIA GPUs. The "make" also creates a
-lib/gpu/Makefile.lammps file. This file has settings that enable
-LAMMPS to link with Cuda libraries. If the settings in
-Makefile.lammps for your machine are not correct, the LAMMPS link will
-fail. Note that the Make.py script has a "-gpu" option to allow the
-GPU library (with several of its options) and LAMMPS to be built in
-one step, with Type "python src/Make.py -h -gpu" to see the details.
-
-To install via make or Make.py:
-
-cd ~/lammps/lib/gpu
-make -f Makefile.linux.mixed # for example
-cd ~/lammps/src
-make yes-gpu
-make machine :pre
+CUDA_ARCH = what GPU hardware you have (see help message for details)
+CUDA_PRECISION = precision (double, mixed, single)
+EXTRAMAKE = which Makefile.lammps.* file to copy to Makefile.lammps :ul
+
+If the library build is successful, 2 files should be created:
+lib/gpu/libgpu.a and lib/gpu/Makefile.lammps. The latter has settings
+that enable LAMMPS to link with Cuda libraries. If the settings in
+Makefile.lammps for your machine are not correct, the LAMMPS build
+will fail.
-Make.py -p gpu -gpu mode=mixed arch=35 -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-To un-install via make or Make.py:
+make yes-gpu
+make machine :pre
make no-gpu
make machine :pre
-Make.py -p ^gpu -a machine :pre
+NOTE: If you re-build the GPU library in lib/gpu, you should always
+un-install the GPU package, then re-install it and re-build LAMMPS.
+This is because the compilation of files in the GPU package uses the
+library settings from the lib/gpu/Makefile.machine used to build the
+GPU library.
-Supporting info: src/GPU/README, lib/gpu/README,
-"Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.1"_accelerate_gpu.html,
-Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
-for any pair style listed with a (g),
-"kspace_style"_kspace_style.html, "package gpu"_package.html,
-examples/accelerate, bench/FERMI, bench/KEPLER
+[Supporting info:]
+
+src/GPU: filenames -> commands
+src/GPU/README
+lib/gpu/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.1"_accelerate_gpu.html
+"Section 2.7 -sf gpu"_Section_start.html#start_7
+"Section 2.7 -pk gpu"_Section_start.html#start_7
+"package gpu"_package.html
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 for pair styles followed by (g)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-GRANULAR package :link(GRANULAR),h5
+GRANULAR package :link(GRANULAR),h4
-Contents: Fixes and pair styles that support models of finite-size
-granular particles, which interact with each other and boundaries via
-frictional and dissipative potentials.
+[Contents:]
-To install via make or Make.py:
+Pair styles and fixes for finite-size granular particles, which
+interact with each other and boundaries via frictional and dissipative
+potentials.
+
+[Install or un-install:]
make yes-granular
make machine :pre
-Make.py -p granular -a machine :pre
-
-To un-install via make or Make.py:
-
make no-granular
make machine :pre
-Make.py -p ^granular -a machine :pre
-
-Supporting info: "Section 6.6"_Section_howto.html#howto_6, "fix
-pour"_fix_pour.html, "fix wall/gran"_fix_wall_gran.html, "pair_style
-gran/hooke"_pair_gran.html, "pair_style
-gran/hertz/history"_pair_gran.html, examples/pour, bench/in.chute
+[Supporting info:]
+
+src/GRANULAR: filenames -> commands
+"Section 6.6"_Section_howto.html#howto_6,
+"fix pour"_fix_pour.html
+"fix wall/gran"_fix_wall_gran.html
+"pair_style gran/hooke"_pair_gran.html
+"pair_style gran/hertz/history"_pair_gran.html
+examples/granregion
+examples/pour
+bench/in.chute
+http://lammps.sandia.gov/pictures.html#jamming
+http://lammps.sandia.gov/movies.html#hopper
+http://lammps.sandia.gov/movies.html#dem
+http://lammps.sandia.gov/movies.html#brazil
+http://lammps.sandia.gov/movies.html#granregion :ul
:line
-KIM package :link(KIM),h5
+KIM package :link(KIM),h4
-Contents: A pair style that interfaces to the Knowledge Base for
-Interatomic Models (KIM) repository of interatomic potentials, so that
-KIM potentials can be used in a LAMMPS simulation.
+[Contents:]
-To build LAMMPS with the KIM package you must have previously
-installed the KIM API (library) on your system. The lib/kim/README
-file explains how to download and install KIM. Building with the KIM
-package also uses the lib/kim/Makefile.lammps file in the compile/link
-process. You should not need to edit this file.
+A "pair_style kim"_pair_kim.html command which is a wrapper on the
+Knowledge Base for Interatomic Models (KIM) repository of interatomic
+potentials, enabling any of them to be used in LAMMPS simulations.
-To install via make or Make.py:
+To use this package you must have the KIM library available on your
+system.
-make yes-kim
-make machine :pre
+Information about the KIM project can be found at its website:
+https://openkim.org. The KIM project is led by Ellad Tadmor and Ryan
+Elliott (U Minnesota) and James Sethna (Cornell U).
+
+[Authors:] Ryan Elliott (U Minnesota) is the main developer for the KIM
+API which the "pair_style kim"_pair_kim.html command uses. He
+developed the pair style in collaboration with Valeriu Smirichinski (U
+Minnesota).
+
+[Install or un-install:]
-Make.py -p kim -a machine :pre
+Using this package requires the KIM library and its models
+(interatomic potentials) to be downloaded and installed on your
+system. The library can be downloaded and built in lib/kim or
+elsewhere on your system. Details of the download, build, and install
+process for KIM are given in the lib/kim/README file.
-To un-install via make or Make.py:
+Once that process is complete, you can then install/un-install the
+package and build LAMMPS in the usual manner:
+
+make yes-kim
+make machine :pre
make no-kim
make machine :pre
-Make.py -p ^kim -a machine :pre
+[Supporting info:]
-Supporting info: src/KIM/README, lib/kim/README, "pair_style
-kim"_pair_kim.html, examples/kim
+src/KIM: filenames -> commands
+src/KIM/README
+lib/kim/README
+"pair_style kim"_pair_kim.html
+examples/kim :ul
:line
-KOKKOS package :link(KOKKOS),h5
+KOKKOS package :link(KOKKOS),h4
-Contents: Dozens of atom, pair, bond, angle, dihedral, improper styles
-which run with the Kokkos library to provide optimization for
-multicore CPUs (via OpenMP), NVIDIA GPUs, or the Intel Xeon Phi (in
-native mode). All of them have a "kk" in their style name. "Section
-5.3.3"_accelerate_kokkos.html gives details of what
-hardware and software is required on your system, and how to build and
-use this package. See the GPU, OPT, USER-INTEL, USER-OMP packages,
-which also provide optimizations for the same range of hardware.
+[Contents:]
-Building with the KOKKOS package requires choosing which of 3 hardware
-options you are optimizing for: CPU acceleration via OpenMP, GPU
-acceleration, or Intel Xeon Phi. (You can build multiple times to
-create LAMMPS executables for different hardware.) It also requires a
-C++11 compatible compiler. For GPUs, the NVIDIA "nvcc" compiler is
-used, and an appropriate KOKKOS_ARCH setting should be made in your
-Makefile.machine for your GPU hardware and NVIDIA software.
+Dozens of atom, pair, bond, angle, dihedral, improper, fix, compute
+styles adapted to compile using the Kokkos library which can convert
+them to OpenMP or Cuda code so that they run efficiently on multicore
+CPUs, KNLs, or GPUs. All the styles have a "kk" as a suffix in their
+style name. "Section 5.3.3"_accelerate_kokkos.html gives details of
+what hardware and software is required on your system, and how to
+build and use this package. Its styles can be invoked at run time via
+the "-sf kk" or "-suffix kk" "command-line
+switches"_Section_start.html#start_7. Also see the "GPU"_#GPU,
+"OPT"_#OPT, "USER-INTEL"_#USER-INTEL, and "USER-OMP"_#USER_OMP
+packages, which have styles optimized for CPUs, KNLs, and GPUs.
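+
+For example, assuming executables built for OpenMP and Cuda support
+(the executable and input script names below are placeholders), runs
+could look like this:
+
+lmp_kokkos_omp -k on t 4 -sf kk -in in.lj                # 4 OpenMP threads per MPI task
+mpirun -np 2 lmp_kokkos_cuda -k on g 2 -sf kk -in in.lj  # 2 MPI tasks, 2 GPUs per node :pre
+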
-The simplest way to do this is to use Makefile.kokkos_cuda or
-Makefile.kokkos_omp or Makefile.kokkos_phi in src/MAKE/OPTIONS, via
-"make kokkos_cuda" or "make kokkos_omp" or "make kokkos_phi". (Check
-the KOKKOS_ARCH setting in Makefile.kokkos_cuda), Or, as illustrated
-below, you can use the Make.py script with its "-kokkos" option to
-choose which hardware to build for. Type "python src/Make.py -h
--kokkos" to see the details. If these methods do not work on your
-system, you will need to read the "Section 5.3.3"_accelerate_kokkos.html
-doc page for details of what Makefile.machine settings are needed.
+You must have a C++11 compatible compiler to use this package.
-To install via make or Make.py for each of 3 hardware options:
+[Authors:] The KOKKOS package was created primarily by Christian Trott
+and Stan Moore (Sandia), with contributions from other folks as well.
+It uses the open-source "Kokkos library"_https://github.com/kokkos
+which was developed by Carter Edwards, Christian Trott, and others at
+Sandia, and which is included in the LAMMPS distribution in
+lib/kokkos.
-make yes-kokkos
-make kokkos_omp # for CPUs with OpenMP
-make kokkos_cuda # for GPUs, check the KOKKOS_ARCH setting in Makefile.kokkos_cuda
-make kokkos_phi # for Xeon Phis :pre
+[Install or un-install:]
+
+For the KOKKOS package, you have 3 choices when building. You can
+build with either CPU or KNL or GPU support. Each choice requires
+additional settings in your Makefile.machine for the KOKKOS_DEVICES
+and KOKKOS_ARCH settings. See the src/MAKE/OPTIONS/Makefile.kokkos*
+files for examples.
+
+For multicore CPUs using OpenMP:
+
+KOKKOS_DEVICES = OpenMP
+KOKKOS_ARCH = HSW # HSW = Haswell, SNB = SandyBridge, BDW = Broadwell, etc :pre
+
+For Intel KNLs using OpenMP:
+
+KOKKOS_DEVICES = OpenMP
+KOKKOS_ARCH = KNL :pre
+
+For NVIDIA GPUs using Cuda:
+
+KOKKOS_DEVICES = Cuda
+KOKKOS_ARCH = Pascal60,Power8 # P100 hosted by an IBM Power8, etc
+KOKKOS_ARCH = Kepler37,Power8 # K80 hosted by an IBM Power8, etc :pre
+
+For GPUs, you also need these 2 lines in your Makefile.machine before
+the CC line is defined, in this case for use with OpenMPI mpicxx. The
+2 lines define a nvcc wrapper compiler, which will use nvcc for
+compiling Cuda files or use a C++ compiler for non-Kokkos, non-Cuda
+files.
-Make.py -p kokkos -kokkos omp -a machine # for CPUs with OpenMP
-Make.py -p kokkos -kokkos cuda arch=35 -a machine # for GPUs of style arch
-Make.py -p kokkos -kokkos phi -a machine # for Xeon Phis
+KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
+export OMPI_CXX = $(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
+CC = mpicxx :pre
-To un-install via make or Make.py:
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner.
+Note that you cannot build one executable to run on multiple hardware
+targets (CPU or KNL or GPU). You need to build LAMMPS once for each
+hardware target, to produce a separate executable. Also note that we
+do not recommend building with other acceleration packages installed
+(GPU, OPT, USER-INTEL, USER-OMP) when also building with KOKKOS.
+make yes-kokkos
+make machine :pre
+
make no-kokkos
make machine :pre
-Make.py -p ^kokkos -a machine :pre
+[Supporting info:]
-Supporting info: src/KOKKOS/README, lib/kokkos/README,
-"Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.3"_accelerate_kokkos.html,
-Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
-for any pair style listed with a (k), "package kokkos"_package.html,
-examples/accelerate, bench/FERMI, bench/KEPLER
+src/KOKKOS: filenames -> commands
+src/KOKKOS/README
+lib/kokkos/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.3"_accelerate_kokkos.html
+"Section 2.7 -k on ..."_Section_start.html#start_7
+"Section 2.7 -sf kk"_Section_start.html#start_7
+"Section 2.7 -pk kokkos"_Section_start.html#start_7
+"package kokkos"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (k)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-KSPACE package :link(KSPACE),h5
+KSPACE package :link(KSPACE),h4
-Contents: A variety of long-range Coulombic solvers, and pair styles
-which compute the corresponding short-range portion of the pairwise
-Coulombic interactions. These include Ewald, particle-particle
-particle-mesh (PPPM), and multilevel summation method (MSM) solvers.
+[Contents:]
-Building with the KSPACE package requires a 1d FFT library be present
-on your system for use by the PPPM solvers. This can be the KISS FFT
-library provided with LAMMPS, or 3rd party libraries like FFTW or a
-vendor-supplied FFT library. See step 6 of "Section
-2.2.2"_Section_start.html#start_2_2 of the manual for details of how
-to select different FFT options in your machine Makefile. The Make.py
-tool has an "-fft" option which can insert these settings into your
-machine Makefile automatically. Type "python src/Make.py -h -fft" to
-see the details.
+A variety of long-range Coulombic solvers, as well as pair styles
+which compute the corresponding short-range pairwise Coulombic
+interactions. These include Ewald, particle-particle particle-mesh
+(PPPM), and multilevel summation method (MSM) solvers.
-To install via make or Make.py:
+[Install or un-install:]
+
+Building with this package requires a 1d FFT library be present on
+your system for use by the PPPM solvers. This can be the KISS FFT
+library provided with LAMMPS, 3rd party libraries like FFTW, or a
+vendor-supplied FFT library. See step 6 of "Section
+2.2.2"_Section_start.html#start_2_2 of the manual for details on how
+to select different FFT options in your machine Makefile.
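+
+For example, to use FFTW3 instead of the default KISS FFT, the FFT
+settings in your Makefile.machine might look like the lines below;
+the include and library paths are placeholders that depend on where
+FFTW is installed on your system:
+
+FFT_INC = -DFFT_FFTW3 -I/usr/local/include
+FFT_PATH = -L/usr/local/lib
+FFT_LIB = -lfftw3 :pre
+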
make yes-kspace
make machine :pre
-Make.py -p kspace -a machine :pre
-
-To un-install via make or Make.py:
-
make no-kspace
make machine :pre
-Make.py -p ^kspace -a machine :pre
+[Supporting info:]
-Supporting info: "kspace_style"_kspace_style.html,
-"doc/PDF/kspace.pdf"_PDF/kspace.pdf,
-"Section 6.7"_Section_howto.html#howto_7,
-"Section 6.8"_Section_howto.html#howto_8,
-"Section 6.9"_Section_howto.html#howto_9,
-"pair_style coul"_pair_coul.html, other pair style command doc pages
-which have "long" or "msm" in their style name,
-examples/peptide, bench/in.rhodo
+src/KSPACE: filenames -> commands
+"kspace_style"_kspace_style.html
+"doc/PDF/kspace.pdf"_PDF/kspace.pdf
+"Section 6.7"_Section_howto.html#howto_7
+"Section 6.8"_Section_howto.html#howto_8
+"Section 6.9"_Section_howto.html#howto_9
+"pair_style coul"_pair_coul.html
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 with "long" or "msm" in pair style name
+examples/peptide
+bench/in.rhodo :ul
:line
-MANYBODY package :link(MANYBODY),h5
+MANYBODY package :link(MANYBODY),h4
+
+[Contents:]
-Contents: A variety of many-body and bond-order potentials. These
-include (AI)REBO, EAM, EIM, BOP, Stillinger-Weber, and Tersoff
-potentials. Do a directory listing, "ls src/MANYBODY", to see
-the full list.
+A variety of manybody and bond-order potentials. These include
+(AI)REBO, BOP, EAM, EIM, Stillinger-Weber, and Tersoff potentials.
-To install via make or Make.py:
+[Install or un-install:]
make yes-manybody
make machine :pre
-Make.py -p manybody -a machine :pre
-
-To un-install via make or Make.py:
-
make no-manybody
make machine :pre
-Make.py -p ^manybody -a machine :pre
-
-Supporting info:
+[Supporting info:]
-Examples: Pair Styles section of "Section
-3.5"_Section_commands.html#cmd_5, examples/comb, examples/eim,
-examples/nb3d, examples/vashishta
+src/MANYBODY: filenames -> commands
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
+examples/comb
+examples/eim
+examples/nb3d
+examples/shear
+examples/streitz
+examples/vashishta
+bench/in.eam :ul
:line
-MC package :link(MC),h5
+MC package :link(MC),h4
+
+[Contents:]
-Contents: Several fixes and a pair style that have Monte Carlo (MC) or
-MC-like attributes. These include fixes for creating, breaking, and
-swapping bonds, and for performing atomic swaps and grand-canonical MC
-in conjuction with dynamics.
+Several fixes and a pair style that have Monte Carlo (MC) or MC-like
+attributes. These include fixes for creating, breaking, and swapping
+bonds, for performing atomic swaps, and performing grand-canonical MC
+(GCMC) in conjunction with dynamics.
-To install via make or Make.py:
+[Install or un-install:]
make yes-mc
make machine :pre
-Make.py -p mc -a machine :pre
-
-To un-install via make or Make.py:
-
make no-mc
make machine :pre
-Make.py -p ^mc -a machine :pre
+[Supporting info:]
-Supporting info: "fix atom/swap"_fix_atom_swap.html, "fix
-bond/break"_fix_bond_break.html, "fix
-bond/create"_fix_bond_create.html, "fix bond/swap"_fix_bond_swap.html,
-"fix gcmc"_fix_gcmc.html, "pair_style dsmc"_pair_dsmc.html
+src/MC: filenames -> commands
+"fix atom/swap"_fix_atom_swap.html
+"fix bond/break"_fix_bond_break.html
+"fix bond/create"_fix_bond_create.html
+"fix bond/swap"_fix_bond_swap.html
+"fix gcmc"_fix_gcmc.html
+"pair_style dsmc"_pair_dsmc.html
+http://lammps.sandia.gov/movies.html#gcmc :ul
:line
-MEAM package :link(MEAM),h5
+MEAM package :link(MEAM),h4
-Contents: A pair style for the modified embedded atom (MEAM)
-potential.
+[Contents:]
-Building LAMMPS with the MEAM package requires first building the MEAM
-library itself, which is a set of Fortran 95 files in lib/meam.
-Details of how to do this are in lib/meam/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/meam which should create a lib/meam/libmeam.a file.
-Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
-and Intel Fortran compilers. The "make" also copies a
-lib/meam/Makefile.lammps.machine file to lib/meam/Makefile.lammps.
-This file has settings that enable the C++ compiler used to build
-LAMMPS to link with a Fortran library (typically the 2 compilers to be
-consistent e.g. both Intel compilers, or both GNU compilers). If the
-settings in Makefile.lammps for your compilers and machine are not
-correct, the LAMMPS link will fail. Note that the Make.py script has
-a "-meam" option to allow the MEAM library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -meam" to see the details.
+A pair style for the modified embedded atom (MEAM) potential.
-NOTE: The MEAM potential can run dramatically faster if built with the
-Intel Fortran compiler, rather than the GNU Fortran compiler.
+[Author:] Greg Wagner (Northwestern U) while at Sandia.
-To install via make or Make.py:
+[Install or un-install:]
-cd ~/lammps/lib/meam
-make -f Makefile.gfortran # for example
-cd ~/lammps/src
-make yes-meam
-make machine :pre
+Before building LAMMPS with this package, you must first build the
+MEAM library in lib/meam. You can do this manually if you prefer;
+follow the instructions in lib/meam/README. You can also do it in one
+step from the lammps/src dir, using one of the commands below, each of
+which invokes the lib/meam/Install.py script with the specified args:
+
+make lib-meam # print help message
+make lib-meam args="-m gfortran" # build with GNU Fortran compiler
+make lib-meam args="-m ifort" # build with Intel ifort compiler :pre
-Make.py -p meam -meam make=gfortran -a machine :pre
+The build should produce two files: lib/meam/libmeam.a and
+lib/meam/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to link C++ (LAMMPS) with
+Fortran (MEAM library). Typically the two compilers used for LAMMPS
+and the MEAM library need to be consistent (e.g. both Intel or both
+GNU compilers). If necessary, you can edit/create a new
+lib/meam/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
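+
+For example, a new lib/meam/Makefile.mymachine (a hypothetical name)
+for the GNU compilers could end with a line like this, so that the
+matching Makefile.lammps.* file is copied to Makefile.lammps:
+
+EXTRAMAKE = Makefile.lammps.gfortran :pre
+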
-To un-install via make or Make.py:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-meam
+make machine :pre
make no-meam
make machine :pre
-Make.py -p ^meam -a machine :pre
+NOTE: You should test building the MEAM library with both the Intel
+and GNU compilers to see if a simulation runs faster with one versus
+the other on your system.
+
+[Supporting info:]
-Supporting info: lib/meam/README, "pair_style meam"_pair_meam.html,
-examples/meam
+src/MEAM: filenames -> commands
+src/MEAM/README
+lib/meam/README
+"pair_style meam"_pair_meam.html
+examples/meam :ul
:line
-MISC package :link(MISC),h5
+MISC package :link(MISC),h4
-Contents: A variety of computes, fixes, and pair styles that are not
-commonly used, but don't align with other packages. Do a directory
+[Contents:]
+
+A variety of compute, fix, pair, dump styles with specialized
+capabilities that don't align with other packages. Do a directory
listing, "ls src/MISC", to see the list of commands.
-To install via make or Make.py:
+[Install or un-install:]
make yes-misc
make machine :pre
-Make.py -p misc -a machine :pre
-
-To un-install via make or Make.py:
-
make no-misc
make machine :pre
-Make.py -p ^misc -a machine :pre
+[Supporting info:]
-Supporting info: "compute ti"_compute_ti.html, "fix
-evaporate"_fix_evaporate.html, "fix tmm"_fix_ttm.html, "fix
-viscosity"_fix_viscosity.html, examples/misc
+src/MISC: filenames -> commands
+"compute ti"_compute_ti.html
+"fix evaporate"_fix_evaporate.html
+"fix orient/fcc"_fix_orient.html
+"fix ttm"_fix_ttm.html
+"fix thermal/conductivity"_fix_thermal_conductivity.html
+"fix viscosity"_fix_viscosity.html
+examples/KAPPA
+examples/VISCOSITY
+http://lammps.sandia.gov/pictures.html#ttm
+http://lammps.sandia.gov/movies.html#evaporation :ul
:line
-MOLECULE package :link(MOLECULE),h5
+MOLECULE package :link(MOLECULE),h4
-Contents: A large number of atom, pair, bond, angle, dihedral,
-improper styles that are used to model molecular systems with fixed
-covalent bonds. The pair styles include terms for the Dreiding
-(hydrogen-bonding) and CHARMM force fields, and TIP4P water model.
+[Contents:]
-To install via make or Make.py:
+A large number of atom, pair, bond, angle, dihedral, improper styles
+that are used to model molecular systems with fixed covalent bonds.
+The pair styles include the Dreiding (hydrogen-bonding) and CHARMM
+force fields, and a TIP4P water model.
+
+[Install or un-install:]
make yes-molecule
make machine :pre
-Make.py -p molecule -a machine :pre
-
-To un-install via make or Make.py:
-
make no-molecule
make machine :pre
-Make.py -p ^molecule -a machine :pre
-
-Supporting info:"atom_style"_atom_style.html,
-"bond_style"_bond_style.html, "angle_style"_angle_style.html,
-"dihedral_style"_dihedral_style.html,
-"improper_style"_improper_style.html, "pair_style
-hbond/dreiding/lj"_pair_hbond_dreiding.html, "pair_style
-lj/charmm/coul/charmm"_pair_charmm.html,
-"Section 6.3"_Section_howto.html#howto_3,
-examples/micelle, examples/peptide, bench/in.chain, bench/in.rhodo
+[Supporting info:]
+
+src/MOLECULE: filenames -> commands
+"atom_style"_atom_style.html
+"bond_style"_bond_style.html
+"angle_style"_angle_style.html
+"dihedral_style"_dihedral_style.html
+"improper_style"_improper_style.html
+"pair_style hbond/dreiding/lj"_pair_hbond_dreiding.html
+"pair_style lj/charmm/coul/charmm"_pair_charmm.html
+"Section 6.3"_Section_howto.html#howto_3
+examples/cmap
+examples/dreiding
+examples/micelle
+examples/peptide
+bench/in.chain
+bench/in.rhodo :ul
:line
-MPIIO package :link(MPIIO),h5
+MPIIO package :link(MPIIO),h4
+
+[Contents:]
-Contents: Support for parallel output/input of dump and restart files
-via the MPIIO library, which is part of the standard message-passing
-interface (MPI) library. It adds "dump styles"_dump.html with a
-"mpiio" in their style name. Restart files with an ".mpiio" suffix
-are also written and read in parallel.
+Support for parallel output/input of dump and restart files via the
+MPIIO library. It adds "dump styles"_dump.html with a "mpiio" in
+their style name. Restart files with an ".mpiio" suffix are also
+written and read in parallel.
-To install via make or Make.py:
+[Install or un-install:]
+Note that MPIIO is part of the standard message-passing interface
+(MPI) library, so you should not need any additional compiler or link
+settings, beyond what LAMMPS normally uses for MPI on your system.
+
make yes-mpiio
make machine :pre
+
+make no-mpiio
+make machine :pre
+
+[Supporting info:]
-Make.py -p mpiio -a machine :pre
+src/MPIIO: filenames -> commands
+"dump"_dump.html
+"restart"_restart.html
+"write_restart"_write_restart.html
+"read_restart"_read_restart.html :ul
-To un-install via make or Make.py:
+:line
+
+MSCG package :link(MSCG),h4
-make no-mpiio
+[Contents:]
+
+A "fix mscg"_fix_mscg.html command which can parameterize a
+Multi-Scale Coarse-Graining (MSCG) model using the open-source "MS-CG
+library"_mscg.
+
+:link(mscg,https://github.com/uchicago-voth/MSCG-release)
+
+To use this package you must have the MS-CG library available on your
+system.
+
+[Authors:] The fix was written by Lauren Abbott (Sandia). The MS-CG
+library was developed by Jacob Wagner in Greg Voth's group at the
+University of Chicago.
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first download and
+build the MS-CG library. Building the MS-CG library and using it from
+LAMMPS requires a C++11 compatible compiler, and that LAPACK and GSL
+(GNU Scientific Library) libraries be installed on your machine. See
+the lib/mscg/README and MSCG/Install files for more details.
+
+Assuming these libraries are in place, you can do the download and
+build of MS-CG manually if you prefer; follow the instructions in
+lib/mscg/README. You can also do it in one step from the lammps/src
+dir, using one of the commands below, each of which invokes the
+lib/mscg/Install.py script with the specified args:
+
+make lib-mscg # print help message
+make lib-mscg args="-g -b -l" # download and build in default lib/mscg/MSCG-release-master
+make lib-mscg args="-h . MSCG -g -b -l" # download and build in lib/mscg/MSCG
+make lib-mscg args="-h ~ MSCG -g -b -l" # download and build in ~/mscg :pre
+
+Note that the final -l switch is to create 2 symbolic (soft) links,
+"includelink" and "liblink", in lib/mscg to point to the MS-CG src
+dir. When LAMMPS builds it will use these links. You should not need
+to edit the lib/mscg/Makefile.lammps file.
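+
+If the build succeeded, you can verify that the two links exist and
+point into the MS-CG source directory, e.g.:
+
+ls -l lib/mscg/includelink lib/mscg/liblink :pre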
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-mscg
+make machine :pre
+
+make no-mscg
make machine :pre
-Make.py -p ^mpiio -a machine :pre
+[Supporting info:]
-Supporting info: "dump"_dump.html, "restart"_restart.html,
-"write_restart"_write_restart.html, "read_restart"_read_restart.html
+src/MSCG: filenames -> commands
+src/MSCG/README
+lib/mscg/README
+examples/mscg :ul
:line
+
+OPT package :link(OPT),h4
-OPT package :link(OPT),h5
+[Contents:]
-Contents: A handful of pair styles with an "opt" in their style name
-which are optimized for improved CPU performance on single or multiple
-cores. These include EAM, LJ, CHARMM, and Morse potentials. "Section
-5.3.5"_accelerate_opt.html gives details of how to build and
-use this package. See the KOKKOS, USER-INTEL, and USER-OMP packages,
-which also have styles optimized for CPU performance.
+A handful of pair styles which are optimized for improved CPU
+performance on single or multiple cores. These include EAM, LJ,
+CHARMM, and Morse potentials. The styles have an "opt" suffix in
+their style name. "Section 5.3.5"_accelerate_opt.html gives details
+of how to build and use this package. Its styles can be invoked at
+run time via the "-sf opt" or "-suffix opt" "command-line
+switches"_Section_start.html#start_7. See also the "KOKKOS"_#KOKKOS,
+"USER-INTEL"_#USER-INTEL, and "USER-OMP"_#USER-OMP packages, which
+have styles optimized for CPU performance.
-Some C++ compilers, like the Intel compiler, require the compile flag
-"-restrict" to build LAMMPS with the OPT package. It should be added
-to the CCFLAGS line of your Makefile.machine. Or use Makefile.opt in
-src/MAKE/OPTIONS, via "make opt". For compilers that use the flag,
-the Make.py command adds it automatically to the Makefile.auto file it
-creates and uses.
+[Authors:] James Fischer (High Performance Technologies), David Richie,
+and Vincent Natoli (Stone Ridge Technology).
-To install via make or Make.py:
+[Install or un-install:]
make yes-opt
make machine :pre
-Make.py -p opt -a machine :pre
-
-To un-install via make or Make.py:
-
make no-opt
make machine :pre
-Make.py -p ^opt -a machine :pre
+NOTE: The compile flag "-restrict" must be used to build LAMMPS with
+the OPT package. It should be added to the CCFLAGS line of your
+Makefile.machine. See Makefile.opt in src/MAKE/OPTIONS for an
+example.
+
+CCFLAGS: add -restrict :ul
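+
+For example, with the Intel compiler the relevant line in your
+Makefile.machine might look like this; the other flags shown are only
+illustrative:
+
+CCFLAGS = -g -O3 -restrict :pre
+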
-Supporting info: "Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.5"_accelerate_opt.html, Pair Styles section of
-"Section 3.5"_Section_commands.html#cmd_5 for any pair style
-listed with an (t), examples/accelerate, bench/KEPLER
+[Supporting info:]
+
+src/OPT: filenames -> commands
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.5"_accelerate_opt.html
+"Section 2.7 -sf opt"_Section_start.html#start_7
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 for pair styles followed by (t)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-PERI package :link(PERI),h5
+PERI package :link(PERI),h4
-Contents: Support for the Peridynamics method, a particle-based
-meshless continuum model. The package includes an atom style, several
-computes which calculate diagnostics, and several Peridynamic pair
-styles which implement different materials models.
+[Contents:]
-To install via make or Make.py:
+An atom style, several pair styles which implement different
+Peridynamics materials models, and several computes which calculate
+diagnostics. Peridynamics is a particle-based meshless continuum
+model.
-make yes-peri
-make machine :pre
+[Authors:] The original package was created by Mike Parks (Sandia).
+Additional Peridynamics models were added by Rezwanur Rahman and John
+Foster (UTSA).
-Make.py -p peri -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-peri
+make machine :pre
make no-peri
make machine :pre
-Make.py -p ^peri -a machine :pre
+[Supporting info:]
-Supporting info:
-"doc/PDF/PDLammps_overview.pdf"_PDF/PDLammps_overview.pdf,
-"doc/PDF/PDLammps_EPS.pdf"_PDF/PDLammps_EPS.pdf,
-"doc/PDF/PDLammps_VES.pdf"_PDF/PDLammps_VES.pdf, "atom_style
-peri"_atom_style.html, "compute damage/atom"_compute_damage_atom.html,
-"pair_style peri/pmb"_pair_peri.html, examples/peri
+src/PERI: filenames -> commands
+"doc/PDF/PDLammps_overview.pdf"_PDF/PDLammps_overview.pdf
+"doc/PDF/PDLammps_EPS.pdf"_PDF/PDLammps_EPS.pdf
+"doc/PDF/PDLammps_VES.pdf"_PDF/PDLammps_VES.pdf
+"atom_style peri"_atom_style.html
+"pair_style peri/*"_pair_peri.html
+"compute damage/atom"_compute_damage_atom.html
+"compute plasticity/atom"_compute_plasticity_atom.html
+examples/peri
+http://lammps.sandia.gov/movies.html#peri :ul
:line
-POEMS package :link(POEMS),h5
+POEMS package :link(POEMS),h4
-Contents: A fix that wraps the Parallelizable Open source Efficient
-Multibody Software (POEMS) librar, which is able to simulate the
-dynamics of articulated body systems. These are systems with multiple
-rigid bodies (collections of atoms or particles) whose motion is
-coupled by connections at hinge points.
+[Contents:]
-Building LAMMPS with the POEMS package requires first building the
-POEMS library itself, which is a set of C++ files in lib/poems.
-Details of how to do this are in lib/poems/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/poems which should create a lib/meam/libpoems.a file.
-Makefile.g++ and Makefile.icc are examples for the GNU and Intel C++
-compilers. The "make" also creates a lib/poems/Makefile.lammps file
-which you should not need to change. Note the Make.py script has a
-"-poems" option to allow the POEMS library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -poems" to see the details.
+A fix that wraps the Parallelizable Open source Efficient Multibody
+Software (POEMS) library, which is able to simulate the dynamics of
+articulated body systems. These are systems with multiple rigid
+bodies (collections of particles) whose motion is coupled by
+connections at hinge points.
-To install via make or Make.py:
+[Author:] Rudra Mukherjee (JPL) while at RPI.
-cd ~/lammps/lib/poems
-make -f Makefile.g++ # for example
-cd ~/lammps/src
-make yes-poems
-make machine :pre
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+POEMS library in lib/poems. You can do this manually if you prefer;
+follow the instructions in lib/poems/README. You can also do it in
+one step from the lammps/src dir, using one of the commands below,
+each of which invokes the lib/poems/Install.py script with the
+specified args:
+
+make lib-poems # print help message
+make lib-poems args="-m g++" # build with GNU g++ compiler
+make lib-poems args="-m icc" # build with Intel icc compiler :pre
-Make.py -p poems -poems make=g++ -a machine :pre
+The build should produce two files: lib/poems/libpoems.a and
+lib/poems/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+POEMS library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/poems/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
-To un-install via make or Make.py:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-poems
+make machine :pre
make no-poems
make machine :pre
-Make.py -p ^meam -a machine :pre
+[Supporting info:]
-Supporting info: src/POEMS/README, lib/poems/README,
-"fix poems"_fix_poems.html, examples/rigid
+src/POEMS: filenames -> commands
+src/POEMS/README
+lib/poems/README
+"fix poems"_fix_poems.html
+examples/rigid :ul
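+
+As a sketch of typical usage (not a complete input), the fix can be
+applied so that each molecule in a group becomes one rigid body of the
+articulated system; the group name "clumps" is a placeholder:
+
+fix 1 clumps poems molecule  # one rigid body per molecule ID :pre
+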
:line
-PYTHON package :link(PYTHON),h5
+PYTHON package :link(PYTHON),h4
-Contents: A "python"_python.html command which allow you to execute
-Python code from a LAMMPS input script. The code can be in a separate
-file or embedded in the input script itself. See "Section
-11.2"_Section_python.html#py_2 for an overview of using Python from
-LAMMPS and for other ways to use LAMMPS and Python together.
+[Contents:]
-Building with the PYTHON package assumes you have a Python shared
-library available on your system, which needs to be a Python 2
-version, 2.6 or later. Python 3 is not yet supported. The build uses
-the contents of the lib/python/Makefile.lammps file to find all the Python
-files required in the build/link process. See the lib/python/README
-file if the settings in that file do not work on your system. Note
-that the Make.py script has a "-python" option to allow an alternate
-lib/python/Makefile.lammps file to be specified and LAMMPS to be built
-in one step. Type "python src/Make.py -h -python" to see the details.
+A "python"_python.html command which allow you to execute Python code
+from a LAMMPS input script. The code can be in a separate file or
+embedded in the input script itself. See "Section
+11.2"_Section_python.html#py_2 for an overview of using Python from
+LAMMPS in this manner and the entire section for other ways to use
+LAMMPS and Python together.
-To install via make or Make.py:
+[Install or un-install:]
make yes-python
make machine :pre
-Make.py -p python -a machine :pre
-
-To un-install via make or Make.py:
-
make no-python
make machine :pre
-Make.py -p ^python -a machine :pre
+NOTE: Building with the PYTHON package assumes you have a Python
+shared library available on your system, which needs to be a Python 2
+version, 2.6 or later. Python 3 is not yet supported. See the
+lib/python/README for more details. Note that the build uses the
+lib/python/Makefile.lammps file in the compile/link process. You
+should only need to create a new Makefile.lammps.* file (and copy it
+to Makefile.lammps) if the LAMMPS build fails.
-Supporting info: examples/python
+[Supporting info:]
+
+src/PYTHON: filenames -> commands
+"Section 11"_Section_python.html
+lib/python/README
+examples/python :ul
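+
+For example, a short Python function can be defined and invoked
+directly from an input script, roughly as follows (a sketch only; see
+"python"_python.html for the full set of keywords and for passing
+arguments and return values):
+
+python hello here """
+def hello():
+  print "Hello from Python"
+"""
+python hello invoke :pre
+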
:line
-QEQ package :link(QEQ),h5
+QEQ package :link(QEQ),h4
+
+[Contents:]
-Contents: Several fixes for performing charge equilibration (QEq) via
-severeal different algorithms. These can be used with pair styles
-that use QEq as part of their formulation.
+Several fixes for performing charge equilibration (QEq) via different
+algorithms. These can be used with pair styles that perform QEq as
+part of their formulation.
-To install via make or Make.py:
+[Install or un-install:]
make yes-qeq
make machine :pre
-Make.py -p qeq -a machine :pre
-
-To un-install via make or Make.py:
-
make no-qeq
make machine :pre
-Make.py -p ^qeq -a machine :pre
+[Supporting info:]
-Supporting info: "fix qeq/*"_fix_qeq.html, examples/qeq
+src/QEQ: filenames -> commands
+"fix qeq/*"_fix_qeq.html
+examples/qeq
+examples/streitz :ul
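+
+A typical invocation is a single fix line, roughly like the following;
+the parameter file name is a placeholder and the appropriate qeq style
+and arguments depend on the pair style in use (see "fix
+qeq/*"_fix_qeq.html):
+
+fix 1 all qeq/point 1 10.0 1.0e-6 200 param.qeq1  # Nevery cutoff tol maxiter file :pre
+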
:line
-REAX package :link(REAX),h5
+REAX package :link(REAX),h4
-Contents: A pair style for the ReaxFF potential, a universal reactive
-force field, as well as a "fix reax/bonds"_fix_reax_bonds.html command
-for monitoring molecules as bonds are created and destroyed.
+[Contents:]
-Building LAMMPS with the REAX package requires first building the REAX
-library itself, which is a set of Fortran 95 files in lib/reax.
-Details of how to do this are in lib/reax/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/reax which should create a lib/reax/libreax.a file.
-Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
-and Intel Fortran compilers. The "make" also copies a
-lib/reax/Makefile.lammps.machine file to lib/reax/Makefile.lammps.
-This file has settings that enable the C++ compiler used to build
-LAMMPS to link with a Fortran library (typically the 2 compilers to be
-consistent e.g. both Intel compilers, or both GNU compilers). If the
-settings in Makefile.lammps for your compilers and machine are not
-correct, the LAMMPS link will fail. Note that the Make.py script has
-a "-reax" option to allow the REAX library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -reax" to see the details.
-
-To install via make or Make.py:
-
-cd ~/lammps/lib/reax
-make -f Makefile.gfortran # for example
-cd ~/lammps/src
-make yes-reax
-make machine :pre
+A pair style which wraps a Fortran library implementing the ReaxFF
+potential, a universal reactive force field. See the
+"USER-REAXC package"_#USER-REAXC for an alternate implementation in
+C/C++. Also a "fix reax/bonds"_fix_reax_bonds.html command for
+monitoring molecules as bonds are created and destroyed.
+
+[Author:] Aidan Thompson (Sandia).
-Make.py -p reax -reax make=gfortran -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+Before building LAMMPS with this package, you must first build the
+REAX library in lib/reax. You can do this manually if you prefer;
+follow the instructions in lib/reax/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/reax/Install.py script with the specified args:
+
+make lib-reax # print help message
+make lib-reax args="-m gfortran" # build with GNU Fortran compiler
+make lib-reax args="-m ifort" # build with Intel ifort compiler :pre
+
+The build should produce two files: lib/reax/libreax.a and
+lib/reax/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to link C++ (LAMMPS) with
+Fortran (REAX library). Typically the two compilers used for LAMMPS
+and the REAX library need to be consistent (e.g. both Intel or both
+GNU compilers). If necessary, you can edit/create a new
+lib/reax/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-reax
+make machine :pre
make no-reax
make machine :pre
-Make.py -p ^reax -a machine :pre
+[Supporting info:]
-Supporting info: lib/reax/README, "pair_style reax"_pair_reax.html,
-"fix reax/bonds"_fix_reax_bonds.html, examples/reax
+src/REAX: filenames -> commands
+lib/reax/README
+"pair_style reax"_pair_reax.html
+"fix reax/bonds"_fix_reax_bonds.html
+examples/reax :ul
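+
+As a minimal sketch (not a complete input), the pair style is used
+roughly as follows; the force-field file name and the mapping of
+LAMMPS atom types to elements in that file are placeholders, see
+"pair_style reax"_pair_reax.html:
+
+atom_style charge
+pair_style reax
+pair_coeff * * ffield.reax 1 2  # map atom types 1,2 to elements 1,2 in ffield.reax
+fix 1 all reax/bonds 100 bonds.reax  # optional bond-monitoring output :pre
+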
:line
-REPLICA package :link(REPLICA),h5
+REPLICA package :link(REPLICA),h4
-Contents: A collection of multi-replica methods that are used by
-invoking multiple instances (replicas) of LAMMPS
-simulations. Communication between individual replicas is performed in
-different ways by the different methods. See "Section
+[Contents:]
+
+A collection of multi-replica methods which can be used when running
+multiple LAMMPS simulations (replicas). See "Section
6.5"_Section_howto.html#howto_5 for an overview of how to run
-multi-replica simulations in LAMMPS. Multi-replica methods included
-in the package are nudged elastic band (NEB), parallel replica
-dynamics (PRD), temperature accelerated dynamics (TAD), parallel
-tempering, and a verlet/split algorithm for performing long-range
-Coulombics on one set of processors, and the remainder of the force
-field calculation on another set.
+multi-replica simulations in LAMMPS. Methods in the package include
+nudged elastic band (NEB), parallel replica dynamics (PRD),
+temperature accelerated dynamics (TAD), parallel tempering, and a
+verlet/split algorithm for performing long-range Coulombics on one set
+of processors, and the remainder of the force field calculation on
+another set.
-To install via make or Make.py:
+[Install or un-install:]
make yes-replica
make machine :pre
-Make.py -p replica -a machine :pre
-
-To un-install via make or Make.py:
-
make no-replica
make machine :pre
-Make.py -p ^replica -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.5"_Section_howto.html#howto_5,
-"neb"_neb.html, "prd"_prd.html, "tad"_tad.html, "temper"_temper.html,
-"run_style verlet/split"_run_style.html, examples/neb, examples/prd,
-examples/tad
+src/REPLICA: filenames -> commands
+"Section 6.5"_Section_howto.html#howto_5
+"neb"_neb.html
+"prd"_prd.html
+"tad"_tad.html
+"temper"_temper.html,
+"run_style verlet/split"_run_style.html
+examples/neb
+examples/prd
+examples/tad :ul
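+
+For example, parallel tempering is run on multiple processor
+partitions (e.g. "mpirun -np 4 lmp_machine -partition 4x1 -in
+in.temper"), with a world-style variable giving each replica its own
+temperature; a sketch, assuming 4 replicas and a thermostat fix named
+"myfix":
+
+variable t world 300.0 310.0 320.0 330.0
+fix myfix all nvt temp $t $t 100.0
+temper 100000 100 $t myfix 3847 58382 :pre
+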
:line
-RIGID package :link(RIGID),h5
+RIGID package :link(RIGID),h4
+
+[Contents:]
-Contents: A collection of computes and fixes which enforce rigid
-constraints on collections of atoms or particles. This includes SHAKE
-and RATTLE, as well as variants of rigid-body time integrators for a
-few large bodies or many small bodies.
+Fixes which enforce rigid constraints on collections of atoms or
+particles. This includes SHAKE and RATTLE, as well as various
+rigid-body integrators for a few large bodies or many small bodies.
+Also several computes which calculate properties of rigid bodies.
-To install via make or Make.py:
+[Install or un-install:]
make yes-rigid
make machine :pre
-Make.py -p rigid -a machine :pre
-
-To un-install via make or Make.py:
+
make no-rigid
make machine :pre
-Make.py -p ^rigid -a machine :pre
+[Supporting info:]
-Supporting info: "compute erotate/rigid"_compute_erotate_rigid.html,
-"fix shake"_fix_shake.html, "fix rattle"_fix_shake.html, "fix
-rigid/*"_fix_rigid.html, examples/ASPHERE, examples/rigid
+src/RIGID: filenames -> commands
+"compute erotate/rigid"_compute_erotate_rigid.html
+fix shake"_fix_shake.html
+"fix rattle"_fix_shake.html
+"fix rigid/*"_fix_rigid.html
+examples/ASPHERE
+examples/rigid
+bench/in.rhodo
+http://lammps.sandia.gov/movies.html#box
+http://lammps.sandia.gov/movies.html#star :ul
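+
+Typical usage is a single fix line per constraint, e.g. to constrain
+specific bond and angle types with SHAKE, or to treat each molecule in
+a group as a rigid body (the group name "clumps" is a placeholder):
+
+fix 1 all shake 0.0001 20 10 b 1 a 1
+fix 2 clumps rigid/small molecule :pre
+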
:line
-SHOCK package :link(SHOCK),h5
+SHOCK package :link(SHOCK),h4
+
+[Contents:]
-Contents: A small number of fixes useful for running impact
-simulations where a shock-wave passes through a material.
+Fixes for running impact simulations where a shock-wave passes through
+a material.
-To install via make or Make.py:
+[Install or un-install:]
make yes-shock
make machine :pre
-Make.py -p shock -a machine :pre
-
-To un-install via make or Make.py:
-
make no-shock
make machine :pre
-Make.py -p ^shock -a machine :pre
+[Supporting info:]
-Supporting info: "fix append/atoms"_fix_append_atoms.html, "fix
-msst"_fix_msst.html, "fix nphug"_fix_nphug.html, "fix
-wall/piston"_fix_wall_piston.html, examples/hugoniostat, examples/msst
+src/SHOCK: filenames -> commands
+"fix append/atoms"_fix_append_atoms.html
+"fix msst"_fix_msst.html
+"fix nphug"_fix_nphug.html
+"fix wall/piston"_fix_wall_piston.html
+examples/hugoniostat
+examples/msst :ul
:line
-SNAP package :link(SNAP),h5
+SNAP package :link(SNAP),h4
-Contents: A pair style for the spectral neighbor analysis potential
-(SNAP), which is an empirical potential which can be quantum accurate
-when fit to an archive of DFT data. Computes useful for analyzing
-properties of the potential are also included.
+[Contents:]
-To install via make or Make.py:
+A pair style for the spectral neighbor analysis potential (SNAP).
+SNAP is a methodology for deriving a highly accurate classical potential
+fit to a large archive of quantum mechanical (DFT) data. Also several
+computes which analyze attributes of the potential.
-make yes-snap
-make machine :pre
+[Author:] Aidan Thompson (Sandia).
-Make.py -p snap -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-snap
+make machine :pre
make no-snap
make machine :pre
-Make.py -p ^snap -a machine :pre
+[Supporting info:]
-Supporting info: "pair snap"_pair_snap.html, "compute
-sna/atom"_compute_sna_atom.html, "compute snad/atom"_compute_sna_atom.html,
-"compute snav/atom"_compute_sna_atom.html, examples/snap
+src/SNAP: filenames -> commands
+"pair snap"_pair_snap.html
+"compute sna/atom"_compute_sna_atom.html
+"compute snad/atom"_compute_sna_atom.html
+"compute snav/atom"_compute_sna_atom.html
+examples/snap :ul
:line
-SRD package :link(SRD),h5
+SRD package :link(SRD),h4
-Contents: Two fixes which implement the Stochastic Rotation Dynamics
-(SRD) method for coarse-graining of a solvent, typically around large
-colloidal-scale particles.
+[Contents:]
-To install via make or Make.py:
+A pair of fixes which implement the Stochastic Rotation Dynamics (SRD)
+method for coarse-graining of a solvent, typically around large
+colloidal particles.
+
+[Install or un-install:]
make yes-srd
make machine :pre
-Make.py -p srd -a machine :pre
-
-To un-install via make or Make.py:
+
make no-srd
make machine :pre
-Make.py -p ^srd -a machine :pre
+[Supporting info:]
-Supporting info: "fix srd"_fix_srd.html, "fix
-wall/srd"_fix_wall_srd.html, examples/srd, examples/ASPHERE
+src/SRD: filenames -> commands
+"fix srd"_fix_srd.html
+"fix wall/srd"_fix_wall_srd.html
+examples/srd
+examples/ASPHERE
+http://lammps.sandia.gov/movies.html#tri
+http://lammps.sandia.gov/movies.html#line
+http://lammps.sandia.gov/movies.html#poly :ul
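+
+In a typical setup the small solvent particles are in one group and
+the large particles in another; a sketch (group names and parameters
+are placeholders, see "fix srd"_fix_srd.html for their meaning):
+
+fix 1 solvent srd 20 big 1.0 0.25 49894 shift yes 54979 :pre
+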
:line
-VORONOI package :link(VORONOI),h5
+VORONOI package :link(VORONOI),h4
-Contents: A "compute voronoi/atom"_compute_voronoi_atom.html command
-which computes the Voronoi tesselation of a collection of atoms or
-particles by wrapping the Voro++ lib
+[Contents:]
-To build LAMMPS with the KIM package you must have previously
-installed the KIM API (library) on your system. The lib/kim/README
-file explains how to download and install KIM. Building with the KIM
-package also uses the lib/kim/Makefile.lammps file in the compile/link
-process. You should not need to edit this file.
+A compute command which calculates the Voronoi tessellation of a
+collection of atoms by wrapping the "Voro++ library"_voronoi. This
+can be used to calculate the local volume of each atom or to identify
+its near neighbors.
+:link(voronoi,http://math.lbl.gov/voro++)
-To build LAMMPS with the VORONOI package you must have previously
-installed the Voro++ library on your system. The lib/voronoi/README
-file explains how to download and install Voro++. There is a
-lib/voronoi/install.py script which automates the process. Type
-"python install.py" to see instructions. The final step is to create
-soft links in the lib/voronoi directory for "includelink" and
-"liblink" which point to installed Voro++ directories. Building with
-the VORONOI package uses the contents of the
-lib/voronoi/Makefile.lammps file in the compile/link process. You
-should not need to edit this file. Note that the Make.py script has a
-"-voronoi" option to allow the Voro++ library to be downloaded and/or
-installed and LAMMPS to be built in one step. Type "python
-src/Make.py -h -voronoi" to see the details.
+To use this package you must have the Voro++ library available on your
+system.
-To install via make or Make.py:
+[Author:] Daniel Schwen (INL) while at LANL. The open-source Voro++
+library was written by Chris Rycroft (Harvard U) while at UC Berkeley
+and LBNL.
-cd ~/lammps/lib/voronoi
-python install.py -g -b -l # download Voro++, build in lib/voronoi, create links
-cd ~/lammps/src
-make yes-voronoi
-make machine :pre
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first download and
+build the Voro++ library. You can do this manually if you prefer;
+follow the instructions in lib/voronoi/README. You can also do it in
+one step from the lammps/src dir, using a command like these, which
+simply invoke the lib/voronoi/Install.py script with the specified
+args:
+
+make lib-voronoi # print help message
+make lib-voronoi args="-g -b -l" # download and build in default lib/voronoi/voro++-0.4.6
+make lib-voronoi args="-h . voro++ -g -b -l" # download and build in lib/voronoi/voro++
+make lib-voronoi args="-h ~ voro++ -g -b -l" # download and build in ~/voro++ :pre
+
+Note that the final -l switch is to create 2 symbolic (soft) links,
+"includelink" and "liblink", in lib/voronoi to point to the Voro++ src
+dir. When LAMMPS builds it will use these links. You should not need
+to edit the lib/voronoi/Makefile.lammps file.
-Make.py -p voronoi -voronoi install="-g -b -l" -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-To un-install via make or Make.py:
+make yes-voronoi
+make machine :pre
make no-voronoi
make machine :pre
-Make.py -p ^voronoi -a machine :pre
-
-Supporting info: src/VORONOI/README, lib/voronoi/README, "compute
-voronoi/atom"_compute_voronoi_atom.html, examples/voronoi
+[Supporting info:]
+
+src/VORONOI: filenames -> commands
+src/VORONOI/README
+lib/voronoi/README
+"compute voronoi/atom"_compute_voronoi_atom.html
+examples/voronoi :ul
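+
+For example, the per-atom Voronoi cell volume and number of cell faces
+can be computed and written to a dump file roughly as follows:
+
+compute vor all voronoi/atom
+dump 1 all custom 100 dump.voro id c_vor[1] c_vor[2] :pre
+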
:line
+:line
+
+USER-ATC package :link(USER-ATC),h4
-4.2 User packages :h4,link(pkg_2)
+[Contents:]
-The current list of user-contributed packages is as follows:
+ATC stands for atoms-to-continuum. This package implements a "fix
+atc"_fix_atc.html command to either couple molecular dynamics with
+continuum finite element equations or perform on-the-fly conversion of
+atomic information to continuum fields.
-Package, Description, Author(s), Doc page, Example, Pic/movie, Library
-"USER-ATC"_#USER-ATC, atom-to-continuum coupling, Jones & Templeton & Zimmerman (1), "fix atc"_fix_atc.html, USER/atc, "atc"_atc, lib/atc
-"USER-AWPMD"_#USER-AWPMD, wave-packet MD, Ilya Valuev (JIHT), "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, -, lib/awpmd
-"USER-CG-CMM"_#USER-CG-CMM, coarse-graining model, Axel Kohlmeyer (Temple U), "pair_style lj/sdk"_pair_sdk.html, USER/cg-cmm, "cg"_cg, -
-"USER-CGDNA"_#USER-CGDNA, coarse-grained DNA force fields, Oliver Henrich (U Strathclyde Glasgow), src/USER-CGDNA/README, USER/cgdna, -, -
-"USER-COLVARS"_#USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (2), "fix colvars"_fix_colvars.html, USER/colvars, "colvars"_colvars, lib/colvars
-"USER-DIFFRACTION"_#USER-DIFFRACTION, virutal x-ray and electron diffraction, Shawn Coleman (ARL),"compute xrd"_compute_xrd.html, USER/diffraction, -, -
-"USER-DPD"_#USER-DPD, reactive dissipative particle dynamics (DPD), Larentzos & Mattox & Brennan (5), src/USER-DPD/README, USER/dpd, -, -
-"USER-DRUDE"_#USER-DRUDE, Drude oscillators, Dequidt & Devemy & Padua (3), "tutorial"_tutorial_drude.html, USER/drude, -, -
-"USER-EFF"_#USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, -
-"USER-FEP"_#USER-FEP, free energy perturbation, Agilio Padua (U Blaise Pascal Clermont-Ferrand), "compute fep"_compute_fep.html, USER/fep, -, -
-"USER-H5MD"_#USER-H5MD, dump output via HDF5, Pierre de Buyl (KU Leuven), "dump h5md"_dump_h5md.html, -, -, lib/h5md
-"USER-INTEL"_#USER-INTEL, Vectorized CPU and Intel(R) coprocessor styles, W. Michael Brown (Intel), "Section 5.3.2"_accelerate_intel.html, examples/intel, -, -
-"USER-LB"_#USER-LB, Lattice Boltzmann fluid, Colin Denniston (U Western Ontario), "fix lb/fluid"_fix_lb_fluid.html, USER/lb, -, -
-"USER-MGPT"_#USER-MGPT, fast MGPT multi-ion potentials, Tomas Oppelstrup & John Moriarty (LLNL), "pair_style mgpt"_pair_mgpt.html, USER/mgpt, -, -
-"USER-MISC"_#USER-MISC, single-file contributions, USER-MISC/README, USER-MISC/README, -, -, -
-"USER-MANIFOLD"_#USER-MANIFOLD, motion on 2d surface, Stefan Paquay (Eindhoven U of Technology), "fix manifoldforce"_fix_manifoldforce.html, USER/manifold, "manifold"_manifold, -
-"USER-MOLFILE"_#USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, VMD-MOLFILE
-"USER-NC-DUMP"_#USER-NC-DUMP, dump output via NetCDF, Lars Pastewka (Karlsruhe Institute of Technology, KIT), "dump nc / dump nc/mpiio"_dump_nc.html, -, -, lib/netcdf
-"USER-OMP"_#USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section 5.3.4"_accelerate_omp.html, -, -, -
-"USER-PHONON"_#USER-PHONON, phonon dynamical matrix, Ling-Ti Kong (Shanghai Jiao Tong U), "fix phonon"_fix_phonon.html, USER/phonon, -, -
-"USER-QMMM"_#USER-QMMM, QM/MM coupling, Axel Kohlmeyer (Temple U), "fix qmmm"_fix_qmmm.html, USER/qmmm, -, lib/qmmm
-"USER-QTB"_#USER-QTB, quantum nuclear effects, Yuan Shen (Stanford), "fix qtb"_fix_qtb.html "fix qbmsst"_fix_qbmsst.html, qtb, -, -
-"USER-QUIP"_#USER-QUIP, QUIP/libatoms interface, Albert Bartok-Partay (U Cambridge), "pair_style quip"_pair_quip.html, USER/quip, -, lib/quip
-"USER-REAXC"_#USER-REAXC, C version of ReaxFF, Metin Aktulga (LBNL), "pair_style reaxc"_pair_reax_c.html, reax, -, -
-"USER-SMD"_#USER-SMD, smoothed Mach dynamics, Georg Ganzenmuller (EMI), "SMD User Guide"_PDF/SMD_LAMMPS_userguide.pdf, USER/smd, -, -
-"USER-SMTBQ"_#USER-SMTBQ, Second Moment Tight Binding - QEq potential, Salles & Maras & Politano & Tetot (4), "pair_style smtbq"_pair_smtbq.html, USER/smtbq, -, -
-"USER-SPH"_#USER-SPH, smoothed particle hydrodynamics, Georg Ganzenmuller (EMI), "SPH User Guide"_PDF/SPH_LAMMPS_userguide.pdf, USER/sph, "sph"_sph, -
-"USER-TALLY"_#USER-TALLY, Pairwise tallied computes, Axel Kohlmeyer (Temple U), "compute XXX/tally"_compute_tally.html, USER/tally, -, -
-"USER-VTK"_#USER-VTK, VTK-style dumps, Berger and Queteschiner (6), "compute custom/vtk"_dump_custom_vtk.html, -, -, lib/vtk
-:tb(ea=c)
+[Authors:] Reese Jones, Jeremy Templeton, Jon Zimmerman (Sandia).
-:link(atc,http://lammps.sandia.gov/pictures.html#atc)
-:link(cg,http://lammps.sandia.gov/pictures.html#cg)
-:link(eff,http://lammps.sandia.gov/movies.html#eff)
-:link(manifold,http://lammps.sandia.gov/movies.html#manifold)
-:link(sph,http://lammps.sandia.gov/movies.html#sph)
-:link(VMD,http://www.ks.uiuc.edu/Research/vmd)
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the ATC
+library in lib/atc. You can do this manually if you prefer; follow
+the instructions in lib/atc/README. You can also do it in one step
+from the lammps/src dir, using a command like these, which simply
+invoke the lib/atc/Install.py script with the specified args:
-The "Authors" column lists a name(s) if a specific person is
-responsible for creating and maintaining the package.
+make lib-atc # print help message
+make lib-atc args="-m g++" # build with GNU g++ compiler
+make lib-atc args="-m icc" # build with Intel icc compiler :pre
-(1) The ATC package was created by Reese Jones, Jeremy Templeton, and
-Jon Zimmerman (Sandia).
+The build should produce two files: lib/atc/libatc.a and
+lib/atc/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the ATC
+library. If necessary, you can edit/create a new
+lib/atc/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
-(2) The COLVARS package was created by Axel Kohlmeyer (Temple U) using
-the colvars module library written by Giacomo Fiorin (Temple U) and
-Jerome Henin (LISM, Marseille, France).
+Note that the Makefile.lammps file has settings for the BLAS and
+LAPACK linear algebra libraries. As explained in lib/atc/README these
+can either exist on your system, or you can use the files provided in
+lib/linalg. In the latter case you also need to build the library
+in lib/linalg with a command like these:
-(3) The DRUDE package was created by Alain Dequidt (U Blaise Pascal
-Clermont-Ferrand) and co-authors Julien Devemy (CNRS) and Agilio Padua
-(U Blaise Pascal).
+make lib-linalg # print help message
+make lib-linalg args="-m gfortran" # build with GNU Fortran compiler :pre
-(4) The SMTBQ package was created by Nicolas Salles, Emile Maras,
-Olivier Politano, and Robert Tetot (LAAS-CNRS, France).
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-(5) The USER-DPD package was created by James Larentzos (ARL), Timothy
-Mattox (Engility), and John Brennan (ARL).
+make yes-user-atc
+make machine :pre
+
+make no-user-atc
+make machine :pre
+
+[Supporting info:]
-(6) The USER-VTK package was created by Richard Berger (JKU) and
-Daniel Queteschiner (DCS Computing).
+src/USER-ATC: filenames -> commands
+src/USER-ATC/README
+"fix atc"_fix_atc.html
+examples/USER/atc
+http://lammps.sandia.gov/pictures.html#atc :ul
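+
+A minimal sketch of usage; the group name and the material parameter
+file are placeholders, and a full setup also requires fix_modify
+commands to define the finite-element mesh (see "fix
+atc"_fix_atc.html):
+
+fix AtC internal atc thermal Ar_thermal.mat :pre
+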
-The "Doc page" column links to either a sub-section of the
-"Section 6"_Section_howto.html of the manual, or an input script
-command implemented as part of the package, or to additional
-documentation provided within the package.
+:line
-The "Example" column is a sub-directory in the examples directory of
-the distribution which has an input script that uses the package.
-E.g. "peptide" refers to the examples/peptide directory.
+USER-AWPMD package :link(USER-AWPMD),h4
-The "Library" column lists an external library which must be built
-first and which LAMMPS links to when it is built. If it is listed as
-lib/package, then the code for the library is under the lib directory
-of the LAMMPS distribution. See the lib/package/README file for info
-on how to build the library. If it is not listed as lib/package, then
-it is a third-party library not included in the LAMMPS distribution.
-See details on all of this below for individual packages.
+[Contents:]
-:line
+AWPMD stands for Antisymmetrized Wave Packet Molecular Dynamics. This
+package implements an atom, pair, and fix style which allows electrons
+to be treated as explicit particles in a classical molecular dynamics
+model.
-USER-ATC package :link(USER-ATC),h5
+[Author:] Ilya Valuev (JIHT, Russia).
-Contents: ATC stands for atoms-to-continuum. This package implements
-a "fix atc"_fix_atc.html command to either couple MD with continuum
-finite element equations or perform on-the-fly post-processing of
-atomic information to continuum fields. See src/USER-ATC/README for
-more details.
-
-To build LAMMPS with this package ...
-
-To install via make or Make.py:
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+AWPMD library in lib/awpmd. You can do this manually if you prefer;
+follow the instructions in lib/awpmd/README. You can also do it in
+one step from the lammps/src dir, using a command like these, which
+simply invoke the lib/awpmd/Install.py script with the specified args:
-make yes-user-atc
-make machine :pre
+make lib-awpmd # print help message
+make lib-awpmd args="-m g++" # build with GNU g++ compiler
+make lib-awpmd args="-m icc" # build with Intel icc compiler :pre
-Make.py -p atc -a machine :pre
+The build should produce two files: lib/awpmd/libawpmd.a and
+lib/awpmd/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+AWPMD library. If necessary, you can edit/create a new
+lib/awpmd/Makefile.machine file for your system, which should define
+an EXTRAMAKE variable to specify a corresponding
+Makefile.lammps.machine file.
-To un-install via make or Make.py:
+Note that the Makefile.lammps file has settings for the BLAS and
+LAPACK linear algebra libraries. As explained in lib/awpmd/README
+these can either exist on your system, or you can use the files
+provided in lib/linalg. In the latter case you also need to build the
+library in lib/linalg with a command like these:
-make no-user-atc
-make machine :pre
+make lib-linalg # print help message
+make lib-linalg args="-m gfortran" # build with GNU Fortran compiler :pre
-Make.py -p ^atc -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-Supporting info:src/USER-ATC/README, "fix atc"_fix_atc.html,
-examples/USER/atc
+make yes-user-awpmd
+make machine :pre
+
+make no-user-awpmd
+make machine :pre
+
+[Supporting info:]
-Authors: Reese Jones (rjones at sandia.gov), Jeremy Templeton (jatempl
-at sandia.gov) and Jon Zimmerman (jzimmer at sandia.gov) at Sandia.
-Contact them directly if you have questions.
+src/USER-AWPMD: filenames -> commands
+src/USER-AWPMD/README
+"pair awpmd/cut"_pair_awpmd.html
+"fix nve/awpmd"_fix_nve_awpmd.html
+examples/USER/awpmd :ul
:line
-USER-AWPMD package :link(USER-AWPMD),h5
+USER-CGDNA package :link(USER-CGDNA),h4
-Contents: AWPMD stands for Antisymmetrized Wave Packet Molecular
-Dynamics. This package implements an atom, pair, and fix style which
-allows electrons to be treated as explicit particles in an MD
-calculation. See src/USER-AWPMD/README for more details.
+[Contents:]
-To build LAMMPS with this package ...
+Several pair styles, a bond style, and integration fixes for
+coarse-grained models of single- and double-stranded DNA based on the
+oxDNA model of Doye, Louis and Ouldridge at the University of Oxford.
+This includes Langevin-type rigid-body integrators with improved
+stability.
-Supporting info: src/USER-AWPMD/README, "fix
-awpmd/cut"_pair_awpmd.html, examples/USER/awpmd
+[Author:] Oliver Henrich (University of Edinburgh).
-Author: Ilya Valuev at the JIHT in Russia (valuev at
-physik.hu-berlin.de). Contact him directly if you have questions.
+[Install or un-install:]
+
+make yes-user-cgdna
+make machine :pre
+
+make no-user-cgdna
+make machine :pre
+
+[Supporting info:]
+
+src/USER-CGDNA: filenames -> commands
+src/USER-CGDNA/README
+"pair_style oxdna/*"_pair_oxdna.html
+"pair_style oxdna2/*"_pair_oxdna2.html
+"bond_style oxdna/*"_bond_oxdna.html
+"bond_style oxdna2/*"_bond_oxdna2.html
+"fix nve/dotc/langevin"_fix_nve_dotc_langevin.html :ul
:line
-USER-CG-CMM package :link(USER-CG-CMM),h5
+USER-CGSDK package :link(USER-CGSDK),h4
-Contents: CG-CMM stands for coarse-grained ??. This package
-implements several pair styles and an angle style using the coarse
-grained parametrization of Shinoda, DeVane, Klein, Mol Sim, 33, 27
-(2007) (SDK), with extensions to simulate ionic liquids, electrolytes,
-lipids and charged amino acids. See src/USER-CG-CMM/README for more
-details.
+[Contents:]
-Supporting info: src/USER-CG-CMM/README, "pair lj/sdk"_pair_sdk.html,
-"pair lj/sdk/coul/long"_pair_sdk.html, "angle sdk"_angle_sdk.html,
-examples/USER/cg-cmm
+Several pair styles and an angle style which implement the
+coarse-grained SDK model of Shinoda, DeVane, and Klein which enables
+simulation of ionic liquids, electrolytes, lipids and charged amino
+acids.
-Author: Axel Kohlmeyer at Temple U (akohlmey at gmail.com). Contact
-him directly if you have questions.
-
-:line
+[Author:] Axel Kohlmeyer (Temple U).
-USER-CGDNA package :link(USER-CGDNA),h5
-
-Contents: The CGDNA package implements coarse-grained force fields for
-single- and double-stranded DNA. These are at the moment mainly the
-oxDNA and oxDNA2 models, developed by Doye, Louis and Ouldridge at the University
-of Oxford. The package also contains Langevin-type rigid-body
-integrators with improved stability.
+[Install or un-install:]
+
+make yes-user-cgsdk
+make machine :pre
+
+make no-user-cgsdk
+make machine :pre
+
+[Supporting info:]
-See these doc pages to get started:
+src/USER-CGSDK: filenames -> commands
+src/USER-CGSDK/README
+"pair_style lj/sdk/*"_pair_sdk.html
+"angle_style sdk"_angle_sdk.html
+examples/USER/cgsdk
+http://lammps.sandia.gov/pictures.html#cg :ul
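+
+A rough sketch of usage; the functional form keyword and the epsilon
+and sigma values per type pair come from the SDK parametrization and
+are placeholders here:
+
+pair_style lj/sdk 15.0
+pair_coeff 1 1 lj9_6 0.40 3.60  # placeholder epsilon/sigma for this type pair :pre
+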
-"bond_style oxdna/fene"_bond_oxdna.html
-"bond_style oxdna2/fene"_bond_oxdna.html
-"pair_style oxdna/..."_pair_oxdna.html
-"pair_style oxdna2/..."_pair_oxdna2.html
-"fix nve/dotc/langevin"_fix_nve_dotc_langevin.html :ul
+:line
-Supporting info: /src/USER-CGDNA/README, "bond_style
-oxdna/fene"_bond_oxdna.html, "bond_style
-oxdna2/fene"_bond_oxdna.html, "pair_style
-oxdna/..."_pair_oxdna.html, "pair_style
-oxdna2/..."_pair_oxdna2.html, "fix
-nve/dotc/langevin"_fix_nve_dotc_langevin.html
+USER-COLVARS package :link(USER-COLVARS),h4
+
+[Contents:]
+
+COLVARS stands for collective variables, which can be used to
+implement various enhanced sampling methods, including Adaptive
+Biasing Force, Metadynamics, Steered MD, Umbrella Sampling and
+Restraints. A "fix colvars"_fix_colvars.html command is implemented
+which wraps the COLVARS library that implements these methods.
+
+[Authors:] Axel Kohlmeyer (Temple U). The COLVARS library was written
+by Giacomo Fiorin (ICMS, Temple University, Philadelphia, PA, USA) and
+Jerome Henin (LISM, CNRS, Marseille, France).
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+COLVARS library in lib/colvars. You can do this manually if you
+prefer; follow the instructions in lib/colvars/README. You can also
+do it in one step from the lammps/src dir, using a command like these,
+which simply invoke the lib/colvars/Install.py script with the
+specified args:
+
+make lib-colvars # print help message
+make lib-colvars args="-m g++" # build with GNU g++ compiler :pre
+
+The build should produce two files: lib/colvars/libcolvars.a and
+lib/colvars/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+COLVARS library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/colvars/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-colvars
+make machine :pre
+
+make no-user-colvars
+make machine :pre
+
+[Supporting info:]
-Author: Oliver Henrich at the University of Strathclyde, Glasgow
-(oliver.henrich at strath.ac.uk, also ohenrich at ph.ed.ac.uk).
-Contact him directly if you have any questions.
+src/USER-COLVARS: filenames -> commands
+"doc/PDF/colvars-refman-lammps.pdf"_PDF/colvars-refman-lammps.pdf
+src/USER-COLVARS/README
+lib/colvars/README
+"fix colvars"_fix_colvars.html
+examples/USER/colvars :ul
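+
+Typical usage is a single fix that points to a colvars config file in
+which the collective variables and biases are defined; the file name
+and output prefix here are placeholders:
+
+fix cv all colvars colvars.inp output run1 :pre
+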
:line
-USER-COLVARS package :link(USER-COLVARS),h5
+USER-DIFFRACTION package :link(USER-DIFFRACTION),h4
-Contents: COLVARS stands for collective variables which can be used to
-implement Adaptive Biasing Force, Metadynamics, Steered MD, Umbrella
-Sampling and Restraints. This package implements a "fix
-colvars"_fix_colvars.html command which wraps a COLVARS library which
-can perform those kinds of simulations. See src/USER-COLVARS/README
-for more details.
+[Contents:]
-Supporting info:
-"doc/PDF/colvars-refman-lammps.pdf"_PDF/colvars-refman-lammps.pdf,
-src/USER-COLVARS/README, lib/colvars/README, "fix
-colvars"_fix_colvars.html, examples/USER/colvars
+Two computes and a fix for calculating x-ray and electron diffraction
+intensities based on kinematic diffraction theory.
-Authors: Axel Kohlmeyer at Temple U (akohlmey at gmail.com) wrote the
-fix. The COLVARS library itself is written and maintained by Giacomo
-Fiorin (ICMS, Temple University, Philadelphia, PA, USA) and Jerome
-Henin (LISM, CNRS, Marseille, France). Contact them directly if you
-have questions.
+[Author:] Shawn Coleman while at the University of Arkansas.
-:line
+[Install or un-install:]
+
+make yes-user-diffraction
+make machine :pre
+
+make no-user-diffraction
+make machine :pre
+
+[Supporting info:]
-USER-DIFFRACTION package :link(USER-DIFFRACTION),h5
+src/USER-DIFFRACTION: filenames -> commands
+"compute saed"_compute_saed.html
+"compute xrd"_compute_xrd.html
+"fix saed/vtk"_fix_saed_vtk.html
+examples/USER/diffraction :ul
-Contents: This packages implements two computes and a fix for
-calculating x-ray and electron diffraction intensities based on
-kinematic diffraction theory. See src/USER-DIFFRACTION/README for
-more details.
+:line
-Supporting info: "compute saed"_compute_saed.html, "compute
-xrd"_compute_xrd.html, "fix saed/vtk"_fix_saed_vtk.html,
-examples/USER/diffraction
+USER-DPD package :link(USER-DPD),h4
-Author: Shawn P. Coleman (shawn.p.coleman8.ctr at mail.mil) while at
-the University of Arkansas. Contact him directly if you have
-questions.
+[Contents:]
-:line
+DPD stands for dissipative particle dynamics. This package implements
+coarse-grained DPD-based models for energetic, reactive molecular
+crystalline materials. It includes many pair styles specific to these
+systems, including for reactive DPD, where each particle has internal
+state for multiple species and a coupled set of chemical reaction ODEs
+are integrated each timestep. Highly accurate time integrators for
+isothermal, isoenergetic, isobaric and isenthalpic conditions are
+included. These enable long timesteps via the Shardlow splitting
+algorithm.
-USER-DPD package :link(USER-DPD),h5
+[Authors:] Jim Larentzos (ARL), Tim Mattox (Engility Corp), and John
+Brennan (ARL).
-Contents: DPD stands for dissipative particle dynamics, This package
-implements DPD for isothermal, isoenergetic, isobaric and isenthalpic
-conditions. It also has extensions for performing reactive DPD, where
-each particle has internal state for multiple species and a coupled
-set of chemical reaction ODEs are integrated each timestep. The DPD
-equations of motion are integrated efficiently through the Shardlow
-splitting algorithm. See src/USER-DPD/README for more details.
+[Install or un-install:]
+
+make yes-user-dpd
+make machine :pre
+
+make no-user-dpd
+make machine :pre
+
+[Supporting info:]
-Supporting info: /src/USER-DPD/README, "compute dpd"_compute_dpd.html
+src/USER-DPD: filenames -> commands
+src/USER-DPD/README
+"compute dpd"_compute_dpd.html
"compute dpd/atom"_compute_dpd_atom.html
-"fix eos/cv"_fix_eos_table.html "fix eos/table"_fix_eos_table.html
-"fix eos/table/rx"_fix_eos_table_rx.html "fix shardlow"_fix_shardlow.html
-"fix rx"_fix_rx.html "pair table/rx"_pair_table_rx.html
-"pair dpd/fdt"_pair_dpd_fdt.html "pair dpd/fdt/energy"_pair_dpd_fdt.html
-"pair exp6/rx"_pair_exp6_rx.html "pair multi/lucy"_pair_multi_lucy.html
-"pair multi/lucy/rx"_pair_multi_lucy_rx.html, examples/USER/dpd
-
-Authors: James Larentzos (ARL) (james.p.larentzos.civ at mail.mil),
-Timothy Mattox (Engility Corp) (Timothy.Mattox at engilitycorp.com)
-and John Brennan (ARL) (john.k.brennan.civ at mail.mil). Contact them
-directly if you have questions.
+"fix eos/cv"_fix_eos_table.html
+"fix eos/table"_fix_eos_table.html
+"fix eos/table/rx"_fix_eos_table_rx.html
+"fix shardlow"_fix_shardlow.html
+"fix rx"_fix_rx.html
+"pair table/rx"_pair_table_rx.html
+"pair dpd/fdt"_pair_dpd_fdt.html
+"pair dpd/fdt/energy"_pair_dpd_fdt.html
+"pair exp6/rx"_pair_exp6_rx.html
+"pair multi/lucy"_pair_multi_lucy.html
+"pair multi/lucy/rx"_pair_multi_lucy_rx.html
+examples/USER/dpd :ul
:line
-USER-DRUDE package :link(USER-DRUDE),h5
+USER-DRUDE package :link(USER-DRUDE),h4
-Contents: This package contains methods for simulating polarizable
-systems using thermalized Drude oscillators. It has computes, fixes,
-and pair styles for this purpose. See "Section
+[Contents:]
+
+Fixes, pair styles, and a compute to simulate thermalized Drude
+oscillators as a model of polarization. See "Section
6.27"_Section_howto.html#howto_27 for an overview of how to use the
-package. See src/USER-DRUDE/README for additional details. There are
-auxiliary tools for using this package in tools/drude.
+package. There are auxiliary tools for using this package in
+tools/drude.
-Supporting info: "Section 6.27"_Section_howto.html#howto_27,
-src/USER-DRUDE/README, "fix drude"_fix_drude.html, "fix
-drude/transform/*"_fix_drude_transform.html, "compute
-temp/drude"_compute_temp_drude.html, "pair thole"_pair_thole.html,
-"pair lj/cut/thole/long"_pair_thole.html, examples/USER/drude,
-tools/drude
+[Authors:] Alain Dequidt (U Blaise Pascal Clermont-Ferrand), Julien
+Devemy (CNRS), and Agilio Padua (U Blaise Pascal).
-Authors: Alain Dequidt at Universite Blaise Pascal Clermont-Ferrand
-(alain.dequidt at univ-bpclermont.fr); co-authors: Julien Devemy,
-Agilio Padua. Contact them directly if you have questions.
+[Install or un-install:]
+
+make yes-user-drude
+make machine :pre
+
+make no-user-drude
+make machine :pre
+
+[Supporting info:]
+
+src/USER-DRUDE: filenames -> commands
+"Section 6.27"_Section_howto.html#howto_27
+"Section 6.25"_Section_howto.html#howto_25
+src/USER-DRUDE/README
+"fix drude"_fix_drude.html
+"fix drude/transform/*"_fix_drude_transform.html
+"compute temp/drude"_compute_temp_drude.html
+"pair thole"_pair_thole.html
+"pair lj/cut/thole/long"_pair_thole.html
+examples/USER/drude
+tools/drude :ul
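+
+As a sketch, each atom type must be flagged as a Drude core (C), a
+Drude particle (D), or non-polarizable (N), e.g. for a hypothetical
+system with 5 atom types:
+
+fix drd all drude C C N D D :pre
+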
:line
-USER-EFF package :link(USER-EFF),h5
+USER-EFF package :link(USER-EFF),h4
-Contents: EFF stands for electron force field. This package contains
-atom, pair, fix and compute styles which implement the eFF as
+[Contents:]
+
+EFF stands for electron force field, which allows a classical MD code
+to model electrons as particles of variable radius. This package
+contains atom, pair, fix and compute styles which implement the eFF as
described in A. Jaramillo-Botero, J. Su, Q. An, and W.A. Goddard III,
-JCC, 2010. The eFF potential was first introduced by Su and Goddard,
-in 2007. See src/USER-EFF/README for more details. There are
-auxiliary tools for using this package in tools/eff; see its README
-file.
+JCC, 2010. The eFF potential was first introduced by Su and Goddard,
+in 2007. There are auxiliary tools for using this package in
+tools/eff; see its README file.
-Supporting info:
+[Author:] Andres Jaramillo-Botero (CalTech).
-Author: Andres Jaramillo-Botero at CalTech (ajaramil at
-wag.caltech.edu). Contact him directly if you have questions.
+[Install or un-install:]
+
+make yes-user-eff
+make machine :pre
+
+make no-user-eff
+make machine :pre
+
+[Supporting info:]
+
+src/USER-EFF: filenames -> commands
+src/USER-EFF/README
+"atom_style electron"_atom_style.html
+"fix nve/eff"_fix_nve_eff.html
+"fix nvt/eff"_fix_nvt_eff.html
+"fix npt/eff"_fix_npt_eff.html
+"fix langevin/eff"_fix_langevin_eff.html
+"compute temp/eff"_compute_temp_eff.html
+"pair eff/cut"_pair_eff.html
+"pair eff/inline"_pair_eff.html
+examples/USER/eff
+tools/eff/README
+tools/eff
+http://lammps.sandia.gov/movies.html#eff :ul
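+
+A minimal sketch of the style combination used in eFF simulations (the
+cutoff value is a placeholder):
+
+atom_style electron
+pair_style eff/cut 20.0
+fix 1 all nve/eff
+compute effT all temp/eff :pre
+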
:line
-USER-FEP package :link(USER-FEP),h5
+USER-FEP package :link(USER-FEP),h4
+
+[Contents:]
-Contents: FEP stands for free energy perturbation. This package
-provides methods for performing FEP simulations by using a "fix
+FEP stands for free energy perturbation. This package provides
+methods for performing FEP simulations by using a "fix
adapt/fep"_fix_adapt_fep.html command with soft-core pair potentials,
-which have a "soft" in their style name. See src/USER-FEP/README for
-more details. There are auxiliary tools for using this package in
-tools/fep; see its README file.
+which have a "soft" in their style name. There are auxiliary tools
+for using this package in tools/fep; see its README file.
-Supporting info: src/USER-FEP/README, "fix
-adapt/fep"_fix_adapt_fep.html, "compute fep"_compute_fep.html,
-"pair_style */soft"_pair_lj_soft.html, examples/USER/fep
+[Author:] Agilio Padua (Universite Blaise Pascal Clermont-Ferrand).
-Author: Agilio Padua at Universite Blaise Pascal Clermont-Ferrand
-(agilio.padua at univ-bpclermont.fr). Contact him directly if you have
-questions.
+[Install or un-install:]
+
+make yes-user-fep
+make machine :pre
+
+make no-user-fep
+make machine :pre
+
+[Supporting info:]
+
+src/USER-FEP: filenames -> commands
+src/USER-FEP/README
+"fix adapt/fep"_fix_adapt_fep.html
+"compute fep"_compute_fep.html
+"pair_style */soft"_pair_lj_soft.html
+examples/USER/fep
+tools/fep/README
+tools/fep :ul
:line
-USER-H5MD package :link(USER-H5MD),h5
+USER-H5MD package :link(USER-H5MD),h4
-Contents: H5MD stands for HDF5 for MD. "HDF5"_HDF5 is a binary,
-portable, self-describing file format, used by many scientific
-simulations. H5MD is a format for molecular simulations, built on top
-of HDF5. This package implements a "dump h5md"_dump_h5md.html command
-to output LAMMPS snapshots in this format. See src/USER-H5MD/README
-for more details.
+[Contents:]
-:link(HDF5,http://www.hdfgroup.org/HDF5/)
+H5MD stands for HDF5 for MD. "HDF5"_HDF5 is a portable, binary,
+self-describing file format, used by many scientific simulations.
+H5MD is a format for molecular simulations, built on top of HDF5.
+This package implements a "dump h5md"_dump_h5md.html command to output
+LAMMPS snapshots in this format.
-Supporting info: src/USER-H5MD/README, lib/h5md/README, "dump
-h5md"_dump_h5md.html
+:link(HDF5,http://www.hdfgroup.org/HDF5)
-Author: Pierre de Buyl at KU Leuven (see http://pdebuyl.be) created
-this package as well as the H5MD format and library. Contact him
-directly if you have questions.
+To use this package you must have the HDF5 library available on your
+system.
-:line
-
-USER-INTEL package :link(USER-INTEL),h5
+[Author:] Pierre de Buyl (KU Leuven) created both the package and the
+H5MD format.
-Contents: Dozens of pair, bond, angle, dihedral, and improper styles
-that are optimized for Intel CPUs and the Intel Xeon Phi (in offload
-mode). All of them have an "intel" in their style name. "Section
-5.3.2"_accelerate_intel.html gives details of what hardware
-and compilers are required on your system, and how to build and use
-this package. Also see src/USER-INTEL/README for more details. See
-the KOKKOS, OPT, and USER-OMP packages, which also have CPU and
-Phi-enabled styles.
+[Install or un-install:]
-Supporting info: examples/accelerate, src/USER-INTEL/TEST
+Note that to compile and link the CH5MD library you need the standard
+HDF5 software package installed on your system, which should include
+the h5cc compiler and the HDF5 library.
-"Section 5.3"_Section_accelerate.html#acc_3
+Before building LAMMPS with this package, you must first build the
+CH5MD library in lib/h5md. You can do this manually if you prefer;
+follow the instructions in lib/h5md/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/h5md/Install.py script with the specified args:
-Author: Mike Brown at Intel (michael.w.brown at intel.com). Contact
-him directly if you have questions.
-
-For the USER-INTEL package, you have 2 choices when building. You can
-build with CPU or Phi support. The latter uses Xeon Phi chips in
-"offload" mode. Each of these modes requires additional settings in
-your Makefile.machine for CCFLAGS and LINKFLAGS.
+make lib-h5md # print help message
+make lib-h5md args="-m h5cc" # build with h5cc compiler :pre
-For CPU mode (if using an Intel compiler):
+The build should produce two files: lib/h5md/libch5md.a and
+lib/h5md/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+system HDF5 library. If necessary, you can edit/create a new
+lib/h5md/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
-CCFLAGS: add -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
-LINKFLAGS: add -fopenmp :ul
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-h5md
+make machine :pre
+
+make no-user-h5md
+make machine :pre
+
+[Supporting info:]
-For Phi mode add the following in addition to the CPU mode flags:
+src/USER-H5MD: filenames -> commands
+src/USER-H5MD/README
+lib/h5md/README
+"dump h5md"_dump_h5md.html :ul
-CCFLAGS: add -DLMP_INTEL_OFFLOAD and
-LINKFLAGS: add -offload :ul
+:line
-And also add this to CCFLAGS:
+USER-INTEL package :link(USER-INTEL),h4
--offload-option,mic,compiler,"-fp-model fast=2 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=4\"" :pre
+[Contents:]
-Examples:
+Dozens of pair, fix, bond, angle, dihedral, improper, and kspace
+styles which are optimized for Intel CPUs and KNLs (Knights Landing).
+All of them have an "intel" in their style name. "Section
+5.3.2"_accelerate_intel.html gives details of what hardware and
+compilers are required on your system, and how to build and use this
+package. Its styles can be invoked at run time via the "-sf intel" or
+"-suffix intel" "command-line switches"_Section_start.html#start_7.
+Also see the "KOKKOS"_#KOKKOS, "OPT"_#OPT, and "USER-OMP"_#USER-OMP
+packages, which have styles optimized for CPUs and KNLs.
-:line
+You need to have an Intel compiler, version 14 or higher, to take
+full advantage of this package.
-USER-LB package :link(USER-LB),h5
+[Author:] Mike Brown (Intel).
-Supporting info:
+[Install or un-install:]
-This package contains a LAMMPS implementation of a background
-Lattice-Boltzmann fluid, which can be used to model MD particles
-influenced by hydrodynamic forces.
+For the USER-INTEL package, you have 2 choices when building. You can
+build with either CPU or KNL support. Each choice requires additional
+settings in your Makefile.machine for CCFLAGS and LINKFLAGS and
+optimized malloc libraries. See the
+src/MAKE/OPTIONS/Makefile.intel_cpu and src/MAKE/OPTIONS/Makefile.knl
+files for examples.
+
+For CPUs:
+
+OPTFLAGS = -xHost -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
+CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
+ -fno-alias -ansi-alias -restrict $(OPTFLAGS)
+LINKFLAGS = -g -qopenmp $(OPTFLAGS)
+LIB = -ltbbmalloc -ltbbmalloc_proxy :pre
+
+For KNLs:
+
+OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
+CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
+ -fno-alias -ansi-alias -restrict $(OPTFLAGS)
+LINKFLAGS = -g -qopenmp $(OPTFLAGS)
+LIB = -ltbbmalloc :pre
+
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner.
+Note that you cannot build one executable to run on multiple hardware
+targets (Intel CPUs or KNL). You need to build LAMMPS once for each
+hardware target, to produce a separate executable.
+
+You should also typically install the USER-OMP package, as it can be
+used in tandem with the USER-INTEL package to good effect, as
+explained in "Section 5.3.2"_accelerate_intel.html.
+
+make yes-user-intel yes-user-omp
+make machine :pre
+
+make no-user-intel no-user-omp
+make machine :pre
-See this doc page and its related commands to get started:
+[Supporting info:]
-"fix lb/fluid"_fix_lb_fluid.html
+src/USER-INTEL: filenames -> commands
+src/USER-INTEL/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.2"_accelerate_gpu.html
+"Section 2.7 -sf intel"_Section_start.html#start_7
+"Section 2.7 -pk intel"_Section_start.html#start_7
+"package intel"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (i)
+src/USER-INTEL/TEST
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
-The people who created this package are Frances Mackay (fmackay at
-uwo.ca) and Colin (cdennist at uwo.ca) Denniston, University of
-Western Ontario. Contact them directly if you have questions.
+:line
-Examples: examples/USER/lb
+USER-LB package :link(USER-LB),h4
-:line
+[Contents:]
-USER-MGPT package :link(USER-MGPT),h5
+Fixes which implement a background Lattice-Boltzmann (LB) fluid, which
+can be used to model MD particles influenced by hydrodynamic forces.
-Supporting info:
+[Authors:] Frances Mackay and Colin Denniston (University of Western
+Ontario).
-This package contains a fast implementation for LAMMPS of
-quantum-based MGPT multi-ion potentials. The MGPT or model GPT method
-derives from first-principles DFT-based generalized pseudopotential
-theory (GPT) through a series of systematic approximations valid for
-mid-period transition metals with nearly half-filled d bands. The
-MGPT method was originally developed by John Moriarty at Lawrence
-Livermore National Lab (LLNL).
+[Install or un-install:]
+
+make yes-user-lb
+make machine :pre
+
+make no-user-lb
+make machine :pre
+
+[Supporting info:]
-In the general matrix representation of MGPT, which can also be
-applied to f-band actinide metals, the multi-ion potentials are
-evaluated on the fly during a simulation through d- or f-state matrix
-multiplication, and the forces that move the ions are determined
-analytically. The {mgpt} pair style in this package calculates forces
-and energies using an optimized matrix-MGPT algorithm due to Tomas
-Oppelstrup at LLNL.
+src/USER-LB: filenames -> commands
+src/USER-LB/README
+"fix lb/fluid"_fix_lb_fluid.html
+"fix lb/momentum"_fix_lb_momentum.html
+"fix lb/viscous"_fix_lb_viscous.html
+examples/USER/lb :ul
-See this doc page to get started:
+:line
-"pair_style mgpt"_pair_mgpt.html
+USER-MGPT package :link(USER-MGPT),h4
-The persons who created the USER-MGPT package are Tomas Oppelstrup
-(oppelstrup2@llnl.gov) and John Moriarty (moriarty2@llnl.gov)
-Contact them directly if you have any questions.
+[Contents:]
-Examples: examples/USER/mgpt
+A pair style which provides a fast implementation of the quantum-based
+MGPT multi-ion potentials. The MGPT or model GPT method derives from
+first-principles DFT-based generalized pseudopotential theory (GPT)
+through a series of systematic approximations valid for mid-period
+transition metals with nearly half-filled d bands. The MGPT method
+was originally developed by John Moriarty at LLNL. The pair style in
+this package calculates forces and energies using an optimized
+matrix-MGPT algorithm due to Tomas Oppelstrup at LLNL.
-:line
+[Authors:] Tomas Oppelstrup and John Moriarty (LLNL).
-USER-MISC package :link(USER-MISC),h5
+[Install or un-install:]
+
+make yes-user-mgpt
+make machine :pre
+
+make no-user-mgpt
+make machine :pre
+
+[Supporting info:]
-Supporting info:
+src/USER-MGPT: filenames -> commands
+src/USER-MGPT/README
+"pair_style mgpt"_pair_mgpt.html
+examples/USER/mgpt :ul
-The files in this package are a potpourri of (mostly) unrelated
-features contributed to LAMMPS by users. Each feature is a single
-pair of files (*.cpp and *.h).
+:line
-More information about each feature can be found by reading its doc
-page in the LAMMPS doc directory. The doc page which lists all LAMMPS
-input script commands is as follows:
+USER-MISC package :link(USER-MISC),h4
-"Section 3.5"_Section_commands.html#cmd_5
+[Contents:]
-User-contributed features are listed at the bottom of the fix,
-compute, pair, etc sections.
+A potpourri of (mostly) unrelated features contributed to LAMMPS by
+users. Each feature is a single fix, compute, pair, bond, angle,
+dihedral, improper, or command style.
-The list of features and author of each is given in the
+[Authors:] The author for each style in the package is listed in the
src/USER-MISC/README file.
-You should contact the author directly if you have specific questions
-about the feature or its coding.
+[Install or un-install:]
+
+make yes-user-misc
+make machine :pre
+
+make no-user-misc
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/misc
+src/USER-MISC: filenames -> commands
+src/USER-MISC/README
+one doc page per individual command listed in src/USER-MISC/README
+examples/USER/misc :ul
:line
-USER-MANIFOLD package :link(USER-MANIFOLD),h5
+USER-MANIFOLD package :link(USER-MANIFOLD),h4
-Supporting info:
+[Contents:]
-This package contains a dump molfile command which uses molfile
-plugins that are bundled with the
-"VMD"_http://www.ks.uiuc.edu/Research/vmd molecular visualization and
-analysis program, to enable LAMMPS to dump its information in formats
-compatible with various molecular simulation tools.
+Several fixes and a "manifold" class which enable simulations of
+particles constrained to a manifold (a 2D surface within the 3D
+simulation box). This is done by applying the RATTLE constraint
+algorithm to single-particle constraint functions
+g(xi,yi,zi) = 0 and their derivative (i.e. the normal of the manifold)
+n = grad(g).
-This package allows LAMMPS to perform MD simulations of particles
-constrained on a manifold (i.e., a 2D subspace of the 3D simulation
-box). It achieves this using the RATTLE constraint algorithm applied
-to single-particle constraint functions g(xi,yi,zi) = 0 and their
-derivative (i.e. the normal of the manifold) n = grad(g).
+[Author:] Stefan Paquay (Eindhoven University of Technology (TU/e),
+The Netherlands).
-See this doc page to get started:
+[Install or un-install:]
+
+make yes-user-manifold
+make machine :pre
+
+make no-user-manifold
+make machine :pre
+
+[Supporting info:]
+src/USER-MANIFOLD: filenames -> commands
+src/USER-MANIFOLD/README
+"doc/manifolds"_manifolds.html
"fix manifoldforce"_fix_manifoldforce.html
-
-The person who created this package is Stefan Paquay, at the Eindhoven
-University of Technology (TU/e), The Netherlands (s.paquay at tue.nl).
-Contact him directly if you have questions.
+"fix nve/manifold/rattle"_fix_nve_manifold/rattle.html
+"fix nvt/manifold/rattle"_fix_nvt_manifold/rattle.html
+examples/USER/manifold
+http://lammps.sandia.gov/movies.html#manifold :ul
:line
-USER-MOLFILE package :link(USER-MOLFILE),h5
+USER-MOLFILE package :link(USER-MOLFILE),h4
+
+[Contents:]
+
+A "dump molfile"_dump_molfile.html command which uses molfile plugins
+that are bundled with the "VMD"_http://www.ks.uiuc.edu/Research/vmd
+molecular visualization and analysis program, to enable LAMMPS to dump
+snapshots in formats compatible with various molecular simulation
+tools.
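+
+For example (a sketch, assuming the VMD "dcd" plugin can be found in
+the current directory, given as the last argument), a DCD trajectory
+could be written with:
+
+dump 1 all molfile 100 traj.dcd dcd . :pre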
+
+To use this package you must have the desired VMD plugins available on
+your system.
+
+Note that this package only provides the interface code, not the
+plugins themselves, which will be accessed when requesting a specific
+plugin via the "dump molfile"_dump_molfile.html command. Plugins can
+be obtained from a VMD installation which has to match the platform
+that you are using to compile LAMMPS for. By adding plugins to VMD,
+support for new file formats can be added to LAMMPS (or VMD or other
+programs that use them) without having to recompile the application
+itself. More information about the VMD molfile plugins can be found
+at
+"http://www.ks.uiuc.edu/Research/vmd/plugins/molfile"_http://www.ks.uiuc.edu/Research/vmd/plugins/molfile.
+
+[Author:] Axel Kohlmeyer (Temple U).
+
+[Install or un-install:]
+
+Note that the lib/molfile/Makefile.lammps file has a setting for a
+dynamic loading library libdl.a that is typically present on
+all systems, which is required for LAMMPS to link with this package.
+If the setting is not valid for your system, you will need to edit the
+Makefile.lammps file. See lib/molfile/README and
+lib/molfile/Makefile.lammps for details.
+
+make yes-user-molfile
+make machine :pre
+
+make no-user-molfile
+make machine :pre
+
+[Supporting info:]
+
+src/USER-MOLFILE: filenames -> commands
+src/USER-MOLFILE/README
+lib/molfile/README
+"dump molfile"_dump_molfile.html :ul
-Supporting info:
+:line
-This package contains a dump molfile command which uses molfile
-plugins that are bundled with the
-"VMD"_http://www.ks.uiuc.edu/Research/vmd molecular visualization and
-analysis program, to enable LAMMPS to dump its information in formats
-compatible with various molecular simulation tools.
+USER-NETCDF package :link(USER-NETCDF),h4
-The package only provides the interface code, not the plugins. These
-can be obtained from a VMD installation which has to match the
-platform that you are using to compile LAMMPS for. By adding plugins
-to VMD, support for new file formats can be added to LAMMPS (or VMD or
-other programs that use them) without having to recompile the
-application itself.
+[Contents:]
-See this doc page to get started:
+Dump styles for writing NetCDF formatted dump files. NetCDF is a
+portable, binary, self-describing file format developed on top of
+HDF5. The file contents follow the AMBER NetCDF trajectory conventions
+(http://ambermd.org/netcdf/nctraj.xhtml), but include extensions.
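+
+For example (a sketch; the per-atom attributes are placeholders), a
+NetCDF trajectory could be written with:
+
+dump 1 all netcdf 100 traj.nc id type x y z vx vy vz :pre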
-"dump molfile"_dump_molfile.html
+To use this package you must have the NetCDF library available on your
+system.
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+Note that NetCDF files can be directly visualized with the following
+tools:
-:line
+"Ovito"_ovito (Ovito supports the AMBER convention and the extensions mentioned above)
+"VMD"_vmd
+"AtomEye"_atomeye (the libAtoms version of AtomEye contains a NetCDF reader not present in the standard distribution) :ul
+
+:link(ovito,http://www.ovito.org)
+:link(atomeye,http://www.libatoms.org)
-USER-NC-DUMP package :link(USER-NC-DUMP),h5
+[Author:] Lars Pastewka (Karlsruhe Institute of Technology).
-Contents: Dump styles for writing NetCDF format files. NetCDF is a binary,
-portable, self-describing file format on top of HDF5. The file format
-contents follow the AMBER NetCDF trajectory conventions
-(http://ambermd.org/netcdf/nctraj.xhtml), but include extensions to this
-convention. This package implements a "dump nc"_dump_nc.html command
-and a "dump nc/mpiio"_dump_nc.html command to output LAMMPS snapshots
-in this format. See src/USER-NC-DUMP/README for more details.
+[Install or un-install:]
+
+Note that to follow these steps, you need the standard NetCDF software
+package installed on your system. The lib/netcdf/Makefile.lammps file
+has settings for NetCDF include and library files that LAMMPS needs to
+compile and link with this package.  If the settings are not valid
+for your system, you will need to edit the Makefile.lammps file. See
+lib/netcdf/README for details.
-NetCDF files can be directly visualized with the following tools:
+make yes-user-netcdf
+make machine :pre
+
+make no-user-netcdf
+make machine :pre
-Ovito (http://www.ovito.org/). Ovito supports the AMBER convention
-and all of the above extensions. :ulb,l
-VMD (http://www.ks.uiuc.edu/Research/vmd/) :l
-AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye contains
-a NetCDF reader that is not present in the standard distribution of AtomEye :l,ule
+[Supporting info:]
-The person who created these files is Lars Pastewka at
-Karlsruhe Institute of Technology (lars.pastewka at kit.edu).
-Contact him directly if you have questions.
+src/USER-NETCDF: filenames -> commands
+src/USER-NETCDF/README
+lib/netcdf/README
+"dump netcdf"_dump_netcdf.html :ul
:line
-USER-OMP package :link(USER-OMP),h5
+USER-OMP package :link(USER-OMP),h4
-Supporting info:
+[Contents:]
-This package provides OpenMP multi-threading support and
-other optimizations of various LAMMPS pair styles, dihedral
-styles, and fix styles.
+Hundreds of pair, fix, compute, bond, angle, dihedral, improper, and
+kspace styles which are altered to enable threading on many-core CPUs
+via OpenMP directives. All of them have an "omp" in their style name.
+"Section 5.3.4"_accelerate_omp.html gives details of what hardware and
+compilers are required on your system, and how to build and use this
+package. Its styles can be invoked at run time via the "-sf omp" or
+"-suffix omp" "command-line switches"_Section_start.html#start_7.
+Also see the "KOKKOS"_#KOKKOS, "OPT"_#OPT, and
+"USER-INTEL"_#USER-INTEL packages, which have styles optimized for
+CPUs.
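+
+For example (a sketch; in.script is a placeholder input file), a run
+using 4 MPI tasks with 2 OpenMP threads each could be launched as:
+
+mpirun -np 4 lmp_machine -sf omp -pk omp 2 -in in.script :pre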
-See this section of the manual to get started:
+[Author:] Axel Kohlmeyer (Temple U).
-"Section 5.3"_Section_accelerate.html#acc_3
+NOTE: The compile flags "-restrict" and "-fopenmp" must be used to
+build LAMMPS with the USER-OMP package, as well as the link flag
+"-fopenmp". They should be added to the CCFLAGS and LINKFLAGS lines
+of your Makefile.machine. See src/MAKE/OPTIONS/Makefile.omp for an
+example.
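+
+For example (a sketch; keep your other flags, and note that -restrict
+is specific to Intel compilers), the relevant Makefile.machine lines
+might look like:
+
+CCFLAGS = -g -O3 -restrict -fopenmp
+LINKFLAGS = -g -O3 -fopenmp :pre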
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner:
-For the USER-OMP package, your Makefile.machine needs additional
-settings for CCFLAGS and LINKFLAGS.
+[Install or un-install:]
+
+make yes-user-omp
+make machine :pre
+
+make no-user-omp
+make machine :pre
CCFLAGS: add -fopenmp and -restrict
LINKFLAGS: add -fopenmp :ul
-Examples: examples/accelerate, bench/KEPLER
+[Supporting info:]
+
+src/USER-OMP: filenames -> commands
+src/USER-OMP/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.4"_accelerate_omp.html
+"Section 2.7 -sf omp"_Section_start.html#start_7
+"Section 2.7 -pk omp"_Section_start.html#start_7
+"package omp"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (o)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-USER-PHONON package :link(USER-PHONON),h5
+USER-PHONON package :link(USER-PHONON),h4
+
+[Contents:]
-This package contains a fix phonon command that calculates dynamical
+A "fix phonon"_fix_phonon.html command that calculates dynamical
matrices, which can then be used to compute phonon dispersion
relations, directly from molecular dynamics simulations.
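+
+For example (a sketch; the map file and output prefix are
+placeholders, and the numeric arguments are roughly the sampling
+interval, output interval, and equilibration wait), a usage could be:
+
+fix 1 all phonon 20 5000 200000 map.in phonon :pre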
-See this doc page to get started:
-
-"fix phonon"_fix_phonon.html
+[Author:] Ling-Ti Kong (Shanghai Jiao Tong University).
-The person who created this package is Ling-Ti Kong (konglt at
-sjtu.edu.cn) at Shanghai Jiao Tong University. Contact him directly
-if you have questions.
+[Install or un-install:]
+
+make yes-user-phonon
+make machine :pre
+
+make no-user-phonon
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/phonon
+src/USER-PHONON: filenames -> commands
+src/USER-PHONON/README
+"fix phonon"_fix_phonon.html
+examples/USER/phonon :ul
:line
-USER-QMMM package :link(USER-QMMM),h5
+USER-QMMM package :link(USER-QMMM),h4
-Supporting info:
+[Contents:]
-This package provides a fix qmmm command which allows LAMMPS to be
-used in a QM/MM simulation, currently only in combination with pw.x
-code from the "Quantum ESPRESSO"_espresso package.
+A "fix qmmm"_fix_qmmm.html command which allows LAMMPS to be used in a
+QM/MM simulation, currently only in combination with the "Quantum
+ESPRESSO"_espresso package.
:link(espresso,http://www.quantum-espresso.org)
+To use this package you must have Quantum ESPRESSO available on your
+system.
+
The current implementation only supports an ONIOM style mechanical
coupling to the Quantum ESPRESSO plane wave DFT package.
Electrostatic coupling is in preparation and the interface has been
written in a manner that coupling to other QM codes should be possible
without changes to LAMMPS itself.
-See this doc page to get started:
+[Author:] Axel Kohlmeyer (Temple U).
+
+[Install or un-install:]
-"fix qmmm"_fix_qmmm.html
+Before building LAMMPS with this package, you must first build the
+QMMM library in lib/qmmm. You can do this manually if you prefer;
+follow the first two steps explained in lib/qmmm/README.  You can
+also do it in one step from the lammps/src dir, using a command like
+these, which simply invoke the lib/qmmm/Install.py script with the
+specified args:
-as well as the lib/qmmm/README file.
+make lib-qmmm # print help message
+make lib-qmmm args="-m gfortran" # build with GNU Fortran compiler :pre
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+The build should produce two files: lib/qmmm/libqmmm.a and
+lib/qmmm/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+QMMM library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/qmmm/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-qmmm
+make machine :pre
+
+make no-user-qmmm
+make machine :pre
+
+NOTE: The LAMMPS executable these steps produce is not yet functional
+for a QM/MM simulation. You must also build Quantum ESPRESSO and
+create a new executable which links LAMMPS and Quantum ESPRESSO
+together. These are steps 3 and 4 described in the lib/qmmm/README
+file.
+
+[Supporting info:]
+
+src/USER-QMMM: filenames -> commands
+src/USER-QMMM/README
+lib/qmmm/README
+"fix phonon"_fix_phonon.html
+lib/qmmm/example-ec/README
+lib/qmmm/example-mc/README :ul
:line
-USER-QTB package :link(USER-QTB),h5
+USER-QTB package :link(USER-QTB),h4
-Supporting info:
+[Contents:]
-This package provides a self-consistent quantum treatment of the
+Two fixes which provide a self-consistent quantum treatment of
vibrational modes in a classical molecular dynamics simulation. By
coupling the MD simulation to a colored thermostat, it introduces zero
-point energy into the system, alter the energy power spectrum and the
-heat capacity towards their quantum nature. This package could be of
-interest if one wants to model systems at temperatures lower than
-their classical limits or when temperatures ramp up across the
-classical limits in the simulation.
+point energy into the system, altering the energy power spectrum and
+the heat capacity to account for their quantum nature. This is useful
+when modeling systems at temperatures lower than their classical
+limits or when temperatures ramp across the classical limits in a
+simulation.
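+
+For example (a sketch; the argument values are placeholders), the
+colored-noise thermostat is used together with a regular time
+integrator:
+
+fix 1 all nve
+fix 2 all qtb temp 110 damp 200 seed 35082 f_max 0.3 N_f 100 :pre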
-See these two doc pages to get started:
+[Author:] Yuan Shen (Stanford U).
+
+[Install or un-install:]
+
+make yes-user-qtb
+make machine :pre
+
+make no-user-qtb
+make machine :pre
+
+[Supporting info:]
-"fix qtb"_fix_qtb.html provides quantum nulcear correction through a
-colored thermostat and can be used with other time integration schemes
-like "fix nve"_fix_nve.html or "fix nph"_fix_nh.html.
+src/USER-QTB: filenames -> commands
+src/USER-QTB/README
+"fix qtb"_fix_qtb.html
+"fix qbmsst"_fix_qbmsst.html
+examples/USER/qtb :ul
-"fix qbmsst"_fix_qbmsst.html enables quantum nuclear correction of a
-multi-scale shock technique simulation by coupling the quantum thermal
-bath with the shocked system.
+:line
-The person who created this package is Yuan Shen (sy0302 at
-stanford.edu) at Stanford University. Contact him directly if you
-have questions.
+USER-QUIP package :link(USER-QUIP),h4
-Examples: examples/USER/qtb
+[Contents:]
-:line
+A "pair_style quip"_pair_quip.html command which wraps the "QUIP
+libAtoms library"_quip, which includes a variety of interatomic
+potentials, including Gaussian Approximation Potential (GAP) models
+developed by the Cambridge University group.
-USER-QUIP package :link(USER-QUIP),h5
+:link(quip,https://github.com/libAtoms/QUIP)
-Supporting info:
+To use this package you must have the QUIP libAtoms library available
+on your system.
-Examples: examples/USER/quip
+[Author:] Albert Bartok (Cambridge University)
-:line
+[Install or un-install:]
-USER-REAXC package :link(USER-REAXC),h5
+Note that to follow these steps to compile and link to the QUIP
+library, you must first download and build QUIP on your system.  It
+can be obtained from GitHub. See step 1 and step 1.1 in the
+lib/quip/README file for details on how to do this. Note that it
+requires setting two environment variables, QUIP_ROOT and QUIP_ARCH,
+which will be accessed by the lib/quip/Makefile.lammps file which is
+used when you compile and link LAMMPS with this package. You should
+only need to edit this file if the LAMMPS build can not use its
+settings to successfully build on your system.
-Supporting info:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-This package contains a implementation for LAMMPS of the ReaxFF force
-field. ReaxFF uses distance-dependent bond-order functions to
-represent the contributions of chemical bonding to the potential
-energy. It was originally developed by Adri van Duin and the Goddard
-group at CalTech.
+make yes-user-quip
+make machine :pre
+
+make no-user-quip
+make machine :pre
+
+[Supporting info:]
-The USER-REAXC version of ReaxFF (pair_style reax/c), implemented in
-C, should give identical or very similar results to pair_style reax,
-which is a ReaxFF implementation on top of a Fortran library, a
-version of which library was originally authored by Adri van Duin.
+src/USER-QUIP: filenames -> commands
+src/USER-QUIP/README
+"pair_style quip"_pair_quip.html
+examples/USER/quip :ul
-The reax/c version should be somewhat faster and more scalable,
-particularly with respect to the charge equilibration calculation. It
-should also be easier to build and use since there are no complicating
-issues with Fortran memory allocation or linking to a Fortran library.
+:line
-For technical details about this implementation of ReaxFF, see
-this paper:
+USER-REAXC package :link(USER-REAXC),h4
-Parallel and Scalable Reactive Molecular Dynamics: Numerical Methods
-and Algorithmic Techniques, H. M. Aktulga, J. C. Fogarty,
-S. A. Pandit, A. Y. Grama, Parallel Computing, in press (2011).
+[Contents:]
-See the doc page for the pair_style reax/c command for details
-of how to use it in LAMMPS.
+A pair style which implements the ReaxFF potential in C/C++ (in
+contrast to the "REAX package"_#REAX and its Fortran library). ReaxFF
+is a universal reactive force field.  See the src/USER-REAXC/README file
+for more info on differences between the two packages. Also two fixes
+for monitoring molecules as bonds are created and destroyed.
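+
+For example (a sketch; the force-field file and element list are
+placeholders patterned after examples/reax), a typical setup combines
+the pair style with charge equilibration:
+
+pair_style reax/c NULL
+pair_coeff * * ffield.reax C H O N
+fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c :pre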
-The person who created this package is Hasan Metin Aktulga (hmaktulga
-at lbl.gov), while at Purdue University. Contact him directly, or
-Aidan Thompson at Sandia (athomps at sandia.gov), if you have
-questions.
+[Author:] Hasan Metin Aktulga (MSU) while at Purdue University.
-Examples: examples/reax
+[Install or un-install:]
+
+make yes-user-reaxc
+make machine :pre
+
+make no-user-reaxc
+make machine :pre
+
+[Supporting info:]
+
+src/USER-REAXC: filenames -> commands
+src/USER-REAXC/README
+"pair_style reax/c"_pair_reaxc.html
+"fix reax/c/bonds"_fix_reax_bonds.html
+"fix reax/c/species"_fix_reaxc_species.html
+examples/reax :ul
:line
-USER-SMD package :link(USER-SMD),h5
+USER-SMD package :link(USER-SMD),h4
-Supporting info:
+[Contents:]
-This package implements smoothed Mach dynamics (SMD) in
-LAMMPS. Currently, the package has the following features:
+An atom style, fixes, computes, and several pair styles which
+implement smoothed Mach dynamics (SMD) for solids, which is a model
+related to smoothed particle hydrodynamics (SPH) for liquids (see the
+"USER-SPH package"_#USER-SPH).
-* Does liquids via traditional Smooth Particle Hydrodynamics (SPH)
+This package solves solid mechanics problems via a state-of-the-art
+stabilized meshless method with hourglass control. It can specify
+hydrostatic interactions independently from material strength models,
+i.e. pressure and deviatoric stresses are separated. It provides many
+material models (Johnson-Cook, plasticity with hardening,
+Mie-Grueneisen, Polynomial EOS) and allows new material models to be
+added. It implements rigid boundary conditions (walls) which can be
+specified as surface geometries from *.STL files.
-* Also solves solids mechanics problems via a state of the art
- stabilized meshless method with hourglass control.
+[Author:] Georg Ganzenmuller (Fraunhofer-Institute for High-Speed
+Dynamics, Ernst Mach Institute, Germany).
-* Can specify hydrostatic interactions independently from material
- strength models, i.e. pressure and deviatoric stresses are separated.
+[Install or un-install:]
-* Many material models available (Johnson-Cook, plasticity with
- hardening, Mie-Grueneisen, Polynomial EOS). Easy to add new
- material models.
+Before building LAMMPS with this package, you must first download the
+Eigen library. Eigen is a template library, so you do not need to
+build it, just download it. You can do this manually if you prefer;
+follow the instructions in lib/smd/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/smd/Install.py script with the specified args:
-* Rigid boundary conditions (walls) can be loaded as surface geometries
- from *.STL files.
+make lib-smd # print help message
+make lib-smd args="-g -l" # download in default lib/smd/eigen-eigen-*
+make lib-smd args="-h . eigen -g -l" # download in lib/smd/eigen
+make lib-smd args="-h ~ eigen -g -l" # download and build in ~/eigen :pre
-See the file doc/PDF/SMD_LAMMPS_userguide.pdf to get started.
+Note that the final -l switch is to create a symbolic (soft) link
+named "includelink" in lib/smd to point to the Eigen dir. When LAMMPS
+builds it will use this link. You should not need to edit the
+lib/smd/Makefile.lammps file.
-There are example scripts for using this package in examples/USER/smd.
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-The person who created this package is Georg Ganzenmuller at the
-Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
-Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
-you have questions.
+make yes-user-smd
+make machine :pre
+
+make no-user-smd
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/smd
+src/USER-SMD: filenames -> commands
+src/USER-SMD/README
+doc/PDF/SMD_LAMMPS_userguide.pdf
+examples/USER/smd
+http://lammps.sandia.gov/movies.html#smd :ul
:line
-USER-SMTBQ package :link(USER-SMTBQ),h5
+USER-SMTBQ package :link(USER-SMTBQ),h4
-Supporting info:
+[Contents:]
-This package implements the Second Moment Tight Binding - QEq (SMTB-Q)
-potential for the description of ionocovalent bonds in oxides.
+A pair style which implements a Second Moment Tight Binding model with
+QEq charge equilibration (SMTBQ) potential for the description of
+ionocovalent bonds in oxides.
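+
+For example (a sketch; the force-field file name and element order are
+placeholders patterned after examples/USER/smtbq):
+
+pair_style smtbq
+pair_coeff * * ffield.smtbq.Al2O3 O Al :pre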
-There are example scripts for using this package in
-examples/USER/smtbq.
+[Authors:] Nicolas Salles, Emile Maras, Olivier Politano, and Robert
+Tetot (LAAS-CNRS, France).
-See this doc page to get started:
+[Install or un-install:]
+
+make yes-user-smtbq
+make machine :pre
+
+make no-user-smtbq
+make machine :pre
+
+[Supporting info:]
+src/USER-SMTBQ: filenames -> commands
+src/USER-SMTBQ/README
"pair_style smtbq"_pair_smtbq.html
+examples/USER/smtbq :ul
-The persons who created the USER-SMTBQ package are Nicolas Salles,
-Emile Maras, Olivier Politano, Robert Tetot, who can be contacted at
-these email addresses: lammps@u-bourgogne.fr, nsalles@laas.fr. Contact
-them directly if you have any questions.
+:line
-Examples: examples/USER/smtbq
+USER-SPH package :link(USER-SPH),h4
-:line
+[Contents:]
-USER-SPH package :link(USER-SPH),h5
+An atom style, fixes, computes, and several pair styles which
+implement smoothed particle hydrodynamics (SPH) for liquids.  See the
+related "USER-SMD package"_#USER-SMD for smoothed Mach dynamics
+(SMD) for solids.
-Supporting info:
+This package provides ideal gas, Lennard-Jones, and Tait equations of
+state, as well as full support for complete (i.e. internal-energy
+dependent) equations of state.  It allows plain or Monaghan's XSPH integration
+of the equations of motion. It has options for density continuity or
+density summation to propagate the density field. It has
+"set"_set.html command options to set the internal energy and density
+of particles from the input script and allows the same quantities to
+be output with thermodynamic output or to dump files via the "compute
+property/atom"_compute_property_atom.html command.
-This package implements smoothed particle hydrodynamics (SPH) in
-LAMMPS. Currently, the package has the following features:
+[Author:] Georg Ganzenmuller (Fraunhofer-Institute for High-Speed
+Dynamics, Ernst Mach Institute, Germany).
-* Tait, ideal gas, Lennard-Jones equation of states, full support for
- complete (i.e. internal-energy dependent) equations of state
+[Install or un-install:]
+
+make yes-user-sph
+make machine :pre
+
+make no-user-sph
+make machine :pre
+
+[Supporting info:]
-* Plain or Monaghans XSPH integration of the equations of motion
+src/USER-SPH: filenames -> commands
+src/USER-SPH/README
+doc/PDF/SPH_LAMMPS_userguide.pdf
+examples/USER/sph
+http://lammps.sandia.gov/movies.html#sph :ul
-* Density continuity or density summation to propagate the density field
+:line
-* Commands to set internal energy and density of particles from the
- input script
+USER-TALLY package :link(USER-TALLY),h4
-* Output commands to access internal energy and density for dumping and
- thermo output
+[Contents:]
-See the file doc/PDF/SPH_LAMMPS_userguide.pdf to get started.
+Several compute styles that can be called when pairwise interactions
+are calculated to tally information (forces, heat flux, energy,
+stress, etc) about individual interactions.
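+
+For example (a sketch), tallying pairwise energy and stress
+contributions between a group and itself:
+
+compute 1 all pe/tally all
+compute 2 all stress/tally all :pre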
-There are example scripts for using this package in examples/USER/sph.
+[Author:] Axel Kohlmeyer (Temple U).
-The person who created this package is Georg Ganzenmuller at the
-Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
-Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
-you have questions.
+[Install or un-install:]
+
+make yes-user-tally
+make machine :pre
+
+make no-user-tally
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/sph
+src/USER-TALLY: filenames -> commands
+src/USER-TALLY/README
+"compute */tally"_compute_tally.html
+examples/USER/tally :ul
:line
-USER-TALLY package :link(USER-TALLY),h5
+USER-VTK package :link(USER-VTK),h4
-Supporting info:
+[Contents:]
-Examples: examples/USER/tally
+A "dump custom/vtk"_dump_custom_vtk.html command which outputs
+snapshot info in the "VTK format"_vtk, enabling visualization by
+"Paraview"_paraview or other visuzlization packages.
-:line
+:link(vtk,http://www.vtk.org)
+:link(paraview,http://www.paraview.org)
+
+To use this package you must have the VTK library available on your
+system.
+
+[Authors:] Richard Berger (JKU) and Daniel Queteschiner (DCS Computing).
-USER-VTK package :link(USER-VTK),h5
+[Install or un-install:]
+
+The lib/vtk/Makefile.lammps file has settings for accessing VTK files
+and its library, which are required for LAMMPS to build and link with
+this package. If the settings are not valid for your system, check if
+one of the other lib/vtk/Makefile.lammps.* files is compatible and
+copy it to Makefile.lammps. If none of the provided files work, you
+will need to edit the Makefile.lammps file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-vtk
+make machine :pre
+
+make no-user-vtk
+make machine :pre
+
+[Supporting info:]
+src/USER-VTK: filenames -> commands
+src/USER-VTK/README
+lib/vtk/README
+"dump custom/vtk"_dump_custom_vtk.html :ul
diff --git a/doc/src/Section_start.txt b/doc/src/Section_start.txt
index 5a5de9ac9..0a7209765 100644
--- a/doc/src/Section_start.txt
+++ b/doc/src/Section_start.txt
@@ -1,1905 +1,1771 @@
"Previous Section"_Section_intro.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_commands.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
2. Getting Started :h3
This section describes how to build and run LAMMPS, for both new and
experienced users.
2.1 "What's in the LAMMPS distribution"_#start_1
2.2 "Making LAMMPS"_#start_2
2.3 "Making LAMMPS with optional packages"_#start_3
-2.4 "Building LAMMPS via the Make.py script"_#start_4
-2.5 "Building LAMMPS as a library"_#start_5
-2.6 "Running LAMMPS"_#start_6
-2.7 "Command-line options"_#start_7
-2.8 "Screen output"_#start_8
-2.9 "Tips for users of previous versions"_#start_9 :all(b)
+2.5 "Building LAMMPS as a library"_#start_4
+2.6 "Running LAMMPS"_#start_5
+2.7 "Command-line options"_#start_6
+2.8 "Screen output"_#start_7
+2.9 "Tips for users of previous versions"_#start_8 :all(b)
:line
2.1 What's in the LAMMPS distribution :h4,link(start_1)
When you download a LAMMPS tarball you will need to unzip and untar
the downloaded file with the following commands, after placing the
tarball in an appropriate directory.
tar -xzvf lammps*.tar.gz :pre
This will create a LAMMPS directory containing two files and several
sub-directories:
README: text file
LICENSE: the GNU General Public License (GPL)
bench: benchmark problems
doc: documentation
examples: simple test problems
potentials: embedded atom method (EAM) potential files
src: source files
tools: pre- and post-processing tools :tb(s=:)
Note that the "download page"_download also has links to download
pre-built Windows installers, as well as pre-built packages for
several widely used Linux distributions. It also has instructions
for how to download/install LAMMPS for Macs (via Homebrew), and to
download and update LAMMPS from SVN and Git repositories, which gives
you access to the up-to-date sources that are used by the LAMMPS
core developers.
:link(download,http://lammps.sandia.gov/download.html)
The Windows and Linux packages for serial or parallel include
only selected packages and bug-fixes/upgrades listed on "this
page"_http://lammps.sandia.gov/bug.html up to a certain date, as
stated on the download page. If you want an executable with
non-included packages or that is more current, then you'll need to
build LAMMPS yourself, as discussed in the next section.
Skip to the "Running LAMMPS"_#start_6 sections for info on how to
launch a LAMMPS Windows executable on a Windows box.
:line
2.2 Making LAMMPS :h4,link(start_2)
This section has the following sub-sections:
2.2.1 "Read this first"_#start_2_1
2.2.1 "Steps to build a LAMMPS executable"_#start_2_2
2.2.3 "Common errors that can occur when making LAMMPS"_#start_2_3
2.2.4 "Additional build tips"_#start_2_4
2.2.5 "Building for a Mac"_#start_2_5
2.2.6 "Building for Windows"_#start_2_6 :all(b)
:line
Read this first :h5,link(start_2_1)
-If you want to avoid building LAMMPS yourself, read the preceding
+If you want to avoid building LAMMPS yourself, read the preceding
section about options available for downloading and installing
executables. Details are discussed on the "download"_download page.
Building LAMMPS can be simple or not-so-simple. If all you need are
the default packages installed in LAMMPS, and MPI is already installed
on your machine, or you just want to run LAMMPS in serial, then you
can typically use the Makefile.mpi or Makefile.serial files in
src/MAKE by typing one of these lines (from the src dir):
make mpi
make serial :pre
Note that on a facility supercomputer, there are often "modules"
loaded in your environment that provide the compilers and MPI you
should use. In this case, the "mpicxx" compile/link command in
-Makefile.mpi should just work by accessing those modules.
+Makefile.mpi should simply work by accessing those modules.
It may be the case that one of the other Makefile.machine files in the
src/MAKE sub-directories is a better match to your system (type "make"
to see a list), you can use it as-is by typing (for example):
make stampede :pre
If any of these builds (with an existing Makefile.machine) works on
your system, then you're done!
+If you need to install an optional package with a LAMMPS command you
+want to use, and the package does not depend on an extra library, you
+can simply type
+
+make name :pre
+
+before invoking (or re-invoking) the above steps. "Name" is the
+lower-case name of the package, e.g. replica or user-misc.
+
If you want to do one of the following:
-use optional LAMMPS features that require additional libraries
-use optional packages that require additional libraries
-use optional accelerator packages that require special compiler/linker settings
-run on a specialized platform that has its own compilers, settings, or other libs to use :ul
+use a LAMMPS command that requires an extra library (e.g. "dump image"_dump_image.html)
+build with a package that requires an extra library
+build with an accelerator package that requires special compiler/linker settings
+run on a machine that has its own compilers, settings, or libraries :ul
then building LAMMPS is more complicated. You may need to find where
-auxiliary libraries exist on your machine or install them if they
-don't. You may need to build additional libraries that are part of
-the LAMMPS package, before building LAMMPS. You may need to edit a
+extra libraries exist on your machine or install them if they don't.
+You may need to build extra libraries that are included in the LAMMPS
+distribution, before building LAMMPS itself. You may need to edit a
Makefile.machine file to make it compatible with your system.
-Note that there is a Make.py tool in the src directory that automates
-several of these steps, but you still have to know what you are doing.
-"Section 2.4"_#start_4 below describes the tool. It is a convenient
-way to work with installing/un-installing various packages, the
-Makefile.machine changes required by some packages, and the auxiliary
-libraries some of them use.
-
Please read the following sections carefully. If you are not
comfortable with makefiles, or building codes on a Unix platform, or
running an MPI job on your machine, please find a local expert to help
-you. Many compilation, linking, and run problems that users have are
-often not really LAMMPS issues - they are peculiar to the user's
-system, compilers, libraries, etc. Such questions are better answered
-by a local expert.
+you. Many compilation, linking, and run problems users experience are
+often not LAMMPS issues - they are peculiar to the user's system,
+compilers, libraries, etc. Such questions are better answered by a
+local expert.
If you have a build problem that you are convinced is a LAMMPS issue
(e.g. the compiler complains about a line of LAMMPS source code), then
please post the issue to the "LAMMPS mail
list"_http://lammps.sandia.gov/mail.html.
If you succeed in building LAMMPS on a new kind of machine, for which
there isn't a similar machine Makefile included in the
src/MAKE/MACHINES directory, then send it to the developers and we can
include it in the LAMMPS distribution.
:line
Steps to build a LAMMPS executable :h5,link(start_2_2)
Step 0 :h6
The src directory contains the C++ source and header files for LAMMPS.
It also contains a top-level Makefile and a MAKE sub-directory with
low-level Makefile.* files for many systems and machines. See the
src/MAKE/README file for a quick overview of what files are available
and what sub-directories they are in.
The src/MAKE dir has a few files that should work as-is on many
platforms. The src/MAKE/OPTIONS dir has more that invoke additional
compiler, MPI, and other setting options commonly used by LAMMPS, to
illustrate their syntax. The src/MAKE/MACHINES dir has many more that
have been tweaked or optimized for specific machines. These files are
all good starting points if you find you need to change them for your
machine. Put any file you edit into the src/MAKE/MINE directory and
it will never be touched by any LAMMPS updates.
>From within the src directory, type "make" or "gmake". You should see
a list of available choices from src/MAKE and all of its
sub-directories. If one of those has the options you want or is the
machine you want, you can type a command like:
make mpi :pre
or
make serial :pre
or
gmake mac :pre
Note that the corresponding Makefile.machine can exist in src/MAKE or
any of its sub-directories. If a file with the same name appears in
multiple places (not a good idea), the order they are used is as
follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES.
This gives preference to a file you have created/edited and put in
src/MAKE/MINE.
Note that on a multi-processor or multi-core platform you can launch a
parallel make, by using the "-j" switch with the make command, which
will build LAMMPS more quickly.
If you get no errors and an executable like [lmp_mpi] or [lmp_serial]
or [lmp_mac] is produced, then you're done; it's your lucky day.
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below.
Step 1 :h6
If Step 0 did not work, you will need to create a low-level Makefile
for your machine, like Makefile.foo. You should make a copy of an
existing Makefile.* in src/MAKE or one of its sub-directories as a
starting point. The only portions of the file you need to edit are
the first line, the "compiler/linker settings" section, and the
"LAMMPS-specific settings" section. When it works, put the edited
file in src/MAKE/MINE and it will not be altered by any future LAMMPS
updates.
Step 2 :h6
Change the first line of Makefile.foo to list the word "foo" after the
"#", and whatever other options it will set. This is the line you
will see if you just type "make".
Step 3 :h6
The "compiler/linker settings" section lists compiler and linker
settings for your C++ compiler, including optimization flags. You can
use g++, the open-source GNU compiler, which is available on all Unix
systems. You can also use mpicxx which will typically be available if
MPI is installed on your system, though you should check which actual
compiler it wraps. Vendor compilers often produce faster code. On
boxes with Intel CPUs, we suggest using the Intel icc compiler, which
can be downloaded from "Intel's compiler site"_intel.
:link(intel,http://www.intel.com/software/products/noncom)
If building a C++ code on your machine requires additional libraries,
then you should list them as part of the LIB variable. You should
not need to do this if you use mpicxx.
The DEPFLAGS setting is what triggers the C++ compiler to create a
dependency list for a source file. This speeds re-compilation when
source (*.cpp) or header (*.h) files are edited. Some compilers do
not support dependency file creation, or may use a different switch
than -D. GNU g++ and Intel icc work with -D. If your compiler can't
create dependency files, then you'll need to create a Makefile.foo
patterned after Makefile.storm, which uses different rules that do not
involve dependency files. Note that when you build LAMMPS for the
first time on a new platform, a long list of *.d files will be printed
out rapidly. This is not an error; it is the Makefile doing its
normal creation of dependencies.
Step 4 :h6
The "system-specific settings" section has several parts. Note that
if you change any -D setting in this section, you should do a full
re-compile, after typing "make clean" (which will describe different
clean options).
The LMP_INC variable is used to include options that turn on ifdefs
-within the LAMMPS code. The options that are currently recognized are:
+within the LAMMPS code (a sample LMP_INC line is sketched after this
+list). The options that are currently recognized are:
-DLAMMPS_GZIP
-DLAMMPS_JPEG
-DLAMMPS_PNG
-DLAMMPS_FFMPEG
-DLAMMPS_MEMALIGN
-DLAMMPS_XDR
-DLAMMPS_SMALLBIG
-DLAMMPS_BIGBIG
-DLAMMPS_SMALLSMALL
-DLAMMPS_LONGLONG_TO_LONG
-DLAMMPS_EXCEPTIONS
-DPACK_ARRAY
-DPACK_POINTER
-DPACK_MEMCPY :ul
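For example (a sketch, assuming gzipped I/O, JPEG output, and 64-byte
memory alignment are wanted), the LMP_INC line in Makefile.foo could
read:
LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_MEMALIGN=64 :pre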
The read_data and dump commands will read/write gzipped files if you
compile with -DLAMMPS_GZIP. It requires that your machine supports
the "popen()" function in the standard runtime library and that a gzip
executable can be found by LAMMPS during a run.
NOTE: on some clusters with high-speed networks, using the fork()
library calls (required by popen()) can interfere with the fast
communication library and lead to simulations using compressed output
or input to hang or crash. For selected operations, compressed file
I/O is also available using a compression library instead, which are
provided in the COMPRESS package. From more details about compiling
LAMMPS with packages, please see below.
If you use -DLAMMPS_JPEG, the "dump image"_dump_image.html command
will be able to write out JPEG image files. For JPEG files, you must
also link LAMMPS with a JPEG library, as described below. If you use
-DLAMMPS_PNG, the "dump image"_dump.html command will be able to write
out PNG image files. For PNG files, you must also link LAMMPS with a
PNG library, as described below. If neither of those two defines are
used, LAMMPS will only be able to write out uncompressed PPM image
files.
If you use -DLAMMPS_FFMPEG, the "dump movie"_dump_image.html command
will be available to support on-the-fly generation of rendered movies
without the need to store intermediate image files. It requires that your
machine supports the "popen" function in the standard runtime library
and that an FFmpeg executable can be found by LAMMPS during the run.
NOTE: Similar to the note above, this option can conflict with
high-speed networks, because it uses popen().
Using -DLAMMPS_MEMALIGN=<bytes> enables the use of the
posix_memalign() call instead of malloc() when large chunks of memory
are allocated by LAMMPS. This can help to make more efficient use of
vector instructions of modern CPUS, since dynamically allocated memory
has to be aligned on larger than default byte boundaries (e.g. 16
bytes instead of 8 bytes on x86 type platforms) for optimal
performance.
If you use -DLAMMPS_XDR, the build will include XDR compatibility
files for doing particle dumps in XTC format. This is only necessary
if your platform does not have its own XDR files available. See the
Restrictions section of the "dump"_dump.html command for details.
Use at most one of the -DLAMMPS_SMALLBIG, -DLAMMPS_BIGBIG,
-DLAMMPS_SMALLSMALL settings. The default is -DLAMMPS_SMALLBIG. These
settings refer to use of 4-byte (small) vs 8-byte (big) integers
within LAMMPS, as specified in src/lmptype.h. The only reason to use
the BIGBIG setting is to enable simulation of huge molecular systems
(which store bond topology info) with more than 2 billion atoms, or to
track the image flags of moving atoms that wrap around a periodic box
more than 512 times. Normally, the only reason to use SMALLSMALL is
if your machine does not support 64-bit integers, though you can use
SMALLSMALL setting if you are running in serial or on a desktop
machine or small cluster where you will never run large systems or for
long time (more than 2 billion atoms, more than 2 billion timesteps).
See the "Additional build tips"_#start_2_4 section below for more
details on these settings.
Note that the USER-ATC package is not currently compatible with
-DLAMMPS_BIGBIG. Also the GPU package requires the lib/gpu library to
be compiled with the same setting, or the link will fail.
The -DLAMMPS_LONGLONG_TO_LONG setting may be needed if your system or
MPI version does not recognize "long long" data types. In this case a
"long" data type is likely already 64-bits, in which case this setting
will convert to that data type.
The -DLAMMPS_EXCEPTIONS setting can be used to activate alternative
versions of error handling inside of LAMMPS. This is useful when
external codes drive LAMMPS as a library. Using this option, LAMMPS
errors do not kill the caller. Instead, the call stack is unwound and
control returns to the caller. The library interface provides the
lammps_has_error() and lammps_get_last_error_message() functions to
detect and find out more about a LAMMPS error.
Using one of the -DPACK_ARRAY, -DPACK_POINTER, and -DPACK_MEMCPY
options can make for faster parallel FFTs (in the PPPM solver) on some
platforms. The -DPACK_ARRAY setting is the default. See the
"kspace_style"_kspace_style.html command for info about PPPM. See
Step 6 below for info about building LAMMPS with an FFT library.
Step 5 :h6
The 3 MPI variables are used to specify an MPI library to build LAMMPS
with. Note that you do not need to set these if you use the MPI
compiler mpicxx for your CC and LINK setting in the section above.
The MPI wrapper knows where to find the needed files.
If you want LAMMPS to run in parallel, you must have an MPI library
installed on your platform. If MPI is installed on your system in the
usual place (under /usr/local), you also may not need to specify these
3 variables, assuming /usr/local is in your path. On some large
parallel machines which use "modules" for their compile/link
-environments, you may simply need to include the correct module in
+environments, you may simply need to include the correct module in
your build environment, before building LAMMPS. Or the parallel
machine may have a vendor-provided MPI which the compiler has no
trouble finding.
Failing this, these 3 variables can be used to specify where the mpi.h
file (MPI_INC) and the MPI library file (MPI_PATH) are found and the
name of the library file (MPI_LIB).
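For example (a sketch for an MPICH installed under /usr/local; paths
and library names vary by installation), the settings could look like:
MPI_INC = -I/usr/local/include
MPI_PATH = -L/usr/local/lib
MPI_LIB = -lmpich :pre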
If you are installing MPI yourself, we recommend Argonne's MPICH2
or OpenMPI. MPICH can be downloaded from the "Argonne MPI
site"_http://www.mcs.anl.gov/research/projects/mpich2/. OpenMPI can
be downloaded from the "OpenMPI site"_http://www.open-mpi.org.
Other MPI packages should also work. If you are running on a big
parallel platform, your system people or the vendor should have
already installed a version of MPI, which is likely to be faster
than a self-installed MPICH or OpenMPI, so find out how to build
and link with it. If you use MPICH or OpenMPI, you will have to
configure and build it for your platform. The MPI configure script
should have compiler options to enable you to use the same compiler
you are using for the LAMMPS build, which can avoid problems that can
arise when linking LAMMPS to the MPI library.
If you just want to run LAMMPS on a single processor, you can use the
dummy MPI library provided in src/STUBS, since you don't need a true
MPI library installed on your system. See src/MAKE/Makefile.serial
for how to specify the 3 MPI variables in this case. You will also
need to build the STUBS library for your platform before making LAMMPS
itself. Note that if you are building with src/MAKE/Makefile.serial,
e.g. by typing "make serial", then the STUBS library is built for you.
To build the STUBS library from the src directory, type "make
mpi-stubs", or from the src/STUBS dir, type "make". This should
create a libmpi_stubs.a file suitable for linking to LAMMPS. If the
build fails, you will need to edit the STUBS/Makefile for your
platform.
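For reference, a serial build against the STUBS library uses settings
roughly like these (see src/MAKE/Makefile.serial):
MPI_INC = -I../STUBS
MPI_PATH = -L../STUBS
MPI_LIB = -lmpi_stubs :pre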
The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime()
that calls gettimeofday() . If your system doesn't support
gettimeofday() , you'll need to insert code to call another timer.
Note that the ANSI-standard function clock() rolls over after an hour
or so, and is therefore insufficient for timing long LAMMPS
simulations.
Step 6 :h6
The 3 FFT variables allow you to specify an FFT library which LAMMPS
uses (for performing 1d FFTs) when running the particle-particle
particle-mesh (PPPM) option for long-range Coulombics via the
"kspace_style"_kspace_style.html command.
LAMMPS supports common open-source or vendor-supplied FFT libraries
for this purpose. If you leave these 3 variables blank, LAMMPS will
use the open-source "KISS FFT library"_http://kissfft.sf.net, which is
included in the LAMMPS distribution. This library is portable to all
platforms and for typical LAMMPS simulations is almost as fast as FFTW
or vendor optimized libraries. If you are not including the KSPACE
package in your build, you can also leave the 3 variables blank.
Otherwise, select which kinds of FFTs to use as part of the FFT_INC
setting by a switch of the form -DFFT_XXX. Recommended values for XXX
are: MKL or FFTW3. FFTW2 and NONE are supported as legacy options.
Selecting -DFFT_FFTW will use the FFTW3 library and -DFFT_NONE will
use the KISS library described above.
You may also need to set the FFT_INC, FFT_PATH, and FFT_LIB variables,
so the compiler and linker can find the needed FFT header and library
files. Note that on some large parallel machines which use "modules"
-for their compile/link environments, you may simply need to include
+for their compile/link environments, you may simply need to include
the correct module in your build environment. Or the parallel machine
may have a vendor-provided FFT library which the compiler has no
trouble finding.
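For example (a sketch for an FFTW3 installed under /usr/local; paths
vary), the settings could look like:
FFT_INC = -DFFT_FFTW3
FFT_PATH = -L/usr/local/lib
FFT_LIB = -lfftw3 :pre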
FFTW is a fast, portable library that should also work on any
platform. You can download it from
"www.fftw.org"_http://www.fftw.org. Both the legacy version 2.1.X and
the newer 3.X versions are supported as -DFFT_FFTW2 or -DFFT_FFTW3.
Building FFTW for your box should be as simple as ./configure; make.
Note that on some platforms FFTW2 has been pre-installed, and uses
renamed files indicating the precision it was compiled with,
e.g. sfftw.h, or dfftw.h instead of fftw.h. In this case, you can
specify an additional define variable for FFT_INC called -DFFTW_SIZE,
which will select the correct include file. In this case, for FFT_LIB
you must also manually specify the correct library, namely -lsfftw or
-ldfftw.
The FFT_INC variable also allows for a -DFFT_SINGLE setting that will
use single-precision FFTs with PPPM, which can speed-up long-range
-calculations, particularly in parallel or on GPUs. Fourier transform
+calculations, particularly in parallel or on GPUs.  Fourier transform
and related PPPM operations are somewhat insensitive to floating point
truncation errors and thus do not always need to be performed in
double precision. Using the -DFFT_SINGLE setting trades off a little
accuracy for reduced memory use and parallel communication costs for
-transposing 3d FFT data.
+transposing 3d FFT data. Note that single precision FFTs have only
+been tested with the FFTW3, FFTW2, MKL, and KISS FFT options.
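For example (a sketch, assuming a single-precision FFTW3 build is
installed as libfftw3f), the settings could look like:
FFT_INC = -DFFT_FFTW3 -DFFT_SINGLE
FFT_LIB = -lfftw3f :pre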
Step 7 :h6
The 3 JPG variables allow you to specify a JPEG and/or PNG library
which LAMMPS uses when writing out JPEG or PNG files via the "dump
image"_dump_image.html command. These can be left blank if you do not
use the -DLAMMPS_JPEG or -DLAMMPS_PNG switches discussed above in Step
4, since in that case JPEG/PNG output will be disabled.
A standard JPEG library usually goes by the name libjpeg.a or
libjpeg.so and has an associated header file jpeglib.h. Whichever
JPEG library you have on your platform, you'll need to set the
appropriate JPG_INC, JPG_PATH, and JPG_LIB variables, so that the
compiler and linker can find it.
A standard PNG library usually goes by the name libpng.a or libpng.so
and has an associated header file png.h. Whichever PNG library you
have on your platform, you'll need to set the appropriate JPG_INC,
JPG_PATH, and JPG_LIB variables, so that the compiler and linker can
find it.
As before, if these header and library files are in the usual place on
your machine, you may not need to set these variables.
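For example (a sketch, assuming both libraries are installed under
/usr/local; libpng typically also needs zlib), the settings could look
like:
JPG_INC = -I/usr/local/include
JPG_PATH = -L/usr/local/lib
JPG_LIB = -ljpeg -lpng -lz :pre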
Step 8 :h6
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below, before proceeding to Step 9.
Step 9 :h6
That's it. Once you have a correct Makefile.foo, and you have
pre-built any other needed libraries (e.g. MPI, FFT, etc) all you need
to do from the src directory is type something like this:
make foo
make -j N foo
gmake foo
gmake -j N foo :pre
The -j or -j N switches perform a parallel build which can be much
faster, depending on how many cores your compilation machine has. N
is the number of cores the build runs on.
You should get the executable lmp_foo when the build is complete.
:line
Errors that can occur when making LAMMPS :h5,link(start_2_3)
-NOTE: If an error occurs when building LAMMPS, the compiler or linker
-will state very explicitly what the problem is. The error message
-should give you a hint as to which of the steps above has failed, and
-what you need to do in order to fix it. Building a code with a
-Makefile is a very logical process. The compiler and linker need to
-find the appropriate files and those files need to be compatible with
-LAMMPS source files. When a make fails, there is usually a very
+If an error occurs when building LAMMPS, the compiler or linker will
+state very explicitly what the problem is. The error message should
+give you a hint as to which of the steps above has failed, and what
+you need to do in order to fix it. Building a code with a Makefile is
+a very logical process. The compiler and linker need to find the
+appropriate files and those files need to be compatible with LAMMPS
+settings and source files. When a make fails, there is usually a very
simple reason, which you or a local expert will need to fix.
Here are two non-obvious errors that can occur:
(1) If the make command breaks immediately with errors that indicate
it can't find files with a "*" in their names, this can be because
your machine's native make doesn't support wildcard expansion in a
makefile. Try gmake instead of make. If that doesn't work, try using
a -f switch with your make command to use a pre-generated
Makefile.list which explicitly lists all the needed files, e.g.
make makelist
make -f Makefile.list linux
gmake -f Makefile.list mac :pre
The first "make" command will create a current Makefile.list with all
the file names in your src dir. The 2nd "make" command (make or
gmake) will use it to build LAMMPS. Note that you should
include/exclude any desired optional packages before using the "make
makelist" command.
(2) If you get an error that says something like 'identifier "atoll"
is undefined', then your machine does not support "long long"
integers. Try using the -DLAMMPS_LONGLONG_TO_LONG setting described
above in Step 4.
:line
Additional build tips :h5,link(start_2_4)
Building LAMMPS for multiple platforms. :h6
You can make LAMMPS for multiple platforms from the same src
directory. Each target creates its own object sub-directory called
Obj_target where it stores the system-specific *.o files.
Cleaning up. :h6
Typing "make clean-all" or "make clean-machine" will delete *.o object
files created when LAMMPS is built, for either all builds or for a
particular machine.
-Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or -DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6
+Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or
+-DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6
As explained above, any of these 3 settings can be specified on the
LMP_INC line in your low-level src/MAKE/Makefile.foo.
The default is -DLAMMPS_SMALLBIG which allows for systems with up to
2^63 atoms and 2^63 timesteps (about 9e18). The atom limit is for
atomic systems which do not store bond topology info and thus do not
require atom IDs. If you use atom IDs for atomic systems (which is
the default) or if you use a molecular model, which stores bond
topology info and thus requires atom IDs, the limit is 2^31 atoms
(about 2 billion). This is because the IDs are stored in 32-bit
integers.
Likewise, with this setting, the 3 image flags for each atom (see the
"dump"_dump.html doc page for a discussion) are stored in a 32-bit
integer, which means the atoms can only wrap around a periodic box (in
each dimension) at most 512 times. If atoms move through the periodic
box more than this many times, the image flags will "roll over",
e.g. from 511 to -512, which can cause diagnostics like the
mean-squared displacement, as calculated by the "compute
msd"_compute_msd.html command, to be faulty.
To allow for larger atomic systems with atom IDs or larger molecular
systems or larger image flags, compile with -DLAMMPS_BIGBIG. This
stores atom IDs and image flags in 64-bit integers. This enables
atomic or molecular systems with atom IDS of up to 2^63 atoms (about
9e18). And image flags will not "roll over" until they reach 2^20 =
1048576.
If your system does not support 8-byte integers, you will need to
compile with the -DLAMMPS_SMALLSMALL setting. This will restrict the
total number of atoms (for atomic or molecular systems) and timesteps
to 2^31 (about 2 billion). Image flags will roll over at 2^9 = 512.
Note that in src/lmptype.h there are definitions of all these data
types as well as the MPI data types associated with them. The MPI
types need to be consistent with the associated C data types, or else
LAMMPS will generate a run-time error. As far as we know, the
settings defined in src/lmptype.h are portable and work on every
current system.
In all cases, the size of problem that can be run on a per-processor
basis is limited by 4-byte integer storage to 2^31 atoms per processor
(about 2 billion). This should not normally be a limitation since such
a problem would have a huge per-processor memory footprint due to
neighbor lists and would run very slowly in terms of CPU secs/timestep.
:line
Building for a Mac :h5,link(start_2_5)
OS X is a derivative of BSD Unix, so it should just work. See the
src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.
:line
Building for Windows :h5,link(start_2_6)
If you want to build a Windows version of LAMMPS, you can build it
yourself, but it may require some effort. LAMMPS expects a Unix-like
build environment for the default build procedure. This can be done
using either Cygwin or MinGW; the latter also exists as a ready-to-use
Linux-to-Windows cross-compiler in several Linux distributions. In
these cases, you can do the installation after installing several
unix-style commands like make, grep, sed and bash with some shell
utilities.
For Cygwin and the MinGW cross-compilers, suitable makefiles are
provided in src/MAKE/MACHINES. When using other compilers, like
Visual C++ or Intel compilers for Windows, you may have to implement
your own build system. Since none of the current LAMMPS core developers
has significant experience building executables on Windows, we are
happy to distribute contributed instructions and modifications, but
we cannot provide support for those.
With the so-called "Anniversary Update" to Windows 10, there is a
Ubuntu Linux subsystem available for Windows, that can be installed
and then used to compile/install LAMMPS as if you are running on a
Ubuntu Linux system instead of Windows.
As an alternative, you can download "daily builds" (and some older
versions) of the installer packages from
"rpm.lammps.org/windows.html"_http://rpm.lammps.org/windows.html.
These executables are built with most optional packages and the
download includes documentation, potential files, some tools and
many examples, but no source code.
:line
2.3 Making LAMMPS with optional packages :h4,link(start_3)
This section has the following sub-sections:
2.3.1 "Package basics"_#start_3_1
2.3.2 "Including/excluding packages"_#start_3_2
2.3.3 "Packages that require extra libraries"_#start_3_3
2.3.4 "Packages that require Makefile.machine settings"_#start_3_4 :all(b)
-Note that the following "Section 2.4"_#start_4 describes the Make.py
-tool which can be used to install/un-install packages and build the
-auxiliary libraries which some of them use. It can also auto-edit a
-Makefile.machine to add settings needed by some packages.
-
:line
Package basics: :h5,link(start_3_1)
The source code for LAMMPS is structured as a set of core files which
are always included, plus optional packages. Packages are groups of
files that enable a specific set of features. For example, force
fields for molecular systems or granular systems are in packages.
-"Section 4"_Section_packages.html in the manual has details
-about all the packages, including specific instructions for building
-LAMMPS with each package, which are covered in a more general manner
+"Section 4"_Section_packages.html in the manual has details about all
+the packages, which come in two flavors: [standard] and [user]
+packages. It also has specific instructions for building LAMMPS with
+any package which requires an extra library. General instructions are
below.
You can see the list of all packages by typing "make package" from
-within the src directory of the LAMMPS distribution. This also lists
-various make commands that can be used to manipulate packages.
+within the src directory of the LAMMPS distribution. It will also
+list various make commands that can be used to manage packages.
If you use a command in a LAMMPS input script that is part of a
package, you must have built LAMMPS with that package, else you will
get an error that the style is invalid or the command is unknown.
-Every command's doc page specifies if it is part of a package. You can
-also type
+Every command's doc page specifies if it is part of a package. You can
+type
lmp_machine -h :pre
to run your executable with the optional "-h command-line
-switch"_#start_7 for "help", which will simply list the styles and
-commands known to your executable, and immediately exit.
-
-There are two kinds of packages in LAMMPS, standard and user packages.
-More information about the contents of standard and user packages is
-given in "Section 4"_Section_packages.html of the manual. The
-difference between standard and user packages is as follows:
-
-Standard packages, such as molecule or kspace, are supported by the
-LAMMPS developers and are written in a syntax and style consistent
-with the rest of LAMMPS. This means we will answer questions about
-them, debug and fix them if necessary, and keep them compatible with
-future changes to LAMMPS.
-
-User packages, such as user-atc or user-omp, have been contributed by
-users, and always begin with the user prefix. If they are a single
-command (single file), they are typically in the user-misc package.
-Otherwise, they are a set of files grouped together which add a
-specific functionality to the code.
-
-User packages don't necessarily meet the requirements of the standard
-packages. If you have problems using a feature provided in a user
-package, you may need to contact the contributor directly to get help.
-Information on how to submit additions you make to LAMMPS as single
-files or either a standard or user-contributed package are given in
-"this section"_Section_modify.html#mod_15 of the documentation.
+switch"_#start_7 for "help", which will list the styles and commands
+known to your executable, and immediately exit.
:line
Including/excluding packages :h5,link(start_3_2)
-To use (or not use) a package you must include it (or exclude it)
-before building LAMMPS. From the src directory, this is typically as
-simple as:
+To use (or not use) a package you must install it (or un-install it)
+before building LAMMPS. From the src directory, this is as simple as:
make yes-colloid
make mpi :pre
or
-make no-manybody
+make no-user-omp
make mpi :pre
-NOTE: You should NOT include/exclude packages and build LAMMPS in a
+NOTE: You should NOT install/un-install packages and build LAMMPS in a
single make command using multiple targets, e.g. make yes-colloid mpi.
This is because the make procedure creates a list of source files that
will be out-of-date for the build if the package configuration changes
within the same command.
-Some packages have individual files that depend on other packages
-being included. LAMMPS checks for this and does the right thing.
-I.e. individual files are only included if their dependencies are
-already included. Likewise, if a package is excluded, other files
+Any package can be installed or not in a LAMMPS build, independent of
+all other packages. However, some packages include files derived from
+files in other packages. LAMMPS checks for this and does the right
+thing. I.e. individual files are only included if their dependencies
+are already included. Likewise, if a package is excluded, other files
dependent on that package are also excluded.
+NOTE: The one exception is that we do not recommend building with both
+the KOKKOS package installed and any of the other acceleration
+packages (GPU, OPT, USER-INTEL, USER-OMP) also installed. This is
+because of how Kokkos sometimes builds using a wrapper compiler which
+can make it difficult to invoke all the compile/link flags correctly
+for both Kokkos and non-Kokkos files.
+
If you will never run simulations that use the features in a
particular package, there is no reason to include it in your build.
-For some packages, this will keep you from having to build auxiliary
-libraries (see below), and will also produce a smaller executable
-which may run a bit faster.
-
-When you download a LAMMPS tarball, these packages are pre-installed
-in the src directory: KSPACE, MANYBODY,MOLECULE, because they are so
-commonly used. When you download LAMMPS source files from the SVN or
-Git repositories, no packages are pre-installed.
-
-Packages are included or excluded by typing "make yes-name" or "make
-no-name", where "name" is the name of the package in lower-case, e.g.
-name = kspace for the KSPACE package or name = user-atc for the
-USER-ATC package. You can also type "make yes-standard", "make
-no-standard", "make yes-std", "make no-std", "make yes-user", "make
-no-user", "make yes-lib", "make no-lib", "make yes-all", or "make
-no-all" to include/exclude various sets of packages. Type "make
-package" to see all of the package-related make options.
-
-NOTE: Inclusion/exclusion of a package works by simply moving files
-back and forth between the main src directory and sub-directories with
-the package name (e.g. src/KSPACE, src/USER-ATC), so that the files
-are seen or not seen when LAMMPS is built. After you have included or
-excluded a package, you must re-build LAMMPS.
-
-Additional package-related make options exist to help manage LAMMPS
-files that exist in both the src directory and in package
-sub-directories. You do not normally need to use these commands
-unless you are editing LAMMPS files or have downloaded a patch from
-the LAMMPS WWW site.
-
-Typing "make package-update" or "make pu" will overwrite src files
-with files from the package sub-directories if the package has been
-included. It should be used after a patch is installed, since patches
-only update the files in the package sub-directory, but not the src
-files. Typing "make package-overwrite" will overwrite files in the
-package sub-directories with src files.
+For some packages, this will keep you from having to build extra
+libraries, and will also produce a smaller executable which may run a
+bit faster.
+
+When you download a LAMMPS tarball, three packages are pre-installed
+in the src directory -- KSPACE, MANYBODY, MOLECULE -- because they are
+so commonly used. When you download LAMMPS source files from the SVN
+or Git repositories, no packages are pre-installed.
+
+Packages are installed or un-installed by typing
+
+make yes-name
+make no-name :pre
+
+where "name" is the name of the package in lower-case, e.g. name =
+kspace for the KSPACE package or name = user-atc for the USER-ATC
+package. You can also type any of these commands:
+
+make yes-all | install all packages
+make no-all | un-install all packages
+make yes-standard or make yes-std | install standard packages
+make no-standard or make no-std | un-install standard packages
+make yes-user | install user packages
+make no-user | un-install user packages
+make yes-lib | install packages that require extra libraries
+make no-lib | un-install packages that require extra libraries
+make yes-ext | install packages that require external libraries
+make no-ext | un-install packages that require external libraries :tb(s=|)
+
+which install/un-install various sets of packages. Typing "make
+package" will list all the these commands.
+
+NOTE: Installing or un-installing a package works by simply moving
+files back and forth between the main src directory and
+sub-directories with the package name (e.g. src/KSPACE, src/USER-ATC),
+so that the files are included or excluded when LAMMPS is built.
+After you have installed or un-installed a package, you must re-build
+LAMMPS for the action to take effect.
+
+The following make commands help manage files that exist in both the
+src directory and in package sub-directories. You do not normally
+need to use these commands unless you are editing LAMMPS files or have
+downloaded a patch from the LAMMPS web site.
Typing "make package-status" or "make ps" will show which packages are
-currently included. For those that are included, it will list any
+currently installed. For those that are installed, it will list any
files that are different in the src directory and package
-sub-directory. Typing "make package-diff" lists all differences
-between these files. Again, type "make package" to see all of the
-package-related make options.
+sub-directory.
-:line
+Typing "make package-update" or "make pu" will overwrite src files
+with files from the package sub-directories if the package is
+installed. It should be used after a patch has been applied, since
+patches only update the files in the package sub-directory, but not
+the src files.
-Packages that require extra libraries :h5,link(start_3_3)
+Typing "make package-overwrite" will overwrite files in the package
+sub-directories with src files.
-A few of the standard and user packages require additional auxiliary
-libraries. Many of them are provided with LAMMPS, in which case they
-must be compiled first, before LAMMPS is built, if you wish to include
-that package. If you get a LAMMPS build error about a missing
-library, this is likely the reason. See the
-"Section 4"_Section_packages.html doc page for a list of
-packages that have these kinds of auxiliary libraries.
-
-The lib directory in the distribution has sub-directories with package
-names that correspond to the needed auxiliary libs, e.g. lib/gpu.
-Each sub-directory has a README file that gives more details. Code
-for most of the auxiliary libraries is included in that directory.
-Examples are the USER-ATC and MEAM packages.
-
-A few of the lib sub-directories do not include code, but do include
-instructions (and sometimes scripts) that automate the process of
-downloading the auxiliary library and installing it so LAMMPS can link
-to it. Examples are the KIM, VORONOI, USER-MOLFILE, and USER-SMD
-packages.
-
-The lib/python directory (for the PYTHON package) contains only a
-choice of Makefile.lammps.* files. This is because no auxiliary code
-or libraries are needed, only the Python library and other system libs
-that should already available on your system. However, the
-Makefile.lammps file is needed to tell LAMMPS which libs to use and
-where to find them.
-
-For libraries with provided code, the sub-directory README file
-(e.g. lib/atc/README) has instructions on how to build that library.
-This information is also summarized in "Section
-4"_Section_packages.html. Typically this is done by typing
-something like:
+Typing "make package-diff" lists all differences between these files.
-make -f Makefile.g++ :pre
-
-If one of the provided Makefiles is not appropriate for your system
-you will need to edit or add one. Note that all the Makefiles have a
-setting for EXTRAMAKE at the top that specifies a Makefile.lammps.*
-file.
-
-If the library build is successful, it will produce 2 files in the lib
-directory:
-
-libpackage.a
-Makefile.lammps :pre
-
-The Makefile.lammps file will typically be a copy of one of the
-Makefile.lammps.* files in the library directory.
-
-Note that you must insure that the settings in Makefile.lammps are
-appropriate for your system. If they are not, the LAMMPS build may
-fail. To fix this, you can edit or create a new Makefile.lammps.*
-file for your system, and copy it to Makefile.lammps.
-
-As explained in the lib/package/README files, the settings in
-Makefile.lammps are used to specify additional system libraries and
-their locations so that LAMMPS can build with the auxiliary library.
-For example, if the MEAM package is used, the auxiliary library
-consists of F90 code, built with a Fortran complier. To link that
-library with LAMMPS (a C++ code) via whatever C++ compiler LAMMPS is
-built with, typically requires additional Fortran-to-C libraries be
-included in the link. Another example are the BLAS and LAPACK
-libraries needed to use the USER-ATC or USER-AWPMD packages.
-
-For libraries without provided code, the sub-directory README file has
-information on where to download the library and how to build it,
-e.g. lib/voronoi/README and lib/smd/README. The README files also
-describe how you must either (a) create soft links, via the "ln"
-command, in those directories to point to where you built or installed
-the packages, or (b) check or edit the Makefile.lammps file in the
-same directory to provide that information.
-
-Some of the sub-directories, e.g. lib/voronoi, also have an install.py
-script which can be used to automate the process of
-downloading/building/installing the auxiliary library, and setting the
-needed soft links. Type "python install.py" for further instructions.
-
-As with the sub-directories containing library code, if the soft links
-or settings in the lib/package/Makefile.lammps files are not correct,
-the LAMMPS build will typically fail.
+Again, just type "make package" to see all of the package-related make
+options.
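+
+As a typical usage sketch, after applying a patch you might re-sync
+the src files and re-build like this:
+
+make ps                 # check which packages are installed
+make package-update     # copy updated package files into src
+make mpi :pre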
:line
-Packages that require Makefile.machine settings :h5,link(start_3_4)
-
-A few packages require specific settings in Makefile.machine, to
-either build or use the package effectively. These are the
-USER-INTEL, KOKKOS, USER-OMP, and OPT packages, used for accelerating
-code performance on CPUs or other hardware, as discussed in "Section
-5.3"_Section_accelerate.html#acc_3.
+Packages that require extra libraries :h5,link(start_3_3)
-A summary of what Makefile.machine changes are needed for each of
-these packages is given in "Section 4"_Section_packages.html.
-The details are given on the doc pages that describe each of these
-accelerator packages in detail:
+A few of the standard and user packages require extra libraries. See
+"Section 4"_Section_packages.html for two tables of packages which
+indicate which ones require libraries. For each such package, the
+Section 4 doc page gives details on how to build the extra library,
+including how to download it if necessary. The basic ideas are
+summarized here.
+
+[System libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with a "sys"
+in the last column link to system libraries that typically already
+exist on your machine. E.g. the python package links to a system
+Python library. If your machine does not have the required library,
+you will have to download and install it on your machine, in either
+the system or user space.
+
+[Internal libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with an "int"
+in the last column link to internal libraries whose source code is
+included with LAMMPS, in the lib/name directory where name is the
+package name. You must first build the library in that directory
+before building LAMMPS with that package installed. E.g. the gpu
+package links to a library you build in the lib/gpu dir. You can
+often do the build in one step by typing "make lib-name args=..."
+from the src dir, with appropriate arguments. You can leave off the
+args to see a help message. See "Section 4"_Section_packages.html for
+details for each package.
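+
+For example, for the gpu package (the exact arguments depend on your
+compiler and GPU hardware, so check the help message and "Section
+4"_Section_packages.html):
+
+make lib-gpu            # with no args, prints a help message
+make lib-gpu args="..." # build lib/gpu with your chosen settings :pre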
+
+[External libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with an "ext"
+in the last column link to external libraries whose source code is not
+included with LAMMPS. You must first download and install the library
+before building LAMMPS with that package installed. E.g. the voronoi
+package links to the freely available "Voro++ library"_voronoi. You
+can often do the download/build in one step by typing "make lib-name
+args=..." from the src dir, with appropriate arguments. You can leave
+off the args to see a help message. See "Section
+4"_Section_packages.html for details for each package.
+
+:link(voronoi,http://math.lbl.gov/voro++)
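+
+For example, for the voronoi package (again, the valid arguments are
+listed in the help message and in "Section 4"_Section_packages.html):
+
+make lib-voronoi            # with no args, prints a help message
+make lib-voronoi args="..." # download and build Voro++ :pre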
+
+[Possible errors:]
+
+There are various common errors which can occur when building extra
+libraries or when building LAMMPS with packages that require the extra
+libraries.
+
+If you cannot build the extra library itself successfully, you may
+need to edit or create an appropriate Makefile for your machine, e.g.
+with appropriate compiler or system settings. Provided makefiles are
+typically in the lib/name directory. E.g. see the Makefile.* files in
+lib/gpu.
+
+The LAMMPS build often uses settings in a lib/name/Makefile.lammps
+file which either exists in the LAMMPS distribution or is created or
+copied from a lib/name/Makefile.lammps.* file when the library is
+built. If those settings are not correct for your machine you will
+need to edit or create an appropriate Makefile.lammps file.
+
+Package-specific details for these steps are given in "Section
+4"_Section_packages.html an in README files in the lib/name
+directories.
+
+[Compiler options needed for accelerator packages:]
+
+Several packages contain code that is optimized for specific hardware,
+e.g. CPU, KNL, or GPU. These are the OPT, GPU, KOKKOS, USER-INTEL,
+and USER-OMP packages. Compiling and linking the source files in
+these accelerator packages for optimal performance requires specific
+settings in the Makefile.machine file you use.
+
+A summary of the Makefile.machine settings needed for each of these
+packages is given in "Section 4"_Section_packages.html. More info is
+given on the doc pages that describe each package in detail:
5.3.1 "USER-INTEL package"_accelerate_intel.html
+5.3.2 "GPU package"_accelerate_intel.html
5.3.3 "KOKKOS package"_accelerate_kokkos.html
5.3.4 "USER-OMP package"_accelerate_omp.html
5.3.5 "OPT package"_accelerate_opt.html :all(b)
-You can also look at the following machine Makefiles in
-src/MAKE/OPTIONS, which include the changes. Note that the USER-INTEL
-and KOKKOS packages allow for settings that build LAMMPS for different
-hardware. The USER-INTEL package builds for CPU and the Xeon Phi, the
-KOKKOS package builds for OpenMP, GPUs (Cuda), and the Xeon Phi.
+You can also use or examine the following machine Makefiles in
+src/MAKE/OPTIONS, which include the settings. Note that the
+USER-INTEL and KOKKOS packages can use settings that build LAMMPS for
+different hardware. The USER-INTEL package can be compiled for Intel
+CPUs and KNLs; the KOKKOS package builds for CPUs (OpenMP), GPUs
+(Cuda), and Intel KNLs.
Makefile.intel_cpu
Makefile.intel_phi
Makefile.kokkos_omp
Makefile.kokkos_cuda
Makefile.kokkos_phi
Makefile.omp
Makefile.opt :ul
-Also note that the Make.py tool, described in the next "Section
-2.4"_#start_4 can automatically add the needed info to an existing
-machine Makefile, using simple command-line arguments.
-
-:line
-
-2.4 Building LAMMPS via the Make.py tool :h4,link(start_4)
-
-The src directory includes a Make.py script, written in Python, which
-can be used to automate various steps of the build process. It is
-particularly useful for working with the accelerator packages, as well
-as other packages which require auxiliary libraries to be built.
-
-The goal of the Make.py tool is to allow any complex multi-step LAMMPS
-build to be performed as a single Make.py command. And you can
-archive the commands, so they can be re-invoked later via the -r
-(redo) switch. If you find some LAMMPS build procedure that can't be
-done in a single Make.py command, let the developers know, and we'll
-see if we can augment the tool.
-
-You can run Make.py from the src directory by typing either:
-
-Make.py -h
-python Make.py -h :pre
-
-which will give you help info about the tool. For the former to work,
-you may need to edit the first line of Make.py to point to your local
-Python. And you may need to insure the script is executable:
-
-chmod +x Make.py :pre
-
-Here are examples of build tasks you can perform with Make.py:
-
-Install/uninstall packages: Make.py -p no-lib kokkos omp intel
-Build specific auxiliary libs: Make.py -a lib-atc lib-meam
-Build libs for all installed packages: Make.py -p cuda gpu -gpu mode=double arch=31 -a lib-all
-Create a Makefile from scratch with compiler and MPI settings: Make.py -m none -cc g++ -mpi mpich -a file
-Augment Makefile.serial with settings for installed packages: Make.py -p intel -intel cpu -m serial -a file
-Add JPG and FFTW support to Makefile.mpi: Make.py -m mpi -jpg -fft fftw -a file
-Build LAMMPS with a parallel make using Makefile.mpi: Make.py -j 16 -m mpi -a exe
-Build LAMMPS and libs it needs using Makefile.serial with accelerator settings: Make.py -p gpu intel -intel cpu -a lib-all file serial :tb(s=:)
-
-The bench and examples directories give Make.py commands that can be
-used to build LAMMPS with the various packages and options needed to
-run all the benchmark and example input scripts. See these files for
-more details:
-
-bench/README
-bench/FERMI/README
-bench/KEPLER/README
-bench/PHI/README
-examples/README
-examples/accelerate/README
-examples/accelerate/make.list :ul
-
-All of the Make.py options and syntax help can be accessed by using
-the "-h" switch.
-
-E.g. typing "Make.py -h" gives
-
-Syntax: Make.py switch args ...
- switches can be listed in any order
- help switch:
- -h prints help and syntax for all other specified switches
- switch for actions:
- -a lib-all, lib-dir, clean, file, exe or machine
- list one or more actions, in any order
- machine is a Makefile.machine suffix, must be last if used
- one-letter switches:
- -d (dir), -j (jmake), -m (makefile), -o (output),
- -p (packages), -r (redo), -s (settings), -v (verbose)
- switches for libs:
- -atc, -awpmd, -colvars, -cuda
- -gpu, -meam, -poems, -qmmm, -reax
- switches for build and makefile options:
- -intel, -kokkos, -cc, -mpi, -fft, -jpg, -png :pre
-
-Using the "-h" switch with other switches and actions gives additional
-info on all the other specified switches or actions. The "-h" can be
-anywhere in the command-line and the other switches do not need their
-arguments. E.g. type "Make.py -h -d -atc -intel" will print:
-
--d dir
- dir = LAMMPS home dir
- if -d not specified, working dir must be lammps/src :pre
-
--atc make=suffix lammps=suffix2
- all args are optional and can be in any order
- make = use Makefile.suffix (def = g++)
- lammps = use Makefile.lammps.suffix2 (def = EXTRAMAKE in makefile) :pre
-
--intel mode
- mode = cpu or phi (def = cpu)
- build Intel package for CPU or Xeon Phi :pre
-
-Note that Make.py never overwrites an existing Makefile.machine.
-Instead, it creates src/MAKE/MINE/Makefile.auto, which you can save or
-rename if desired. Likewise it creates an executable named
-src/lmp_auto, which you can rename using the -o switch if desired.
-
-The most recently executed Make.py command is saved in
-src/Make.py.last. You can use the "-r" switch (for redo) to re-invoke
-the last command, or you can save a sequence of one or more Make.py
-commands to a file and invoke the file of commands using "-r". You
-can also label the commands in the file and invoke one or more of them
-by name.
-
-A typical use of Make.py is to start with a valid Makefile.machine for
-your system, that works for a vanilla LAMMPS build, i.e. when optional
-packages are not installed. You can then use Make.py to add various
-settings (FFT, JPG, PNG) to the Makefile.machine as well as change its
-compiler and MPI options. You can also add additional packages to the
-build, as well as build the needed supporting libraries.
-
-You can also use Make.py to create a new Makefile.machine from
-scratch, using the "-m none" switch, if you also specify what compiler
-and MPI options to use, via the "-cc" and "-mpi" switches.
-
:line
-2.5 Building LAMMPS as a library :h4,link(start_5)
+2.4 Building LAMMPS as a library :h4,link(start_4)
LAMMPS can be built as either a static or shared library, which can
then be called from another application or a scripting language. See
"this section"_Section_howto.html#howto_10 for more info on coupling
LAMMPS to other codes. See "this section"_Section_python.html for
more info on wrapping and running LAMMPS from Python.
Static library :h5
To build LAMMPS as a static library (*.a file on Linux), type
make foo mode=lib :pre
where foo is the machine name. This kind of library is typically used
to statically link a driver application to LAMMPS, so that you can
insure all dependencies are satisfied at compile time. This will use
the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build
will create the file liblammps_foo.a which another application can
link to. It will also create a soft link liblammps.a, which will
point to the most recently built static library.
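A minimal sketch of linking a (hypothetical) C++ driver against this
static library might look like the following; a real link line
typically also needs any extra libraries listed in the Makefile.lammps
files of installed packages (e.g. FFT or JPEG libraries):
mpicxx -I/path/to/lammps/src -c driver.cpp
mpicxx driver.o -L/path/to/lammps/src -llammps -o driver :pre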
Shared library :h5
To build LAMMPS as a shared library (*.so file on Linux), which can be
dynamically loaded, e.g. from Python, type
make foo mode=shlib :pre
where foo is the machine name. This kind of library is required when
wrapping LAMMPS with Python; see "Section 11"_Section_python.html
for details. This will use the SHFLAGS and SHLIBFLAGS settings in
src/MAKE/Makefile.foo and perform the build in the directory
Obj_shared_foo. This is so that each file can be compiled with the
-fPIC flag which is required for inclusion in a shared library. The
build will create the file liblammps_foo.so which another application
-can link to dynamically. It will also create a soft link liblammps.so,
+can link to dynamically. It will also create a soft link liblammps.so,
which will point to the most recently built shared library. This is
the file the Python wrapper loads by default.
Note that for a shared library to be usable by a calling program, all
the auxiliary libraries it depends on must also exist as shared
libraries. This will be the case for libraries included with LAMMPS,
such as the dummy MPI library in src/STUBS or any package libraries in
lib/packages, since they are always built as shared libraries using
the -fPIC switch. However, if a library like MPI or FFTW does not
exist as a shared library, the shared library build will generate an
error. This means you will need to install a shared library version
of the auxiliary library. The build instructions for the library
should tell you how to do this.
Here is an example of such errors when the system FFTW or provided
lib/colvars library have not been built as shared libraries:
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation
R_X86_64_32 against '.rodata' can not be used when making a shared
object; recompile with -fPIC
/usr/local/lib/libfftw3.a: could not read symbols: Bad value :pre
/usr/bin/ld: ../../lib/colvars/libcolvars.a(colvarmodule.o):
relocation R_X86_64_32 against '__pthread_key_create' can not be used
when making a shared object; recompile with -fPIC
../../lib/colvars/libcolvars.a: error adding symbols: Bad value :pre
As an example, here is how to build and install the "MPICH
library"_mpich, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
:link(mpich,http://www-unix.mcs.anl.gov/mpi)
./configure --enable-shared
make
make install :pre
You may need to use "sudo make install" in place of the last line if
you do not have write privileges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
[Additional requirement for using a shared library:] :h5
The operating system finds shared libraries to load at run-time using
the environment variable LD_LIBRARY_PATH. So you may wish to copy the
file src/liblammps.so or src/liblammps_g++.so (for example) to a place
the system can find it by default, such as /usr/local/lib, or you may
wish to add the LAMMPS src directory to LD_LIBRARY_PATH, so that the
current version of the shared library is always available to programs
that use it.
For the csh or tcsh shells, you would add something like this to your
~/.cshrc file:
setenv LD_LIBRARY_PATH $\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre
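For the bash shell, the equivalent addition to your ~/.bashrc file
would be:
export LD_LIBRARY_PATH=$\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre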
Calling the LAMMPS library :h5
Either flavor of library (static or shared) allows one or more LAMMPS
objects to be instantiated from the calling program.
When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS
namespace; you can safely use any of its classes and methods from
within the calling code, as needed.
When used from a C or Fortran program or a scripting language like
Python, the library has a simple function-style interface, provided in
src/library.cpp and src/library.h.
See the sample codes in examples/COUPLE/simple for examples of C++,
C, and Fortran codes that invoke LAMMPS through its library interface.
There are other examples as well in the COUPLE directory which are
discussed in "Section 6.10"_Section_howto.html#howto_10 of the
manual. See "Section 11"_Section_python.html of the manual for a
description of the Python wrapper provided with LAMMPS that operates
through the LAMMPS library interface.
The files src/library.cpp and library.h define the C-style API for
using LAMMPS as a library. See "Section
6.19"_Section_howto.html#howto_19 of the manual for a description of the
interface and how to extend it for your needs.
:line
-2.6 Running LAMMPS :h4,link(start_6)
+2.5 Running LAMMPS :h4,link(start_5)
By default, LAMMPS runs by reading commands from standard input. Thus
if you run the LAMMPS executable by itself, e.g.
lmp_linux :pre
it will simply wait, expecting commands from the keyboard. Typically
you should put commands in an input script and use I/O redirection,
e.g.
lmp_linux < in.file :pre
For parallel environments this should also work. If it does not, use
the '-in' command-line switch, e.g.
lmp_linux -in in.file :pre
"This section"_Section_commands.html describes how input scripts are
structured and what commands they contain.
You can test LAMMPS on any of the sample inputs provided in the
examples or bench directory. Input scripts are named in.* and sample
outputs are named log.*.name.P where name is a machine and P is the
number of processors it was run on.
Here is how you might run a standard Lennard-Jones benchmark on a
Linux box, using mpirun to launch a parallel job:
cd src
make linux
cp lmp_linux ../bench
cd ../bench
mpirun -np 4 lmp_linux -in in.lj :pre
See "this page"_bench for timings for this and the other benchmarks on
various platforms. Note that some of the example scripts require
LAMMPS to be built with one or more of its optional packages.
:link(bench,http://lammps.sandia.gov/bench.html)
:line
On a Windows box, you can skip making LAMMPS and simply download an
installer package from "here"_http://rpm.lammps.org/windows.html
For running the non-MPI executable, follow these steps:
Get a command prompt by going to Start->Run... ,
then typing "cmd". :ulb,l
Move to the directory where you have your input, e.g. a copy of
the [in.lj] input from the bench folder. (e.g. by typing: cd "Documents"). :l
At the command prompt, type "lmp_serial -in in.lj", replacing [in.lj]
with the name of your LAMMPS input script. :l
:ule
For the MPI version, which allows you to run LAMMPS under Windows on
multiple processors, follow these steps:
Download and install
"MPICH2"_http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
for Windows. :ulb,l
The LAMMPS Windows installer packages will automatically adjust your
path for the default location of this MPI package. After the installation
of the MPICH software, it needs to be integrated into the system.
For this you need to start a Command Prompt in {Administrator Mode}
(right click on the icon and select it). Change into the MPICH2
installation directory, then into the subdirectory [bin] and execute
[smpd.exe -install]. Exit the command window.
Get a new, regular command prompt by going to Start->Run... ,
then typing "cmd". :l
Move to the directory where you have your input file
(e.g. by typing: cd "Documents"). :l
Then type something like this:
mpiexec -localonly 4 lmp_mpi -in in.lj :pre
or
mpiexec -np 4 lmp_mpi -in in.lj :pre
replacing in.lj with the name of your LAMMPS input script. For the latter
case, you may be prompted to enter your password. :l
In this mode, output may not immediately show up on the screen, so if
your input script takes a long time to execute, you may need to be
patient before the output shows up. :l
The parallel executable can also run on a single processor by typing
something like:
lmp_mpi -in in.lj :pre
:ule
:line
The screen output from LAMMPS is described in a section below. As it
runs, LAMMPS also writes a log.lammps file with the same information.
Note that this sequence of commands copies the LAMMPS executable
(lmp_linux) to the directory with the input files. This may not be
necessary, but some versions of MPI reset the working directory to
where the executable is, rather than leave it as the directory where
you launch mpirun from (if you launch lmp_linux on its own and not
under mpirun). If that happens, LAMMPS will look for additional input
files and write its output files to the executable directory, rather
than your working directory, which is probably not what you want.
If LAMMPS encounters errors in the input script or while running a
simulation it will print an ERROR message and stop or a WARNING
message and continue. See "Section 12"_Section_errors.html for a
discussion of the various kinds of errors LAMMPS can or can't detect,
a list of all ERROR and WARNING messages, and what to do about them.
LAMMPS can run a problem on any number of processors, including a
single processor. In theory you should get identical answers on any
number of processors and on any machine. In practice, numerical
round-off can cause slight differences and eventual divergence of
molecular dynamics phase space trajectories.
LAMMPS can run as large a problem as will fit in the physical memory
of one or more processors. If you run out of memory, you must run on
more processors or setup a smaller problem.
:line
-2.7 Command-line options :h4,link(start_7)
+2.6 Command-line options :h4,link(start_6)
At run time, LAMMPS recognizes several optional command-line switches
which may be used in any order. Either the full word or a one-or-two
letter abbreviation can be used:
-e or -echo
-h or -help
-i or -in
-k or -kokkos
-l or -log
-nc or -nocite
-pk or -package
-p or -partition
-pl or -plog
-ps or -pscreen
-r or -restart
-ro or -reorder
-sc or -screen
-sf or -suffix
-v or -var :ul
For example, lmp_ibm might be launched as follows:
mpirun -np 16 lmp_ibm -v f tmp.out -l my.log -sc none -in in.alloy
mpirun -np 16 lmp_ibm -var f tmp.out -log my.log -screen none -in in.alloy :pre
Here are the details on the options:
-echo style :pre
Set the style of command echoing. The style can be {none} or {screen}
or {log} or {both}. Depending on the style, each command read from
the input script will be echoed to the screen and/or logfile. This
can be useful to figure out which line of your script is causing an
input error. The default value is {log}. The echo style can also be
set by using the "echo"_echo.html command in the input script itself.
-help :pre
Print a brief help summary and a list of options compiled into this
executable for each LAMMPS style (atom_style, fix, compute,
pair_style, bond_style, etc). This can tell you if the command you
want to use was included via the appropriate package at compile time.
LAMMPS will print the info and immediately exit if this switch is
used.
-in file :pre
Specify a file to use as an input script. This is an optional switch
when running LAMMPS in one-partition mode. If it is not specified,
LAMMPS reads its script from standard input, typically from a script
via I/O redirection; e.g. lmp_linux < in.run. I/O redirection should
also work in parallel, but if it does not (in the unlikely case that
an MPI implementation does not support it), then use the -in flag.
Note that this is a required switch when running LAMMPS in
multi-partition mode, since multiple processors cannot all read from
stdin.
-kokkos on/off keyword/value ... :pre
Explicitly enable or disable KOKKOS support, as provided by the KOKKOS
package. Even if LAMMPS is built with this package, as described
above in "Section 2.3"_#start_3, this switch must be set to enable
running with the KOKKOS-enabled styles the package provides. If the
switch is not set (the default), LAMMPS will operate as if the KOKKOS
package were not installed; i.e. you can run standard LAMMPS or with
the GPU or USER-OMP packages, for testing or benchmarking purposes.
Additional optional keyword/value pairs can be specified which
determine how Kokkos will use the underlying hardware on your
platform. These settings apply to each MPI task you launch via the
"mpirun" or "mpiexec" command. You may choose to run one or more MPI
tasks per physical node. Note that if you are running on a desktop
machine, you typically have one physical node. On a cluster or
supercomputer there may be dozens or 1000s of physical nodes.
Either the full word or an abbreviation can be used for the keywords.
Note that the keywords do not use a leading minus sign. I.e. the
keyword is "t", not "-t". Also note that each of the keywords has a
default setting. Examples of when to use these options and what
settings to use on different platforms are given in "Section
5.3"_Section_accelerate.html#acc_3.
d or device
g or gpus
t or threads
n or numa :ul
device Nd :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and if you are running with only one
MPI task per node. The Nd setting is the ID of the GPU on the node to
run on. By default Nd = 0. If you have multiple GPUs per node, they
have consecutive IDs numbered as 0,1,2,etc. This setting allows you
to launch multiple independent jobs on the node, each with a single
MPI task per node, and assign each job to run on a different GPU.
gpus Ng Ns :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and you are running with multiple MPI
tasks per node (up to one per GPU). The Ng setting is how many GPUs
you will use. The Ns setting is optional. If set, it is the ID of a
GPU to skip when assigning MPI tasks to GPUs. This may be useful if
your desktop system reserves one GPU to drive the screen and the rest
are intended for computational work like running LAMMPS. By default
Ng = 1 and Ns is not set.
Depending on which flavor of MPI you are running, LAMMPS will look for
one of these 3 environment variables
SLURM_LOCALID (various MPI variants compiled with SLURM support)
MV2_COMM_WORLD_LOCAL_RANK (Mvapich)
OMPI_COMM_WORLD_LOCAL_RANK (OpenMPI) :pre
which are initialized by the "srun", "mpirun" or "mpiexec" commands.
The environment variable setting for each MPI rank is used to assign a
unique GPU ID to the MPI task.
threads Nt :pre
This option assigns Nt number of threads to each MPI task for
performing work when Kokkos is executing in OpenMP or pthreads mode.
The default is Nt = 1, which essentially runs in MPI-only mode. If
there are Np MPI tasks per physical node, you generally want Np*Nt =
the number of physical cores per node, to use your available hardware
optimally. This also sets the number of threads used by the host when
LAMMPS is compiled with CUDA=yes.
numa Nm :pre
This option is only relevant when using pthreads with hwloc support.
-In this case Nm defines the number of NUMA regions (typically sockets)
-on a node which will be utilized by a single MPI rank. By default Nm
+In this case Nm defines the number of NUMA regions (typically sockets)
+on a node which will be utilized by a single MPI rank. By default Nm
= 1. If this option is used the total number of worker-threads per
MPI rank is threads*numa. Currently it is almost always better to
assign at least one MPI rank per NUMA region, and leave numa set to
its default value of 1. This is because letting a single process span
multiple NUMA regions induces a significant amount of cross NUMA data
traffic which is slow.
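For example, the following (hypothetical) command lines launch 2 MPI
tasks with 8 OpenMP threads each, or 2 MPI tasks driving 2 GPUs,
using executables built from the src/MAKE/OPTIONS makefiles listed
earlier (the -sf switch is described below):
mpirun -np 2 lmp_kokkos_omp -k on t 8 -sf kk -in in.lj
mpirun -np 2 lmp_kokkos_cuda -k on g 2 -sf kk -in in.lj :pre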
-log file :pre
Specify a log file for LAMMPS to write status information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
file log.lammps. If this switch is used, LAMMPS writes to the
specified file. In multi-partition mode, if the switch is not used, a
log.lammps file is created with hi-level status information. Each
partition also writes to a log.lammps.N file where N is the partition
ID. If the switch is specified in multi-partition mode, the hi-level
logfile is named "file" and each partition also logs information to a
file.N. For both one-partition and multi-partition mode, if the
specified file is "none", then no log files are created. Using a
"log"_log.html command in the input script will override this setting.
Option -plog will override the name of the partition log files file.N.
-nocite :pre
Disable writing the log.cite file which is normally written to list
references for specific cite-able features used during a LAMMPS run.
See the "citation page"_http://lammps.sandia.gov/cite.html for more
details.
-package style args .... :pre
Invoke the "package"_package.html command with style and args. The
syntax is the same as if the command appeared at the top of the input
script. For example "-package gpu 2" or "-pk gpu 2" is the same as
"package gpu 2"_package.html in the input script. The possible styles
and args are documented on the "package"_package.html doc page. This
switch can be used multiple times, e.g. to set options for the
USER-INTEL and USER-OMP packages which can be used together.
Along with the "-suffix" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
-partition 8x2 4 5 ... :pre
Invoke LAMMPS in multi-partition mode. When LAMMPS is run on P
processors and this switch is not used, LAMMPS runs in one partition,
i.e. all P processors run a single simulation. If this switch is
used, the P processors are split into separate partitions and each
partition runs its own simulation. The arguments to the switch
specify the number of processors in each partition. Arguments of the
form MxN mean M partitions, each with N processors. Arguments of the
form N mean a single partition with N processors. The sum of
processors in all partitions must equal P. Thus the command
"-partition 8x2 4 5" has 10 partitions and runs on a total of 25
processors.
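For example, that partitioning could be launched (hypothetically) as:
mpirun -np 25 lmp_mpi -partition 8x2 4 5 -in in.file :pre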
Running with multiple partitions can be useful for running
"multi-replica simulations"_Section_howto.html#howto_5, where each
replica runs on one or a few processors. Note that with MPI
installed on a machine (e.g. your desktop), you can run on more
(virtual) processors than you have physical processors.
-To run multiple independent simulations from one input script, using
+To run multiple independent simulations from one input script, using
multiple partitions, see "Section 6.4"_Section_howto.html#howto_4
of the manual. World- and universe-style "variables"_variable.html
are useful in this context.
-plog file :pre
Specify the base name for the partition log files, so partition N
writes log information to file.N. If file is none, then no partition
log files are created. This overrides the filename specified in the
-log command-line option. This option is useful when working with
large numbers of partitions, allowing the partition log files to be
suppressed (-plog none) or placed in a sub-directory (-plog
replica_files/log.lammps). If this option is not used, the log file for
partition N is log.lammps.N or whatever is specified by the -log
command-line option.
-pscreen file :pre
Specify the base name for the partition screen file, so partition N
writes screen information to file.N. If file is none, then no
partition screen files are created. This overrides the filename
specified in the -screen command-line option. This option is useful
when working with large numbers of partitions, allowing the partition
screen files to be suppressed (-pscreen none) or placed in a
sub-directory (-pscreen replica_files/screen). If this option is not
used the screen file for partition N is screen.N or whatever is
specified by the -screen command-line option.
-restart restartfile {remap} datafile keyword value ... :pre
Convert the restart file into a data file and immediately exit. This
is the same operation as if the following 2-line input script were
run:
read_restart restartfile {remap}
write_data datafile keyword value ... :pre
Note that the specified restartfile and datafile can have wild-card
characters ("*" or "%") as described by the
"read_restart"_read_restart.html and "write_data"_write_data.html
commands. But a filename such as file.* will need to be enclosed in
quotes to avoid shell expansion of the "*" character.
Note that following restartfile, the optional flag {remap} can be
used. This has the same effect as adding it to the
"read_restart"_read_restart.html command, as explained on its doc
page. This is only useful if the reading of the restart file triggers
an error that atoms have been lost. In that case, use of the remap
flag should allow the data file to still be produced.
Also note that following datafile, the same optional keyword/value
pairs can be listed as used by the "write_data"_write_data.html
command.
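For example, a (hypothetical) conversion of a restart file named
restart.equil into a data file named data.equil would be:
lmp_serial -restart restart.equil data.equil :pre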
-reorder nth N
-reorder custom filename :pre
Reorder the processors in the MPI communicator used to instantiate
LAMMPS, in one of several ways. The original MPI communicator ranks
all P processors from 0 to P-1. The mapping of these ranks to
physical processors is done by MPI before LAMMPS begins. It may be
useful in some cases to alter the rank order. E.g. to insure that
cores within each node are ranked in a desired order. Or when using
the "run_style verlet/split"_run_style.html command with 2 partitions
to insure that a specific Kspace processor (in the 2nd partition) is
matched up with a specific set of processors in the 1st partition.
See the "Section 5"_Section_accelerate.html doc pages for
more details.
If the keyword {nth} is used with a setting {N}, then it means every
Nth processor will be moved to the end of the ranking. This is useful
when using the "run_style verlet/split"_run_style.html command with 2
partitions via the -partition command-line switch. The first set of
processors will be in the first partition, the 2nd set in the 2nd
partition. The -reorder command-line switch can alter this so that
the 1st N procs in the 1st partition and one proc in the 2nd partition
will be ordered consecutively, e.g. as the cores on one physical node.
This can boost performance. For example, if you use "-reorder nth 4"
and "-partition 9 3" and you are running on 12 processors, the
processors will be reordered from
0 1 2 3 4 5 6 7 8 9 10 11 :pre
to
0 1 2 4 5 6 8 9 10 3 7 11 :pre
so that the processors in each partition will be
0 1 2 4 5 6 8 9 10
3 7 11 :pre
See the "processors" command for how to insure processors from each
partition could then be grouped optimally for quad-core nodes.
If the keyword is {custom}, then a file that specifies a permutation
of the processor ranks is also specified. The format of the reorder
file is as follows. Any number of initial blank or comment lines
(starting with a "#" character) can be present. These should be
followed by P lines of the form:
I J :pre
where P is the number of processors LAMMPS was launched with. Note
that if running in multi-partition mode (see the -partition switch
above) P is the total number of processors in all partitions. The I
and J values describe a permutation of the P processors. Every I and
J should be values from 0 to P-1 inclusive. In the set of P I values,
every proc ID should appear exactly once. Ditto for the set of P J
values. A single I,J pairing means that the physical processor with
rank I in the original MPI communicator will have rank J in the
reordered communicator.
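For example, a (hypothetical) reorder file for P = 4 processors that
swaps the ranks of processors 1 and 2 would be:
# swap MPI ranks 1 and 2
0 0
1 2
2 1
3 3 :pre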
Note that rank ordering can also be specified by many MPI
implementations, either by environment variables that specify how to
order physical processors, or by config files that specify what
physical processors to assign to each MPI rank. The -reorder switch
simply gives you a portable way to do this without relying on MPI
itself. See the "processors out"_processors.html command for how
to output info on the final assignment of physical processors to
the LAMMPS simulation domain.
-screen file :pre
Specify a file for LAMMPS to write its screen information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
screen. If this switch is used, LAMMPS writes to the specified file
instead and you will see no screen output. In multi-partition mode,
if the switch is not used, hi-level status information is written to
the screen. Each partition also writes to a screen.N file where N is
the partition ID. If the switch is specified in multi-partition mode,
the hi-level screen dump is named "file" and each partition also
writes screen information to a file.N. For both one-partition and
multi-partition mode, if the specified file is "none", then no screen
output is performed. Option -pscreen will override the name of the
partition screen files file.N.
-suffix style args :pre
Use variants of various styles if they exist. The specified style can
be {cuda}, {gpu}, {intel}, {kk}, {omp}, {opt}, or {hybrid}. These
refer to optional packages that LAMMPS can be built with, as described
above in "Section 2.3"_#start_3. The "gpu" style corresponds to the
GPU package, the "intel" style to the USER-INTEL package, the "kk"
style to the KOKKOS package, the "opt" style to the OPT package, and
the "omp" style to the USER-OMP package. The hybrid style is the only
style that accepts arguments. It allows for two packages to be
specified. The first package specified is the default and will be used
if it is available. If no style is available for the first package,
the style for the second package will be used if available. For
example, "-suffix hybrid intel omp" will use styles from the
USER-INTEL package if they are installed and available, but styles for
the USER-OMP package otherwise.
Along with the "-package" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
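For instance, the following (hypothetical) command line runs with
USER-OMP styles and 4 OpenMP threads per MPI task:
mpirun -np 4 lmp_mpi -sf omp -pk omp 4 -in in.lj :pre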
As an example, all of the packages provide a "pair_style
lj/cut"_pair_lj.html variant, with style names lj/cut/gpu,
lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt. A variant style
can be specified explicitly in your input script, e.g. pair_style
lj/cut/gpu. If the -suffix switch is used the specified suffix
(gpu,intel,kk,omp,opt) is automatically appended whenever your input
script command creates a new "atom"_atom_style.html,
"pair"_pair_style.html, "fix"_fix.html, "compute"_compute.html, or
"run"_run_style.html style. If the variant version does not exist,
the standard version is created.
For the GPU package, using this command-line switch also invokes the
default GPU settings, as if the command "package gpu 1" were used at
the top of your input script. These settings can be changed by using
the "-package gpu" command-line switch or the "package
gpu"_package.html command in your script.
For the USER-INTEL package, using this command-line switch also
invokes the default USER-INTEL settings, as if the command "package
intel 1" were used at the top of your input script. These settings
can be changed by using the "-package intel" command-line switch or
the "package intel"_package.html command in your script. If the
USER-OMP package is also installed, the hybrid style with "intel omp"
arguments can be used to make the omp suffix a second choice, if a
requested style is not available in the USER-INTEL package. It will
also invoke the default USER-OMP settings, as if the command "package
omp 0" were used at the top of your input script. These settings can
be changed by using the "-package omp" command-line switch or the
"package omp"_package.html command in your script.
For the KOKKOS package, using this command-line switch also invokes
the default KOKKOS settings, as if the command "package kokkos" were
used at the top of your input script. These settings can be changed
by using the "-package kokkos" command-line switch or the "package
kokkos"_package.html command in your script.
For the USER-OMP package, using this command-line switch also invokes
the default USER-OMP settings, as if the command "package omp 0" were used at
the top of your input script. These settings can be changed by using
the "-package omp" command-line switch or the "package
omp"_package.html command in your script.
The "suffix"_suffix.html command can also be used within an input
script to set a suffix, or to turn off or back on any suffix setting
made via the command line.
-var name value1 value2 ... :pre
Specify a variable that will be defined for substitution purposes when
the input script is read. This switch can be used multiple times to
define multiple variables. "Name" is the variable name which can be a
single character (referenced as $x in the input script) or a full
string (referenced as $\{abc\}). An "index-style
variable"_variable.html will be created and populated with the
subsequent values, e.g. a set of filenames. Using this command-line
option is equivalent to putting the line "variable name index value1
value2 ..." at the beginning of the input script. Defining an index
variable as a command-line argument overrides any setting for the same
index variable in the input script, since index variables cannot be
re-defined. See the "variable"_variable.html command for more info on
defining index and other kinds of variables and "this
section"_Section_commands.html#cmd_2 for more info on using variables
in input scripts.
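For example, this (hypothetical) invocation defines an index variable
named f with three filenames, which the script in.loop can reference
as $f:
lmp_mpi -var f file1.data file2.data file3.data -in in.loop :pre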
NOTE: Currently, the command-line parser looks for arguments that
start with "-" to indicate new switches. Thus you cannot specify
multiple variable values if any of them start with a "-", e.g. a
negative numeric value. It is OK if the first value1 starts with a
"-", since it is automatically skipped.
:line
-2.8 LAMMPS screen output :h4,link(start_8)
+2.7 LAMMPS screen output :h4,link(start_7)
As LAMMPS reads an input script, it prints information to both the
screen and a log file about significant actions it takes to setup a
simulation. When the simulation is ready to begin, LAMMPS performs
various initializations and prints the amount of memory (in MBytes per
processor) that the simulation requires. It also prints details of
the initial thermodynamic state of the system. During the run itself,
thermodynamic information is printed periodically, every few
timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
Other | | 0.007178 | | | 0.26 :pre
Nlocal: 501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost: 6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs: 177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre
Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre
The first section provides a global loop timing summary. The {loop time}
is the total wall time for the section. The {Performance} line is
provided for convenience to help predict the number of loop
continuations required and to compare performance with other,
-similar MD codes. The {CPU use} line provides the CPU utilzation per
+similar MD codes. The {CPU use} line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1 if no OpenMP). Lower numbers correspond to delays due
to file I/O or insufficient thread utilization.
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} stands for reciprocal space interactions: Ewald, PPPM, MSM
{Neigh} stands for neighbor list construction
{Comm} stands for communicating atoms and their properties
{Output} stands for writing dumps and thermo output
{Modify} stands for fixes and computes called by them
{Other} is the remaining time :ul
For each category, there is a breakdown of the least, average and most
amount of wall time a processor spent on this section, as well as the
variation from the average time. Together these numbers allow you to
gauge the amount of load imbalance in this segment of the calculation.
Ideally the difference between minimum, maximum and average is small
and thus the variation from the average is close to zero. The final
column shows the percentage of the total loop time spent in this
section.
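As a rough illustration, the min/avg/max columns can be condensed into
a single imbalance number, e.g. with the following Python sketch (a
simple measure of imbalance, not necessarily the exact %varavg formula
used by LAMMPS):
def imbalance(tmin, tavg, tmax):
    # largest deviation from the average time, as a percentage of the average
    return 100.0 * max(tavg - tmin, tmax - tavg) / tavg
print("%.1f%%" % imbalance(1.9808, 2.0134, 2.0318))  # Pair row above, ~1.6% -> well balanced :pre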
When using the "timer full"_timer.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when both {timer full} and the "package omp"_package.html
command are active, a similar timing summary of the time spent in
threaded regions is provided to monitor thread utilization and load
balance. A new entry is the {Reduce} section, which lists the time
spent in reducing the per-thread data elements to the storage for
non-threaded computation. These thread timings are taken from the
first MPI rank only; since the breakdown over the categories can
change from MPI rank to MPI rank, this summary can be quite different
for individual ranks. Here is an example output for this section:
Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55 :pre
The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
histogram showing the distribution. The total number of histogram counts
is equal to the number of processors.
The last section gives aggregate statistics for pair-wise neighbors
and special neighbors that LAMMPS keeps track of (see the
"special_bonds"_special_bonds.html command). The number of times
neighbor lists were rebuilt during the run is given as well as the
number of potentially "dangerous" rebuilds. If atom movement
triggered neighbor list rebuilding (see the
"neigh_modify"_neigh_modify.html command), then dangerous
reneighborings are those that were triggered on the first timestep
atom movement was checked for. If this count is non-zero you may wish
to reduce the delay factor to ensure no force interactions are missed
by atoms moving beyond the neighbor skin distance before a rebuild
takes place.
If an energy minimization was performed via the
"minimize"_minimize.html command, additional information is printed,
e.g.
Minimization stats:
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-6372.3765206 -8328.46998942 -8328.46998942
Force two-norm initial, final = 1059.36 5.36874
Force max component initial, final = 58.6026 1.46872
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
Iterations, force evaluations = 701 1516 :pre
The first line prints the criterion that determined the minimization
to be completed. The third line lists the initial and final energy,
as well as the energy on the next-to-last iteration. The next 2 lines
give a measure of the gradient of the energy (force on all atoms).
The 2-norm is the "length" of this force vector; the inf-norm is the
largest component. Then comes information about the line search and
statistics on how many iterations and force evaluations the minimizer
required. Multiple force evaluations are typically done at each
iteration to perform a 1d line minimization in the search direction.
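For clarity, the two force norms can be written out in a few lines of
Python (the force components below are made-up values, not taken from
the run above):
import math
forces = [58.6026, -3.2, 0.7, 12.5, -0.01]           # made-up per-atom force components
two_norm = math.sqrt(sum(f * f for f in forces))      # "length" of the force vector
inf_norm = max(abs(f) for f in forces)                # largest single component
print(two_norm, inf_norm) :pre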
If a "kspace_style"_kspace_style.html long-range Coulombics solve was
performed during the run (PPPM, Ewald), then additional information is
printed, e.g.
FFT time (% of Kspce) = 0.200313 (8.34477)
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre
The first line gives the time spent doing 3d FFTs (4 per timestep) and
the fraction it represents of the total KSpace time (listed above).
Each 3d FFT requires computation (3 sets of 1d FFTs) and communication
(transposes). The total flops performed is 5Nlog_2(N), where N is the
number of points in the 3d grid. The FFTs are timed with and without
the communication and a Gflop rate is computed. The 3d rate is with
communication; the 1d rate is without (just the 1d FFTs). Thus you
can estimate what fraction of your FFT time was spent in
communication, roughly 75% in the example above.
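The arithmetic behind that estimate can be sketched in a few lines of
Python; the grid size N is an assumed value, while the two Gflop rates
are taken from the example output above:
import math
N = 96**3                                   # assumed number of points in the 3d FFT grid
flops = 5.0 * N * math.log2(N)              # estimated flops for one 3d FFT (5Nlog_2(N))
gflops_3d = 2.31074                         # rate with communication (transposes)
gflops_1d = 9.19989                         # rate of the 1d FFTs only (no communication)
comm_fraction = 1.0 - gflops_3d / gflops_1d
print("%.0f%% of the FFT time spent in communication" % (100.0 * comm_fraction))  # ~75% :pre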
:line
-2.9 Tips for users of previous LAMMPS versions :h4,link(start_9)
+2.8 Tips for users of previous LAMMPS versions :h4,link(start_8)
The current C++ began with a complete rewrite of LAMMPS 2001, which
was written in F90. Features of earlier versions of LAMMPS are listed
in "Section 13"_Section_history.html. The F90 and F77 versions
(2001 and 99) are also freely distributed as open-source codes; check
the "LAMMPS WWW Site"_lws for distribution information if you prefer
those versions. The 99 and 2001 versions are no longer under active
development; they do not have all the features of C++ LAMMPS.
If you are a previous user of LAMMPS 2001, these are the most
significant changes you will notice in C++ LAMMPS:
(1) The names and arguments of many input script commands have
changed. All commands are now a single word (e.g. read_data instead
of read data).
(2) All the functionality of LAMMPS 2001 is included in C++ LAMMPS,
but you may need to specify the relevant commands in different ways.
(3) The format of the data file can be streamlined for some problems.
See the "read_data"_read_data.html command for details. The data file
section "Nonbond Coeff" has been renamed to "Pair Coeff" in C++ LAMMPS.
(4) Binary restart files written by LAMMPS 2001 cannot be read by C++
LAMMPS with a "read_restart"_read_restart.html command. This is
because they were output by F90 which writes in a different binary
format than C or C++ writes or reads. Use the {restart2data} tool
provided with LAMMPS 2001 to convert the 2001 restart file to a text
data file. Then edit the data file as necessary before using the C++
LAMMPS "read_data"_read_data.html command to read it in.
(5) There are numerous small numerical changes in C++ LAMMPS that mean
you will not get identical answers when comparing to a 2001 run.
However, your initial thermodynamic energy and MD trajectory should be
close if you have set up the problem the same way for both codes.
diff --git a/doc/src/Section_tools.txt b/doc/src/Section_tools.txt
index 03611c7cd..d95c4f0cd 100644
--- a/doc/src/Section_tools.txt
+++ b/doc/src/Section_tools.txt
@@ -1,497 +1,500 @@
"Previous Section"_Section_perf.html - "LAMMPS WWW Site"_lws - "LAMMPS
Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_modify.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
9. Additional tools :h3
LAMMPS is designed to be a computational kernel for performing
molecular dynamics computations. Additional pre- and post-processing
steps are often necessary to set up and analyze a simulation. A
list of such tools can be found on the LAMMPS home page
at "http://lammps.sandia.gov/prepost.html"_http://lammps.sandia.gov/prepost.html
A few additional tools are provided with the LAMMPS distribution
and are described in this section.
Our group has also written and released a separate toolkit called
"Pizza.py"_pizza which provides tools for doing setup, analysis,
plotting, and visualization for LAMMPS simulations. Pizza.py is
written in "Python"_python and is available for download from "the
Pizza.py WWW site"_pizza.
:link(pizza,http://www.sandia.gov/~sjplimp/pizza.html)
:link(python,http://www.python.org)
Note that many users write their own setup or analysis tools or use
other existing codes and convert their output to a LAMMPS input format
or vice versa. The tools listed here are included in the LAMMPS
distribution as examples of auxiliary tools. Some of them are not
actively supported by Sandia, as they were contributed by LAMMPS
users. If you have problems using them, we can direct you to the
authors.
The source code for each of these codes is in the tools sub-directory
of the LAMMPS distribution. There is a Makefile (which you may need
to edit for your platform) that will build several of the tools
residing in that directory. Most of them are larger packages in their
own sub-directories with their own Makefiles and/or README files.
"amber2lmp"_#amber
"binary2txt"_#binary
"ch2lmp"_#charmm
"chain"_#chain
"colvars"_#colvars
"createatoms"_#createatoms
"drude"_#drude
"eam database"_#eamdb
"eam generate"_#eamgn
"eff"_#eff
"emacs"_#emacs
"fep"_#fep
"i-pi"_#ipi
"ipp"_#ipp
"kate"_#kate
"lmp2arc"_#arc
"lmp2cfg"_#cfg
"matlab"_#matlab
"micelle2d"_#micelle
"moltemplate"_#moltemplate
"msi2lmp"_#msi
"phonon"_#phonon
"polybond"_#polybond
"pymol_asphere"_#pymol
"python"_#pythontools
"reax"_#reax_tool
"smd"_#smd
"vim"_#vim
"xmgrace"_#xmgrace
:line
amber2lmp tool :h4,link(amber)
The amber2lmp sub-directory contains two Python scripts for converting
files back-and-forth between the AMBER MD code and LAMMPS. See the
README file in amber2lmp for more information.
These tools were written by Keir Novik while he was at Queen Mary
University of London. Keir is no longer there and cannot support
these tools which are out-of-date with respect to the current LAMMPS
version (and maybe with respect to AMBER as well). Since we don't use
these tools at Sandia, you'll need to experiment with them and make
necessary modifications yourself.
:line
binary2txt tool :h4,link(binary)
The file binary2txt.cpp converts one or more binary LAMMPS dump files
into ASCII text files. The syntax for running the tool is
binary2txt file1 file2 ... :pre
which creates file1.txt, file2.txt, etc. This tool must be compiled
on a platform that can read the binary file created by a LAMMPS run,
since binary files are not compatible across all platforms.
:line
ch2lmp tool :h4,link(charmm)
The ch2lmp sub-directory contains tools for converting files
back-and-forth between the CHARMM MD code and LAMMPS.
They are intended to make it easy to use CHARMM as a builder and as a
post-processor for LAMMPS. Using charmm2lammps.pl, you can convert a
PDB file with associated CHARMM info, including CHARMM force field
data, into its LAMMPS equivalent. Using lammps2pdb.pl you can convert
LAMMPS atom dumps into PDB files.
See the README file in the ch2lmp sub-directory for more information.
These tools were created by Pieter in't Veld (pjintve at sandia.gov)
and Paul Crozier (pscrozi at sandia.gov) at Sandia.
:line
chain tool :h4,link(chain)
The file chain.f creates a LAMMPS data file containing bead-spring
polymer chains and/or monomer solvent atoms. It uses a text file
containing chain definition parameters as an input. The created
chains and solvent atoms can strongly overlap, so LAMMPS needs to run
the system initially with a "soft" pair potential to un-overlap it.
The syntax for running the tool is
chain < def.chain > data.file :pre
See the def.chain or def.chain.ab files in the tools directory for
examples of definition files. This tool was used to create the
system for the "chain benchmark"_Section_perf.html.
:line
colvars tools :h4,link(colvars)
The colvars directory contains a collection of tools for postprocessing
data produced by the colvars collective variable library.
To compile the tools, edit the makefile for your system and run "make".
Please report problems and issues with the colvars library and its tools
at: https://github.com/colvars/colvars/issues
abf_integrate:
MC-based integration of multidimensional free energy gradient
Version 20110511
Syntax: ./abf_integrate < filename > \[-n < nsteps >\] \[-t < temp >\] \[-m \[0|1\] (metadynamics)\] \[-h < hill_height >\] \[-f < variable_hill_factor >\] :pre
The LAMMPS interface to the colvars collective variable library, as
well as these tools, were created by Axel Kohlmeyer (akohlmey at
gmail.com) at ICTP, Italy.
:line
createatoms tool :h4,link(createatoms)
The tools/createatoms directory contains a Fortran program called
createAtoms.f which can generate a variety of interesting crystal
structures and geometries and output the resulting list of atom
coordinates in LAMMPS or other formats.
See the included Manual.pdf for details.
The tool is authored by Xiaowang Zhou (Sandia), xzhou at sandia.gov.
:line
drude tool :h4,link(drude)
The tools/drude directory contains a Python script called
polarizer.py which can add Drude oscillators to a LAMMPS
data file in the required format.
See the header of the polarizer.py file for details.
The tool is authored by Agilio Padua and Alain Dequidt: agilio.padua
at univ-bpclermont.fr, alain.dequidt at univ-bpclermont.fr
:line
eam database tool :h4,link(eamdb)
The tools/eam_database directory contains a Fortran program that will
generate EAM alloy setfl potential files for any combination of 16
elements: Cu, Ag, Au, Ni, Pd, Pt, Al, Pb, Fe, Mo, Ta, W, Mg, Co, Ti,
Zr. The files can then be used with the "pair_style
eam/alloy"_pair_eam.html command.
The tool is authored by Xiaowang Zhou (Sandia), xzhou at sandia.gov,
and is based on his paper:
X. W. Zhou, R. A. Johnson, and H. N. G. Wadley, Phys. Rev. B, 69,
144113 (2004).
:line
eam generate tool :h4,link(eamgn)
The tools/eam_generate directory contains several one-file C programs
that convert an analytic formula into a tabulated "embedded atom
method (EAM)"_pair_eam.html setfl potential file. The potentials they
produce are in the potentials directory, and can be used with the
"pair_style eam/alloy"_pair_eam.html command.
The source files and potentials were provided by Gerolf Ziegenhain
(gerolf at ziegenhain.com).
:line
eff tool :h4,link(eff)
The tools/eff directory contains various scripts for generating
structures and post-processing output for simulations using the
electron force field (eFF).
These tools were provided by Andres Jaramillo-Botero at CalTech
(ajaramil at wag.caltech.edu).
:line
emacs tool :h4,link(emacs)
The tools/emacs directory contains a Lisp add-on file for Emacs that
enables a lammps-mode for editing input scripts when using Emacs,
with various highlighting options set up.
These tools were provided by Aidan Thompson at Sandia
(athomps at sandia.gov).
:line
fep tool :h4,link(fep)
The tools/fep directory contains Python scripts useful for
post-processing results from performing free-energy perturbation
simulations using the USER-FEP package.
The scripts were contributed by Agilio Padua (Universite Blaise
Pascal Clermont-Ferrand), agilio.padua at univ-bpclermont.fr.
See README file in the tools/fep directory.
:line
i-pi tool :h4,link(ipi)
The tools/i-pi directory contains a version of the i-PI package, with
all the LAMMPS-unrelated files removed. It is provided so that it can
be used with the "fix ipi"_fix_ipi.html command to perform
path-integral molecular dynamics (PIMD).
The i-PI package was created and is maintained by Michele Ceriotti,
michele.ceriotti at gmail.com, to interface to a variety of molecular
dynamics codes.
See the tools/i-pi/manual.pdf file for an overview of i-PI, and the
"fix ipi"_fix_ipi.html doc page for further details on running PIMD
calculations with LAMMPS.
:line
ipp tool :h4,link(ipp)
The tools/ipp directory contains a Perl script ipp which can be used
to facilitate the creation of a complicated file (say, a lammps input
script or tools/createatoms input file) using a template file.
ipp was created and is maintained by Reese Jones (Sandia), rjones at
sandia.gov.
See two examples in the tools/ipp directory. One of them is for the
tools/createatoms tool's input file.
:line
kate tool :h4,link(kate)
The file in the tools/kate directory is an add-on to the Kate editor
in the KDE suite that allows syntax highlighting of LAMMPS input
scripts. See the README.txt file for details.
The file was provided by Alessandro Luigi Sellerio
(alessandro.sellerio at ieni.cnr.it).
:line
lmp2arc tool :h4,link(arc)
The lmp2arc sub-directory contains a tool for converting LAMMPS output
files to the format for Accelrys' Insight MD code (formerly
MSI/Biosym and its Discover MD code). See the README file for more
information.
This tool was written by John Carpenter (Cray), Michael Peachey
(Cray), and Steve Lustig (Dupont). John is now at the Mayo Clinic
(jec at mayo.edu), but still fields questions about the tool.
This tool was updated for the current LAMMPS C++ version by Jeff
Greathouse at Sandia (jagreat at sandia.gov).
:line
lmp2cfg tool :h4,link(cfg)
The lmp2cfg sub-directory contains a tool for converting LAMMPS output
files into a series of *.cfg files which can be read into the
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A visualizer. See
the README file for more information.
This tool was written by Ara Kooser at Sandia (askoose at sandia.gov).
:line
matlab tool :h4,link(matlab)
The matlab sub-directory contains several "MATLAB"_matlabhome scripts for
post-processing LAMMPS output. The scripts include readers for log
and dump files, a reader for EAM potential files, and a converter that
reads LAMMPS dump files and produces CFG files that can be visualized
with the "AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A
visualizer.
See the README.pdf file for more information.
These scripts were written by Arun Subramaniyan at Purdue Univ
(asubrama at purdue.edu).
:link(matlabhome,http://www.mathworks.com)
:line
micelle2d tool :h4,link(micelle)
The file micelle2d.f creates a LAMMPS data file containing short lipid
chains in a monomer solution. It uses a text file containing lipid
definition parameters as an input. The created molecules and solvent
atoms can strongly overlap, so LAMMPS needs to run the system
initially with a "soft" pair potential to un-overlap it. The syntax
for running the tool is
micelle2d < def.micelle2d > data.file :pre
See the def.micelle2d file in the tools directory for an example of a
definition file. This tool was used to create the system for the
"micelle example"_Section_example.html.
:line
moltemplate tool :h4,link(moltemplate)
The moltemplate sub-directory contains a Python-based tool for
building molecular systems based on a text-file description, and
creating LAMMPS data files that encode their molecular topology as
lists of bonds, angles, dihedrals, etc. See the README.TXT file for
more information.
This tool was written by Andrew Jewett (jewett.aij at gmail.com), who
supports it. It has its own WWW page at
"http://moltemplate.org"_http://moltemplate.org.
:line
msi2lmp tool :h4,link(msi)
-The msi2lmp sub-directory contains a tool for creating LAMMPS input
-data files from BIOVIA's Materias Studio files (formerly Accelrys'
+The msi2lmp sub-directory contains a tool for creating LAMMPS template
+input and data files from BIOVIA's Materials Studio files (formerly Accelrys'
Insight MD code, formerly MSI/Biosym and its Discover MD code).
This tool was written by John Carpenter (Cray), Michael Peachey
(Cray), and Steve Lustig (Dupont). Several people contributed changes
to remove bugs and adapt its output to changes in LAMMPS.
-See the README file for more information.
+This tool has several known limitations and is no longer under active
+development, so apart from the occasional bugfix no further changes are made.
+
+See the README file in the tools/msi2lmp folder for more information.
:line
phonon tool :h4,link(phonon)
The phonon sub-directory contains a post-processing tool useful for
analyzing the output of the "fix phonon"_fix_phonon.html command in
the USER-PHONON package.
See the README file for instruction on building the tool and what
library it needs. And see the examples/USER/phonon directory
for example problems that can be post-processed with this tool.
This tool was written by Ling-Ti Kong at Shanghai Jiao Tong
University.
:line
polybond tool :h4,link(polybond)
The polybond sub-directory contains a Python-based tool useful for
performing "programmable polymer bonding". The Python file
lmpsdata.py provides a "Lmpsdata" class with various methods which can
be invoked by a user-written Python script to create data files with
complex bonding topologies.
See the Manual.pdf for details and example scripts.
This tool was written by Zachary Kraus at Georgia Tech.
:line
pymol_asphere tool :h4,link(pymol)
The pymol_asphere sub-directory contains a tool for converting a
LAMMPS dump file that contains orientation info for ellipsoidal
particles into an input file for the "PyMol visualization
package"_pymolhome or its "open source variant"_pymolopen.
:link(pymolhome,http://www.pymol.org)
:link(pymolopen,http://sourceforge.net/scm/?type=svn&group_id=4546)
Specifically, the tool triangulates the ellipsoids so they can be
viewed as true ellipsoidal particles within PyMol. See the README and
examples directory within pymol_asphere for more information.
This tool was written by Mike Brown at Sandia.
:line
python tool :h4,link(pythontools)
The python sub-directory contains several Python scripts
that perform common LAMMPS post-processing tasks, such as:
extract thermodynamic info from a log file as columns of numbers
plot two columns of thermodynamic info from a log file using GnuPlot
sort the snapshots in a dump file by atom ID
convert multiple "NEB"_neb.html dump files into one dump file for viz
convert dump files into XYZ, CFG, or PDB format for viz by other packages :ul
These are simple scripts built on "Pizza.py"_pizza modules. See the
README for more info on Pizza.py and how to use these scripts.
:line
reax tool :h4,link(reax_tool)
The reax sub-directory contains stand-alone codes that can
post-process the output of the "fix reax/bonds"_fix_reax_bonds.html
command from a LAMMPS simulation using "ReaxFF"_pair_reax.html. See
the README.txt file for more info.
These tools were written by Aidan Thompson at Sandia.
:line
smd tool :h4,link(smd)
The smd sub-directory contains a C++ file dump2vtk_tris.cpp and
Makefile which can be compiled and used to convert triangle output
files created by the Smooth-Mach Dynamics (USER-SMD) package into a
VTK-compatible unstructured grid file. It could then be read in and
visualized by VTK.
See the header of dump2vtk_tris.cpp for more details.
This tool was written by the USER-SMD package author, Georg
Ganzenmuller at the Fraunhofer-Institute for High-Speed Dynamics,
Ernst Mach Institute in Germany (georg.ganzenmueller at emi.fhg.de).
:line
vim tool :h4,link(vim)
The files in the tools/vim directory are add-ons to the VIM editor
that allow easier editing of LAMMPS input scripts. See the README.txt
file for details.
These files were provided by Gerolf Ziegenhain (gerolf at
ziegenhain.com)
:line
xmgrace tool :h4,link(xmgrace)
The files in the tools/xmgrace directory can be used to plot the
thermodynamic data in LAMMPS log files via the xmgrace plotting
package. There are several tools in the directory that can be used in
post-processing mode. The lammpsplot.cpp file can be compiled and
used to create plots from the current state of a running LAMMPS
simulation.
See the README file for details.
These files were provided by Vikas Varshney (vv0210 at gmail.com)
diff --git a/doc/src/angle_sdk.txt b/doc/src/angle_sdk.txt
index 785585f84..0cc535e54 100644
--- a/doc/src/angle_sdk.txt
+++ b/doc/src/angle_sdk.txt
@@ -1,58 +1,58 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
angle_style sdk command :h3
[Syntax:]
angle_style sdk :pre
angle_style sdk/omp :pre
[Examples:]
angle_style sdk
angle_coeff 1 300.0 107.0 :pre
[Description:]
The {sdk} angle style is a combination of the harmonic angle potential,
:c,image(Eqs/angle_harmonic.jpg)
where theta0 is the equilibrium value of the angle and K a prefactor,
with the {repulsive} part of the non-bonded {lj/sdk} pair style
between the atoms 1 and 3. This angle potential is intended for
coarse grained MD simulations with the CMM parametrization using the
"pair_style lj/sdk"_pair_sdk.html. Relative to the pair_style
{lj/sdk}, however, the energy is shifted by {epsilon}, to avoid sudden
jumps. Note that the usual 1/2 factor is included in K.
The following coefficients must be defined for each angle type via the
"angle_coeff"_angle_coeff.html command as in the example above:
K (energy/radian^2)
theta0 (degrees) :ul
Theta0 is specified in degrees, but LAMMPS converts it to radians
internally; hence the units of K are in energy/radian^2.
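As an illustration of the unit handling, here is a minimal Python
sketch that evaluates only the harmonic part of the {sdk} angle energy
(it omits the added {lj/sdk} repulsion between atoms 1 and 3), using
the angle_coeff values from the example above:
import math
def harmonic_angle_energy(K, theta0_deg, theta_deg):
    # E = K*(theta - theta0)^2 with theta in radians; the usual 1/2 factor is folded into K
    dtheta = math.radians(theta_deg) - math.radians(theta0_deg)
    return K * dtheta * dtheta
print(harmonic_angle_energy(300.0, 107.0, 110.0)) :pre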
The additionally required {lj/sdk} parameters are extracted
automatically from the pair_style.
[Restrictions:]
This angle style can only be used if LAMMPS was built with the
-USER-CG-CMM package. See the "Making
+USER-CGSDK package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info on packages.
[Related commands:]
"angle_coeff"_angle_coeff.html, "angle_style
harmonic"_angle_harmonic.html, "pair_style lj/sdk"_pair_sdk.html,
"pair_style lj/sdk/coul/long"_pair_sdk.html
[Default:] none
diff --git a/doc/src/bonds.txt b/doc/src/bonds.txt
index 3b50f6482..169d56ecb 100644
--- a/doc/src/bonds.txt
+++ b/doc/src/bonds.txt
@@ -1,24 +1,23 @@
Bond Styles :h1
<!-- RST
.. toctree::
:maxdepth: 1
bond_class2
bond_fene
bond_fene_expand
bond_harmonic
bond_harmonic_shift
bond_harmonic_shift_cut
bond_hybrid
bond_morse
bond_none
bond_nonlinear
bond_oxdna
- bond_oxdna2
bond_quartic
bond_table
bond_zero
END_RST -->
diff --git a/doc/src/compute_sna_atom.txt b/doc/src/compute_sna_atom.txt
index e2df70647..f82df0d81 100644
--- a/doc/src/compute_sna_atom.txt
+++ b/doc/src/compute_sna_atom.txt
@@ -1,250 +1,274 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
compute sna/atom command :h3
compute snad/atom command :h3
compute snav/atom command :h3
[Syntax:]
compute ID group-ID sna/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ...
compute ID group-ID snad/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ...
compute ID group-ID snav/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ... :pre
ID, group-ID are documented in "compute"_compute.html command :ulb,l
sna/atom = style name of this compute command :l
rcutfac = scale factor applied to all cutoff radii (positive real) :l
rfac0 = parameter in distance to angle conversion (0 < rfac0 < 1) :l
twojmax = band limit for bispectrum components (non-negative integer) :l
R_1, R_2,... = list of cutoff radii, one for each type (distance units) :l
w_1, w_2,... = list of neighbor weights, one for each type :l
zero or more keyword/value pairs may be appended :l
-keyword = {diagonal} or {rmin0} or {switchflag} or {bzeroflag} :l
+keyword = {diagonal} or {rmin0} or {switchflag} or {bzeroflag} or {quadraticflag} :l
{diagonal} value = {0} or {1} or {2} or {3}
{0} = all j1, j2, j <= twojmax, j2 <= j1
{1} = subset satisfying j1 == j2
{2} = subset satisfying j1 == j2 == j3
{3} = subset satisfying j2 <= j1 <= j
{rmin0} value = parameter in distance to angle conversion (distance units)
{switchflag} value = {0} or {1}
{0} = do not use switching function
{1} = use switching function
{bzeroflag} value = {0} or {1}
{0} = do not subtract B0
- {1} = subtract B0 :pre
+ {1} = subtract B0
+ {quadraticflag} value = {0} or {1}
+ {0} = do not generate quadratic terms
+ {1} = generate quadratic terms :pre
:ule
[Examples:]
compute b all sna/atom 1.4 0.99363 6 2.0 2.4 0.75 1.0 diagonal 3 rmin0 0.0
compute db all snad/atom 1.4 0.95 6 2.0 1.0
compute vb all snav/atom 1.4 0.95 6 2.0 1.0 :pre
[Description:]
Define a computation that calculates a set of bispectrum components
for each atom in a group.
Bispectrum components of an atom are order parameters characterizing
the radial and angular distribution of neighbor atoms. The detailed
mathematical definition is given in the paper by Thompson et
al. "(Thompson)"_#Thompson20141
The position of a neighbor atom {i'} relative to a central atom {i} is
a point within the 3D ball of radius {R_ii' = rcutfac*(R_i + R_i')}.
Bartok et al. "(Bartok)"_#Bartok20101 proposed mapping this 3D ball
onto the 3-sphere, the surface of the unit ball in a four-dimensional
space. The radial distance {r} within {R_ii'} is mapped on to a third
polar angle {theta0} defined by,
:c,image(Eqs/compute_sna_atom1.jpg)
In this way, all possible neighbor positions are mapped on to a subset
of the 3-sphere. Points south of the latitude {theta0max=rfac0*Pi}
are excluded.
The natural basis for functions on the 3-sphere is formed by the 4D
hyperspherical harmonics {U^j_m,m'(theta, phi, theta0).} These
functions are better known as {D^j_m,m',} the elements of the Wigner
{D}-matrices "(Meremianin"_#Meremianin2006,
"Varshalovich)"_#Varshalovich1987.
The density of neighbors on the 3-sphere can be written as a sum of
Dirac-delta functions, one for each neighbor, weighted by species and
radial distance. Expanding this density function as a generalized
Fourier series in the basis functions, we can write each Fourier
coefficient as
:c,image(Eqs/compute_sna_atom2.jpg)
The {w_i'} neighbor weights are dimensionless numbers that are chosen
to distinguish atoms of different types, while the central atom is
arbitrarily assigned a unit weight. The function {fc(r)} ensures that
the contribution of each neighbor atom goes smoothly to zero at
{R_ii'}:
:c,image(Eqs/compute_sna_atom4.jpg)
The expansion coefficients {u^j_m,m'} are complex-valued and they are
not directly useful as descriptors, because they are not invariant
under rotation of the polar coordinate frame. However, the following
scalar triple products of expansion coefficients can be shown to be
real-valued and invariant under rotation "(Bartok)"_#Bartok20101.
:c,image(Eqs/compute_sna_atom3.jpg)
The constants {H^jmm'_j1m1m1'_j2m2m2'} are coupling coefficients,
analogous to Clebsch-Gordan coefficients for rotations on the
2-sphere. These invariants are the components of the bispectrum and
these are the quantities calculated by the compute {sna/atom}. They
characterize the strength of density correlations at three points on
the 3-sphere. The j2=0 subset forms the power spectrum, which
characterizes the correlations of two points. The lowest-order
components describe the coarsest features of the density function,
while higher-order components reflect finer detail. Note that the
central atom is included in the expansion, so three point-correlations
can be either due to three neighbors, or two neighbors and the central
atom.
Compute {snad/atom} calculates the derivative of the bispectrum components
summed separately for each atom type:
:c,image(Eqs/compute_sna_atom5.jpg)
The sum is over all atoms {i'} of atom type {I}. For each atom {i},
this compute evaluates the above expression for each direction, each
atom type, and each bispectrum component. See section below on output
for a detailed explanation.
Compute {snav/atom} calculates the virial contribution due to the
derivatives:
:c,image(Eqs/compute_sna_atom6.jpg)
Again, the sum is over all atoms {i'} of atom type {I}. For each atom
{i}, this compute evaluates the above expression for each of the six
virial components, each atom type, and each bispectrum component. See
section below on output for a detailed explanation.
The value of all bispectrum components will be zero for atoms not in
the group. Neighbor atoms not in the group do not contribute to the
bispectrum of atoms in the group.
The neighbor list needed to compute this quantity is constructed each
time the calculation is performed (i.e. each time a snapshot of atoms
is dumped). Thus it can be inefficient to compute/dump this quantity
too frequently.
The argument {rcutfac} is a scale factor that controls the ratio of
atomic radius to radial cutoff distance.
The argument {rfac0} and the optional keyword {rmin0} define the
linear mapping from radial distance to polar angle {theta0} on the
3-sphere.
The argument {twojmax} and the keyword {diagonal} define which
bispectrum components are generated. See section below on output for a
detailed explanation of the number of bispectrum components and the
-ordered in which they are listed
+order in which they are listed.
The keyword {switchflag} can be used to turn off the switching
function.
The keyword {bzeroflag} determines whether or not {B0}, the bispectrum
components of an atom with no neighbors, are subtracted from
the calculated bispectrum components. This optional keyword is only
available for compute {sna/atom}, as {snad/atom} and {snav/atom}
are unaffected by the removal of constant terms.
+The keyword {quadraticflag} determines whether or not the
+quadratic analogs to the bispectrum quantities are generated.
+These are formed by taking the outer product of the vector
+of bispectrum components with itself.
+See section below on output for a
+detailed explanation of the number of quadratic terms and the
+order in which they are listed.
+
NOTE: If you have a bonded system, then the settings of
"special_bonds"_special_bonds.html command can remove pairwise
interactions between atoms in the same bond, angle, or dihedral. This
is the default setting for the "special_bonds"_special_bonds.html
command, and means those pairwise interactions do not appear in the
neighbor list. Because this fix uses the neighbor list, it also means
those pairs will not be included in the calculation. One way to get
around this, is to write a dump file, and use the "rerun"_rerun.html
command to compute the bispectrum components for snapshots in the dump
file. The rerun script can use a "special_bonds"_special_bonds.html
command that includes all pairs in the neighbor list.
:line
[Output info:]
Compute {sna/atom} calculates a per-atom array, each column
corresponding to a particular bispectrum component. The total number
-of columns and the identities of the bispectrum component contained in
+of columns and the identity of the bispectrum component contained in
each column depend on the values of {twojmax} and {diagonal}, as
described by the following piece of python code:
for j1 in range(0,twojmax+1):
if(diagonal==2):
print j1/2.,j1/2.,j1/2.
elif(diagonal==1):
for j in range(0,min(twojmax,2*j1)+1,2):
print j1/2.,j1/2.,j/2.
elif(diagonal==0):
for j2 in range(0,j1+1):
for j in range(j1-j2,min(twojmax,j1+j2)+1,2):
print j1/2.,j2/2.,j/2.
elif(diagonal==3):
for j2 in range(0,j1+1):
for j in range(j1-j2,min(twojmax,j1+j2)+1,2):
if (j>=j1): print j1/2.,j2/2.,j/2. :pre
Compute {snad/atom} evaluates a per-atom array. The columns are
arranged into {ntypes} blocks, listed in order of atom type {I}. Each
block contains three sub-blocks corresponding to the {x}, {y}, and {z}
components of the atom position. Each of these sub-blocks contains
one column for each bispectrum component, the same as for compute
{sna/atom}.
Compute {snav/atom} evaluates a per-atom array. The columns are
arranged into {ntypes} blocks, listed in order of atom type {I}. Each
block contains six sub-blocks corresponding to the {xx}, {yy}, {zz},
{yz}, {xz}, and {xy} components of the virial tensor in Voigt
notation. Each of these sub-blocks contains one column for each
bispectrum component, the same as for compute {sna/atom}.
+For example, if {K}=30 and ntypes=1, the numbers of columns in the per-atom
+arrays generated by {sna/atom}, {snad/atom}, and {snav/atom}
+are 30, 90, and 180, respectively. With {quadraticflag}=1,
+the numbers of columns are 930, 2790, and 5580, respectively.
+
+If the {quadraticflag} keyword value is set to 1, then additional
+columns are appended to each per-atom array, corresponding to
+a matrix of quantities that are products of two bispectrum components. If the
+number of bispectrum components is {K}, then the number of matrix elements
+is {K}^2. These are output in subblocks of {K}^2 columns, using the same
+ordering of columns and sub-blocks as was used for the bispectrum
+components.
+
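The column counts quoted in the example above follow from a simple
rule, shown here as a short Python sketch; {K} is the number of
bispectrum components, which itself depends on {twojmax} and
{diagonal}:
def ncols(K, ntypes, quadratic=False):
    # per-atom columns for sna/atom, snad/atom, and snav/atom
    nvals = K + K * K if quadratic else K    # quadraticflag appends K^2 product columns
    return nvals, 3 * ntypes * nvals, 6 * ntypes * nvals
print(ncols(30, 1))         # (30, 90, 180)
print(ncols(30, 1, True))   # (930, 2790, 5580) :pre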
These values can be accessed by any command that uses per-atom values
from a compute as input. See "Section
6.15"_Section_howto.html#howto_15 for an overview of LAMMPS output
options.
[Restrictions:]
These computes are part of the SNAP package. They are only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"pair_style snap"_pair_snap.html
[Default:]
The optional keyword defaults are {diagonal} = 0, {rmin0} = 0,
-{switchflag} = 1, {bzeroflag} = 0.
+{switchflag} = 1, {bzeroflag} = 1, {quadraticflag} = 0.
:line
:link(Thompson20141)
[(Thompson)] Thompson, Swiler, Trott, Foiles, Tucker, under review, preprint
available at "arXiv:1409.3880"_http://arxiv.org/abs/1409.3880
:link(Bartok20101)
[(Bartok)] Bartok, Payne, Risi, Csanyi, Phys Rev Lett, 104, 136403 (2010).
:link(Meremianin2006)
[(Meremianin)] Meremianin, J. Phys. A, 39, 3099 (2006).
:link(Varshalovich1987)
[(Varshalovich)] Varshalovich, Moskalev, Khersonskii, Quantum Theory
of Angular Momentum, World Scientific, Singapore (1987).
diff --git a/doc/src/dump.txt b/doc/src/dump.txt
index cb9a5ba74..69a00eb47 100644
--- a/doc/src/dump.txt
+++ b/doc/src/dump.txt
@@ -1,678 +1,676 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
dump command :h3
-"dump custom/vtk"_dump_custom_vtk.html command :h3
+"dump vtk"_dump_vtk.html command :h3
"dump h5md"_dump_h5md.html command :h3
+"dump molfile"_dump_molfile.html command :h3
+"dump netcdf"_dump_netcdf.html command :h3
"dump image"_dump_image.html command :h3
"dump movie"_dump_image.html command :h3
-"dump molfile"_dump_molfile.html command :h3
-"dump nc"_dump_nc.html command :h3
[Syntax:]
dump ID group-ID style N file args :pre
ID = user-assigned name for the dump :ulb,l
group-ID = ID of the group of atoms to be dumped :l
-style = {atom} or {atom/gz} or {atom/mpiio} or {cfg} or {cfg/gz} or {cfg/mpiio} or {dcd} or {xtc} or {xyz} or {xyz/gz} or {xyz/mpiio} or {h5md} or {image} or {movie} or {molfile} or {local} or {custom} or {custom/gz} or {custom/mpiio} :l
+style = {atom} or {atom/gz} or {atom/mpiio} or {cfg} or {cfg/gz} or {cfg/mpiio} or {custom} or {custom/gz} or {custom/mpiio} or {dcd} or {h5md} or {image} or {local} or {molfile} or {movie} or {netcdf} or {netcdf/mpiio} or {vtk} or {xtc} or {xyz} or {xyz/gz} or {xyz/mpiio} :l
N = dump every this many timesteps :l
file = name of file to write dump info to :l
args = list of arguments for a particular style :l
{atom} args = none
{atom/gz} args = none
{atom/mpiio} args = none
{cfg} args = same as {custom} args, see below
{cfg/gz} args = same as {custom} args, see below
{cfg/mpiio} args = same as {custom} args, see below
+ {custom}, {custom/gz}, {custom/mpiio} args = see below
{dcd} args = none
+ {h5md} args = discussed on "dump h5md"_dump_h5md.html doc page
+ {image} args = discussed on "dump image"_dump_image.html doc page
+ {local} args = see below
+ {molfile} args = discussed on "dump molfile"_dump_molfile.html doc page
+ {movie} args = discussed on "dump image"_dump_image.html doc page
+ {netcdf} args = discussed on "dump netcdf"_dump_netcdf.html doc page
+ {netcdf/mpiio} args = discussed on "dump netcdf"_dump_netcdf.html doc page
+ {vtk} args = same as {custom} args, see below, also "dump vtk"_dump_vtk.html doc page
{xtc} args = none
- {xyz} args = none :pre
- {xyz/gz} args = none :pre
+ {xyz} args = none
+ {xyz/gz} args = none
{xyz/mpiio} args = none :pre
- {custom/vtk} args = similar to custom args below, discussed on "dump custom/vtk"_dump_custom_vtk.html doc page :pre
-
- {h5md} args = discussed on "dump h5md"_dump_h5md.html doc page :pre
-
- {image} args = discussed on "dump image"_dump_image.html doc page :pre
-
- {movie} args = discussed on "dump image"_dump_image.html doc page :pre
-
- {molfile} args = discussed on "dump molfile"_dump_molfile.html doc page
-
- {nc} args = discussed on "dump nc"_dump_nc.html doc page :pre
-
- {local} args = list of local attributes
- possible attributes = index, c_ID, c_ID\[I\], f_ID, f_ID\[I\]
- index = enumeration of local values
- c_ID = local vector calculated by a compute with ID
- c_ID\[I\] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
- f_ID = local vector calculated by a fix with ID
- f_ID\[I\] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below) :pre
-
- {custom} or {custom/gz} or {custom/mpiio} args = list of atom attributes
+{custom} or {custom/gz} or {custom/mpiio} args = list of atom attributes :l
possible attributes = id, mol, proc, procp1, type, element, mass,
x, y, z, xs, ys, zs, xu, yu, zu,
xsu, ysu, zsu, ix, iy, iz,
vx, vy, vz, fx, fy, fz,
q, mux, muy, muz, mu,
radius, diameter, omegax, omegay, omegaz,
angmomx, angmomy, angmomz, tqx, tqy, tqz,
c_ID, c_ID\[N\], f_ID, f_ID\[N\], v_name :pre
id = atom ID
mol = molecule ID
proc = ID of processor that owns atom
procp1 = ID+1 of processor that owns atom
type = atom type
element = name of atom element, as defined by "dump_modify"_dump_modify.html command
mass = atom mass
x,y,z = unscaled atom coordinates
xs,ys,zs = scaled atom coordinates
xu,yu,zu = unwrapped atom coordinates
xsu,ysu,zsu = scaled unwrapped atom coordinates
ix,iy,iz = box image that the atom is in
vx,vy,vz = atom velocities
fx,fy,fz = forces on atoms
q = atom charge
mux,muy,muz = orientation of dipole moment of atom
mu = magnitude of dipole moment of atom
radius,diameter = radius,diameter of spherical particle
omegax,omegay,omegaz = angular velocity of spherical particle
angmomx,angmomy,angmomz = angular momentum of aspherical particle
tqx,tqy,tqz = torque on finite-size particles
c_ID = per-atom vector calculated by a compute with ID
c_ID\[I\] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
f_ID = per-atom vector calculated by a fix with ID
f_ID\[I\] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
v_name = per-atom vector calculated by an atom-style variable with name
d_name = per-atom floating point vector with name, managed by fix property/atom
i_name = per-atom integer vector with name, managed by fix property/atom :pre
+
+{local} args = list of local attributes :l
+ possible attributes = index, c_ID, c_ID\[I\], f_ID, f_ID\[I\]
+ index = enumeration of local values
+ c_ID = local vector calculated by a compute with ID
+ c_ID\[I\] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
+ f_ID = local vector calculated by a fix with ID
+ f_ID\[I\] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below) :pre
+
:ule
[Examples:]
dump myDump all atom 100 dump.atom
dump myDump all atom/mpiio 100 dump.atom.mpiio
dump myDump all atom/gz 100 dump.atom.gz
dump 2 subgroup atom 50 dump.run.bin
dump 2 subgroup atom 50 dump.run.mpiio.bin
dump 4a all custom 100 dump.myforce.* id type x y vx fx
dump 4b flow custom 100 dump.%.myforce id type c_myF\[3\] v_ke
dump 4b flow custom 100 dump.%.myforce id type c_myF\[*\] v_ke
dump 2 inner cfg 10 dump.snap.*.cfg mass type xs ys zs vx vy vz
dump snap all cfg 100 dump.config.*.cfg mass type xs ys zs id type c_Stress\[2\]
dump 1 all xtc 1000 file.xtc :pre
[Description:]
Dump a snapshot of atom quantities to one or more files every N
timesteps in one of several styles. The {image} and {movie} styles are
the exception: the {image} style renders a JPG, PNG, or PPM image file
of the atom configuration every N timesteps while the {movie} style
combines and compresses them into a movie file; both are discussed in
detail on the "dump image"_dump_image.html doc page. The timesteps on
which dump output is written can also be controlled by a variable.
See the "dump_modify every"_dump_modify.html command.
Only information for atoms in the specified group is dumped. The
"dump_modify thresh and region"_dump_modify.html commands can also
alter what atoms are included. Not all styles support all these
options; see details below.
As described below, the filename determines the kind of output (text
or binary or gzipped, one big file or one per timestep, one big file
or multiple smaller files).
NOTE: Because periodic boundary conditions are enforced only on
timesteps when neighbor lists are rebuilt, the coordinates of an atom
written to a dump file may be slightly outside the simulation box.
Re-neighbor timesteps will not typically coincide with the timesteps
dump snapshots are written. See the "dump_modify
pbc"_dump_modify.html command if you with to force coordinates to be
strictly inside the simulation box.
NOTE: Unless the "dump_modify sort"_dump_modify.html option is
invoked, the lines of atom information written to dump files
(typically one line per atom) will be in an indeterminate order for
each snapshot. This is even true when running on a single processor,
if the "atom_modify sort"_atom_modify.html option is on, which it is
by default. In this case atoms are re-ordered periodically during a
simulation, due to spatial sorting. It is also true when running in
parallel, because data for a single snapshot is collected from
multiple processors, each of which owns a subset of the atoms.
For the {atom}, {custom}, {cfg}, and {local} styles, sorting is off by
default. For the {dcd}, {xtc}, {xyz}, and {molfile} styles, sorting by
atom ID is on by default. See the "dump_modify"_dump_modify.html doc
page for details.
The {atom/gz}, {cfg/gz}, {custom/gz}, and {xyz/gz} styles are identical
in command syntax to the corresponding styles without "gz", however,
they generate compressed files using the zlib library. Thus the filename
suffix ".gz" is mandatory. This is an alternative approach to writing
compressed files via a pipe, as done by the regular dump styles, which
may be required on clusters where the interface to the high-speed network
disallows using the fork() library call (which is needed for a pipe).
For the remainder of this doc page, you should thus consider the {atom}
and {atom/gz} styles (etc) to be inter-changeable, with the exception
of the required filename suffix.
As explained below, the {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and
{xyz/mpiio} styles are identical in command syntax and in the format
of the dump files they create, to the corresponding styles without
"mpiio", except the single dump file they produce is written in
parallel via the MPI-IO library. For the remainder of this doc page,
you should thus consider the {atom} and {atom/mpiio} styles (etc) to
be inter-changeable. The one exception is how the filename is
specified for the MPI-IO styles, as explained below.
The precision of values output to text-based dump files can be
controlled by the "dump_modify format"_dump_modify.html command and
its options.
:line
The {style} keyword determines what atom quantities are written to the
file and in what format. Settings made via the
"dump_modify"_dump_modify.html command can also alter the format of
individual values and the file itself.
The {atom}, {local}, and {custom} styles create files in a simple text
format that is self-explanatory when viewing a dump file. Many of the
LAMMPS "post-processing tools"_Section_tools.html, including
"Pizza.py"_http://www.sandia.gov/~sjplimp/pizza.html, work with this
format, as does the "rerun"_rerun.html command.
For post-processing purposes the {atom}, {local}, and {custom} text
files are self-describing in the following sense.
The dimensions of the simulation box are included in each snapshot.
For an orthogonal simulation box this information is formatted as:
ITEM: BOX BOUNDS xx yy zz
xlo xhi
ylo yhi
zlo zhi :pre
where xlo,xhi are the maximum extents of the simulation box in the
x-dimension, and similarly for y and z. The "xx yy zz" represent 6
characters that encode the style of boundary for each of the 6
simulation box boundaries (xlo,xhi and ylo,yhi and zlo,zhi). Each of
the 6 characters is either p = periodic, f = fixed, s = shrink wrap,
or m = shrink wrapped with a minimum value. See the
"boundary"_boundary.html command for details.
For triclinic simulation boxes (non-orthogonal), an orthogonal
bounding box which encloses the triclinic simulation box is output,
along with the 3 tilt factors (xy, xz, yz) of the triclinic box,
formatted as follows:
ITEM: BOX BOUNDS xy xz yz xx yy zz
xlo_bound xhi_bound xy
ylo_bound yhi_bound xz
zlo_bound zhi_bound yz :pre
The presence of the text "xy xz yz" in the ITEM line indicates that
the 3 tilt factors will be included on each of the 3 following lines.
This bounding box is convenient for many visualization programs. The
meaning of the 6 character flags for "xx yy zz" is the same as above.
Note that the first two numbers on each line are now xlo_bound instead
of xlo, etc, since they represent a bounding box. See "this
section"_Section_howto.html#howto_12 of the doc pages for a geometric
description of triclinic boxes, as defined by LAMMPS, simple formulas
for how the 6 bounding box extents (xlo_bound,xhi_bound,etc) are
calculated from the triclinic parameters, and how to transform those
parameters to and from other commonly used triclinic representations.
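As a convenience, here is a Python sketch of the bounding-box
formulas referenced above; the expressions reproduce the formulas from
that howto section and should be checked there for the authoritative
version:
def bounding_box(xlo, xhi, ylo, yhi, zlo, zhi, xy, xz, yz):
    # orthogonal bounding box enclosing a triclinic LAMMPS box
    xlo_bound = xlo + min(0.0, xy, xz, xy + xz)
    xhi_bound = xhi + max(0.0, xy, xz, xy + xz)
    ylo_bound = ylo + min(0.0, yz)
    yhi_bound = yhi + max(0.0, yz)
    return xlo_bound, xhi_bound, ylo_bound, yhi_bound, zlo, zhi :pre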
The "ITEM: ATOMS" line in each snapshot lists column descriptors for
the per-atom lines that follow. For example, the descriptors would be
"id type xs ys zs" for the default {atom} style, and would be the atom
attributes you specify in the dump command for the {custom} style.
For style {atom}, atom coordinates are written to the file, along with
the atom ID and atom type. By default, atom coords are written in a
scaled format (from 0 to 1). I.e. an x value of 0.25 means the atom
is at a location 1/4 of the distance from xlo to xhi of the box
boundaries. The format can be changed to unscaled coords via the
"dump_modify"_dump_modify.html settings. Image flags can also be
added for each atom via dump_modify.
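For instance, recovering an unscaled coordinate from a scaled one (for
an orthogonal box) is a simple linear map, as in this minimal Python
sketch:
def unscale(xs, xlo, xhi):
    # convert a scaled coordinate (0 to 1) to an unscaled coordinate
    return xlo + xs * (xhi - xlo)
print(unscale(0.25, 0.0, 40.0))   # 10.0, i.e. 1/4 of the way from xlo to xhi :pre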
Style {custom} allows you to specify a list of atom attributes to be
written to the dump file for each atom. Possible attributes are
listed above and will appear in the order specified. You cannot
specify a quantity that is not defined for a particular simulation -
such as {q} for atom style {bond}, since that atom style doesn't
assign charges. Dumps occur at the very end of a timestep, so atom
attributes will include effects due to fixes that are applied during
the timestep. An explanation of the possible dump custom attributes
is given below.
For style {local}, local output generated by "computes"_compute.html
and "fixes"_fix.html is used to generate lines of output that is
written to the dump file. This local data is typically calculated by
each processor based on the atoms it owns, but there may be zero or
more entities per atom, e.g. a list of bond distances. An explanation
of the possible dump local attributes is given below. Note that by
using input from the "compute
property/local"_compute_property_local.html command with dump local,
it is possible to generate information on bonds, angles, etc that can
be cut and pasted directly into a data file read by the
"read_data"_read_data.html command.
Style {cfg} has the same command syntax as style {custom} and writes
extended CFG format files, as used by the
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A visualization
package. Since the extended CFG format uses a single snapshot of the
system per file, a wildcard "*" must be included in the filename, as
discussed below. The list of atom attributes for style {cfg} must
begin with either "mass type xs ys zs" or "mass type xsu ysu zsu"
since these quantities are needed to write the CFG files in the
appropriate format (though the "mass" and "type" fields do not appear
explicitly in the file). Any remaining attributes will be stored as
"auxiliary properties" in the CFG files. Note that you will typically
want to use the "dump_modify element"_dump_modify.html command with
CFG-formatted files, to associate element names with atom types, so
that AtomEye can render atoms appropriately. When unwrapped
coordinates {xsu}, {ysu}, and {zsu} are requested, the nominal AtomEye
periodic cell dimensions are expanded by a large factor UNWRAPEXPAND =
10.0, which ensures that atoms are displayed correctly for up to
UNWRAPEXPAND/2 periodic boundary crossings in any direction. Beyond
this, AtomEye will rewrap the unwrapped coordinates. The expansion
causes the atoms to be drawn farther away from the viewer, but it is
easy to zoom the atoms closer, and the interatomic distances are
unaffected.
The {dcd} style writes DCD files, a standard atomic trajectory format
used by the CHARMM, NAMD, and XPlor molecular dynamics packages. DCD
files are binary and thus may not be portable to different machines.
The number of atoms per snapshot cannot change with the {dcd} style.
The {unwrap} option of the "dump_modify"_dump_modify.html command
allows DCD coordinates to be written "unwrapped" by the image flags
for each atom. Unwrapped means that if the atom has passed through
a periodic boundary one or more times, the value is printed for what
the coordinate would be if it had not been wrapped back into the
periodic box. Note that these coordinates may thus be far outside
the box size stored with the snapshot.
The {xtc} style writes XTC files, a compressed trajectory format used
by the GROMACS molecular dynamics package, and described
"here"_http://manual.gromacs.org/current/online/xtc.html.
The precision used in XTC files can be adjusted via the
"dump_modify"_dump_modify.html command. The default value of 1000
means that coordinates are stored to 1/1000 nanometer accuracy. XTC
files are portable binary files written in the NFS XDR data format,
so that any machine which supports XDR should be able to read them.
The number of atoms per snapshot cannot change with the {xtc} style.
The {unwrap} option of the "dump_modify"_dump_modify.html command allows
XTC coordinates to be written "unwrapped" by the image flags for each
atom. Unwrapped means that if the atom has passed through a periodic
boundary one or more times, the value is printed for what the
coordinate would be if it had not been wrapped back into the periodic
box. Note that these coordinates may thus be far outside the box size
stored with the snapshot.
The {xyz} style writes XYZ files, which is a simple text-based
coordinate format that many codes can read. Specifically it has
a line with the number of atoms, then a comment line that is
usually ignored followed by one line per atom with the atom type
and the x-, y-, and z-coordinate of that atom. You can use the
"dump_modify element"_dump_modify.html option to change the output
from using the (numerical) atom type to an element name (or some
other label). This will help many visualization programs to guess
bonds and colors.
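As a minimal illustration of that layout, the following Python sketch
writes a single XYZ frame; the element names and coordinates are
made-up values:
atoms = [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)]   # made-up example data
with open("frame.xyz", "w") as f:
    f.write("%d\n" % len(atoms))                        # line 1: number of atoms
    f.write("comment line, usually ignored\n")          # line 2: comment
    for elem, x, y, z in atoms:
        f.write("%s %g %g %g\n" % (elem, x, y, z)) :pre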
Note that {atom}, {custom}, {dcd}, {xtc}, and {xyz} style dump files
can be read directly by "VMD"_http://www.ks.uiuc.edu/Research/vmd, a
popular molecular viewing program.
:line
Dumps are performed on timesteps that are a multiple of N (including
timestep 0) and on the last timestep of a minimization if the
minimization converges. Note that this means a dump will not be
performed on the initial timestep after the dump command is invoked,
if the current timestep is not a multiple of N. This behavior can be
changed via the "dump_modify first"_dump_modify.html command, which
can also be useful if the dump command is invoked after a minimization
ended on an arbitrary timestep. N can be changed between runs by
using the "dump_modify every"_dump_modify.html command (not allowed
for {dcd} style). The "dump_modify every"_dump_modify.html command
also allows a variable to be used to determine the sequence of
timesteps on which dump files are written. In this mode a dump on the
first timestep of a run will also not be written unless the
"dump_modify first"_dump_modify.html command is used.
The specified filename determines how the dump file(s) is written.
The default is to write one large text file, which is opened when the
dump command is invoked and closed when an "undump"_undump.html
command is used or when LAMMPS exits. For the {dcd} and {xtc} styles,
this is a single large binary file.
Dump filenames can contain two wildcard characters. If a "*"
character appears in the filename, then one file per snapshot is
written and the "*" character is replaced with the timestep value.
For example, tmp.dump.* becomes tmp.dump.0, tmp.dump.10000,
tmp.dump.20000, etc. This option is not available for the {dcd} and
{xtc} styles. Note that the "dump_modify pad"_dump_modify.html
command can be used to ensure all timestep numbers are the same length
(e.g. 00010), which can make it easier to read a series of dump files
in order with some post-processing tools.
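For example, the following writes one zero-padded file per snapshot
(tmp.dump.000000, tmp.dump.010000, ...):
dump 1 all atom 10000 tmp.dump.*
dump_modify 1 pad 6 :pre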
If a "%" character appears in the filename, then each of P processors
writes a portion of the dump file, and the "%" character is replaced
with the processor ID from 0 to P-1. For example, tmp.dump.% becomes
tmp.dump.0, tmp.dump.1, ... tmp.dump.P-1, etc. This creates smaller
files and can be a fast mode of output on parallel machines that
support parallel I/O for output. This option is not available for the
{dcd}, {xtc}, and {xyz} styles.
By default, P = the number of processors, meaning one file per
processor, but P can be set to a smaller value via the {nfile} or
{fileper} keywords of the "dump_modify"_dump_modify.html command.
These options can be the most efficient way of writing out dump files
when running on large numbers of processors.
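For example, on a large parallel run you might limit the per-snapshot
output to 8 files regardless of the processor count (the value 8 is
illustrative):
dump 1 all custom 1000 tmp.dump.% id type x y z
dump_modify 1 nfile 8 :pre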
Note that using the "*" and "%" characters together can produce a
large number of small dump files!
For the {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and {xyz/mpiio}
styles, a single dump file is written in parallel via the MPI-IO
library, which is part of the MPI standard for versions 2.0 and above.
Using MPI-IO requires two steps. First, build LAMMPS with its MPIIO
package installed, e.g.
make yes-mpiio # installs the MPIIO package
make mpi # build LAMMPS for your platform :pre
Second, use a dump filename which contains ".mpiio". Note that it
does not have to end in ".mpiio", just contain those characters.
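For example:
dump 1 all custom/mpiio 1000 tmp.dump.mpiio id type x y z :pre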
Unlike MPI-IO restart files, which must be both written and read using
MPI-IO, the dump files produced by these MPI-IO styles are identical
in format to the files produced by their non-MPI-IO style
counterparts. This means you can write a dump file using MPI-IO and
use the "read_dump"_read_dump.html command or perform other
post-processing, just as if the dump file was not written using
MPI-IO.
Note that MPI-IO dump files are one large file which all processors
write to. You thus cannot use the "%" wildcard character described
above in the filename since that specifies generation of multiple
files. You can use the ".bin" suffix described below in an MPI-IO
dump file; again this file will be written in parallel and have the
same binary format as if it were written without MPI-IO.
If the filename ends with ".bin", the dump file (or files, if "*" or
"%" is also used) is written in binary format. A binary dump file
will be about the same size as a text version, but will typically
write out much faster. Of course, when post-processing, you will need
to convert it back to text format (see the "binary2txt
tool"_Section_tools.html#binary) or write your own code to read the
binary file. The format of the binary file can be understood by
looking at the tools/binary2txt.cpp file. This option is only
available for the {atom} and {custom} styles.
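For example, the following writes one binary file per snapshot:
dump 1 all custom 1000 tmp.dump.*.bin id type x y z vx vy vz :pre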
If the filename ends with ".gz", the dump file (or files, if "*" or "%"
is also used) is written in gzipped format. A gzipped dump file will
be about 3x smaller than the text version, but will also take longer
to write. This option is not available for the {dcd} and {xtc}
styles.
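For example (this requires gzip support; see the restrictions below):
dump 1 all atom 1000 dump.atom.gz :pre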
:line
Note that in the discussion which follows, for styles which can
reference values from a compute or fix, like the {custom}, {cfg}, or
{local} styles, the bracketed index I can be specified using a
wildcard asterisk with the index to effectively specify multiple
values. This takes the form "*" or "*n" or "n*" or "m*n". If N = the
size of the vector (for {mode} = scalar) or the number of columns in
the array (for {mode} = vector), then an asterisk with no numeric
values means all indices from 1 to N. A leading asterisk means all
indices from 1 to n (inclusive). A trailing asterisk means all
indices from n to N (inclusive). A middle asterisk means all indices
from m to n (inclusive).
Using a wildcard is the same as if the individual columns of the array
had been listed one by one. E.g. these 2 dump commands are
equivalent, since the "compute stress/atom"_compute_stress_atom.html
command creates a per-atom array with 6 columns:
compute myPress all stress/atom NULL
dump 2 all custom 100 tmp.dump id myPress\[*\]
dump 2 all custom 100 tmp.dump id myPress\[1\] myPress\[2\] myPress\[3\] &
myPress\[4\] myPress\[5\] myPress\[6\] :pre
:line
This section explains the local attributes that can be specified as
part of the {local} style.
The {index} attribute can be used to generate an index number from 1
to N for each line written into the dump file, where N is the total
number of local datums from all processors, or lines of output that
will appear in the snapshot. Note that because data from different
processors depend on what atoms they currently own, and atoms migrate
between processors, there is no guarantee that the same index will be
used for the same info (e.g. a particular bond) in successive
snapshots.
The {c_ID} and {c_ID\[I\]} attributes allow local vectors or arrays
calculated by a "compute"_compute.html to be output. The ID in the
attribute should be replaced by the actual ID of the compute that has
been defined previously in the input script. See the
"compute"_compute.html command for details. There are computes for
calculating local information such as indices, types, and energies for
bonds and angles.
Note that computes which calculate global or per-atom quantities, as
opposed to local quantities, cannot be output in a dump local command.
Instead, global quantities can be output by the "thermo_style
custom"_thermo_style.html command, and per-atom quantities can be
output by the dump custom command.
If {c_ID} is used as an attribute, then the local vector calculated by
the compute is printed. If {c_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the local array
with M columns calculated by the compute. See the discussion above
for how I can be specified with a wildcard asterisk to effectively
specify multiple values.
The {f_ID} and {f_ID\[I\]} attributes allow local vectors or arrays
calculated by a "fix"_fix.html to be output. The ID in the attribute
should be replaced by the actual ID of the fix that has been defined
previously in the input script.
If {f_ID} is used as an attribute, then the local vector calculated by
the fix is printed. If {f_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the local array with M
columns calculated by the fix. See the discussion above for how I can
be specified with a wildcard asterisk to effectively specify multiple
values.
Here is an example of how to dump bond info for a system, including
the distance and energy of each bond:
compute 1 all property/local batom1 batom2 btype
compute 2 all bond/local dist eng
dump 1 all local 1000 tmp.dump index c_1\[1\] c_1\[2\] c_1\[3\] c_2\[1\] c_2\[2\] :pre
:line
This section explains the atom attributes that can be specified as
part of the {custom} and {cfg} styles.
The {id}, {mol}, {proc}, {procp1}, {type}, {element}, {mass}, {vx},
{vy}, {vz}, {fx}, {fy}, {fz}, {q} attributes are self-explanatory.
{Id} is the atom ID. {Mol} is the molecule ID, included in the data
file for molecular systems. {Proc} is the ID of the processor (0 to
Nprocs-1) that currently owns the atom. {Procp1} is the proc ID+1,
which can be convenient in place of a {type} attribute (1 to Ntypes)
for coloring atoms in a visualization program. {Type} is the atom
type (1 to Ntypes). {Element} is typically the chemical name of an
element, which you must assign to each type via the "dump_modify
element"_dump_modify.html command. More generally, it can be any
string you wish to associate with an atom type. {Mass} is the atom
mass. {Vx}, {vy}, {vz}, {fx}, {fy}, {fz}, and {q} are components of
atom velocity and force and atomic charge.
There are several options for outputting atom coordinates. The {x},
{y}, {z} attributes write atom coordinates "unscaled", in the
appropriate distance "units"_units.html (Angstroms, sigma, etc). Use
{xs}, {ys}, {zs} if you want the coordinates "scaled" to the box size,
so that each value is 0.0 to 1.0. If the simulation box is triclinic
(tilted), then all atom coords will still be between 0.0 and 1.0.
I.e. actual unscaled (x,y,z) = xs*A + ys*B + zs*C, where (A,B,C) are
the non-orthogonal vectors of the simulation box edges, as discussed
in "Section 6.12"_Section_howto.html#howto_12.
Use {xu}, {yu}, {zu} if you want the coordinates "unwrapped" by the
image flags for each atom. Unwrapped means that if the atom has
passed through a periodic boundary one or more times, the value is
printed for what the coordinate would be if it had not been wrapped
back into the periodic box. Note that using {xu}, {yu}, {zu} means
that the coordinate values may be far outside the box bounds printed
with the snapshot. Using {xsu}, {ysu}, {zsu} is similar to using
{xu}, {yu}, {zu}, except that the unwrapped coordinates are scaled by
the box size. Atoms that have passed through a periodic boundary will
have the corresponding coordinate increased or decreased by 1.0.
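For example, to write both wrapped and unwrapped coordinates, e.g. for
later displacement analysis (an illustrative choice of attributes):
dump 1 all custom 1000 dump.unwrap id type x y z xu yu zu :pre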
The image flags can be printed directly using the {ix}, {iy}, {iz}
attributes. For periodic dimensions, they specify which image of the
simulation box the atom is considered to be in. An image of 0 means
it is inside the box as defined. A value of 2 means add 2 box lengths
to get the true value. A value of -1 means subtract 1 box length to
get the true value. LAMMPS updates these flags as atoms cross
periodic boundaries during the simulation.
The {mux}, {muy}, {muz} attributes are specific to dipolar systems
defined with an atom style of {dipole}. They give the orientation of
the atom's point dipole moment. The {mu} attribute gives the
magnitude of the atom's dipole moment.
The {radius} and {diameter} attributes are specific to spherical
particles that have a finite size, such as those defined with an atom
style of {sphere}.
The {omegax}, {omegay}, and {omegaz} attributes are specific to
finite-size spherical particles that have an angular velocity. Only
certain atom styles, such as {sphere}, define this quantity.
The {angmomx}, {angmomy}, and {angmomz} attributes are specific to
finite-size aspherical particles that have an angular momentum. Only
the {ellipsoid} atom style defines this quantity.
The {tqx}, {tqy}, {tqz} attributes are for finite-size particles that
can sustain a rotational torque due to interactions with other
particles.
The {c_ID} and {c_ID\[I\]} attributes allow per-atom vectors or arrays
calculated by a "compute"_compute.html to be output. The ID in the
attribute should be replaced by the actual ID of the compute that has
been defined previously in the input script. See the
"compute"_compute.html command for details. There are computes for
calculating the per-atom energy, stress, centro-symmetry parameter,
and coordination number of individual atoms.
Note that computes which calculate global or local quantities, as
opposed to per-atom quantities, cannot be output in a dump custom
command. Instead, global quantities can be output by the
"thermo_style custom"_thermo_style.html command, and local quantities
can be output by the dump local command.
If {c_ID} is used as an attribute, then the per-atom vector calculated
by the compute is printed. If {c_ID\[I\]} is used, then I must be in
the range from 1-M, which will print the Ith column of the per-atom
array with M columns calculated by the compute. See the discussion
above for how I can be specified with a wildcard asterisk to
effectively specify multiple values.
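For example, the per-atom kinetic energy calculated by "compute
ke/atom"_compute_ke_atom.html could be dumped like this (a minimal
sketch):
compute myKE all ke/atom
dump 1 all custom 100 dump.ke id type c_myKE :pre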
The {f_ID} and {f_ID\[I\]} attributes allow vector or array per-atom
quantities calculated by a "fix"_fix.html to be output. The ID in the
attribute should be replaced by the actual ID of the fix that has been
defined previously in the input script. The "fix
ave/atom"_fix_ave_atom.html command is one that calculates per-atom
quantities. Since it can time-average per-atom quantities produced by
any "compute"_compute.html, "fix"_fix.html, or atom-style
"variable"_variable.html, this allows those time-averaged results to
be written to a dump file.
If {f_ID} is used as an attribute, then the per-atom vector calculated
by the fix is printed. If {f_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the per-atom array
with M columns calculated by the fix. See the discussion above for
how I can be specified with a wildcard asterisk to effectively specify
multiple values.
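For example, time-averaged velocities from "fix
ave/atom"_fix_ave_atom.html could be dumped like this (the averaging
intervals are illustrative):
fix 1 all ave/atom 10 10 100 vx vy vz
dump 1 all custom 100 dump.vel id type f_1\[1\] f_1\[2\] f_1\[3\] :pre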
The {v_name} attribute allows per-atom vectors calculated by a
"variable"_variable.html to be output. The name in the attribute
should be replaced by the actual name of the variable that has been
defined previously in the input script. Only an atom-style variable
can be referenced, since it is the only style that generates per-atom
values. Variables of style {atom} can reference individual atom
attributes, per-atom atom attributes, thermodynamic keywords, or
invoke other computes, fixes, or variables when they are evaluated, so
this is a very general means of creating quantities to output to a
dump file.
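For example, a per-atom kinetic energy defined via an atom-style
variable could be written to a dump file like this:
variable ke atom 0.5*mass*(vx^2+vy^2+vz^2)
dump 1 all custom 100 dump.myke id type v_ke :pre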
The {d_name} and {i_name} attributes allow custom per-atom
floating-point or integer properties managed by
"fix property/atom"_fix_property_atom.html to be output.
See "Section 10"_Section_modify.html of the manual for information
on how to add new compute and fix styles to LAMMPS to calculate
per-atom quantities which could then be output into dump files.
:line
[Restrictions:]
To write gzipped dump files, you must either compile LAMMPS with the
-DLAMMPS_GZIP option or use the styles from the COMPRESS package
- see the "Making LAMMPS"_Section_start.html#start_2 section of
the documentation.
The {atom/gz}, {cfg/gz}, {custom/gz}, and {xyz/gz} styles are part
of the COMPRESS package. They are only enabled if LAMMPS was built
with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and {xyz/mpiio} styles
are part of the MPIIO package. They are only enabled if LAMMPS was
built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The {xtc} style is part of the MISC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info. This is
because some machines may not support the low-level XDR data format
that XTC files are written with, which will result in a compile-time
error when a low-level include file is not found. Putting this style
in a package makes it easy to exclude from a LAMMPS build for those
machines. However, the MISC package also includes two compatibility
header files and associated functions, which should be a suitable
substitute on machines that do not have the appropriate native header
files. This option can be invoked at build time by adding
-DLAMMPS_XDR to the CCFLAGS variable in the appropriate low-level
Makefile, e.g. src/MAKE/Makefile.foo. This compatibility mode has
been tested successfully on Cray XT3/XT4/XT5 and IBM BlueGene/L
machines and should also work on IBM BG/P, and Windows XP/Vista/7
machines.
[Related commands:]
"dump h5md"_dump_h5md.html, "dump image"_dump_image.html,
"dump molfile"_dump_molfile.html, "dump_modify"_dump_modify.html,
"undump"_undump.html
[Default:]
The defaults for the {image} and {movie} styles are listed on the
"dump image"_dump_image.html doc page.
diff --git a/doc/src/dump_custom_vtk.txt b/doc/src/dump_custom_vtk.txt
deleted file mode 100644
index d4c16193d..000000000
--- a/doc/src/dump_custom_vtk.txt
+++ /dev/null
@@ -1,347 +0,0 @@
- "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
-
-:link(lws,http://lammps.sandia.gov)
-:link(ld,Manual.html)
-:link(lc,Section_commands.html#comm)
-
-:line
-
-dump custom/vtk command :h3
-
-[Syntax:]
-
-dump ID group-ID style N file args :pre
-
-ID = user-assigned name for the dump :ulb,l
-group-ID = ID of the group of atoms to be dumped :l
-style = {custom/vtk} :l
-N = dump every this many timesteps :l
-file = name of file to write dump info to :l
-args = list of arguments for a particular style :l
- {custom/vtk} args = list of atom attributes
- possible attributes = id, mol, proc, procp1, type, element, mass,
- x, y, z, xs, ys, zs, xu, yu, zu,
- xsu, ysu, zsu, ix, iy, iz,
- vx, vy, vz, fx, fy, fz,
- q, mux, muy, muz, mu,
- radius, diameter, omegax, omegay, omegaz,
- angmomx, angmomy, angmomz, tqx, tqy, tqz,
- c_ID, c_ID\[N\], f_ID, f_ID\[N\], v_name :pre
-
- id = atom ID
- mol = molecule ID
- proc = ID of processor that owns atom
- procp1 = ID+1 of processor that owns atom
- type = atom type
- element = name of atom element, as defined by "dump_modify"_dump_modify.html command
- mass = atom mass
- x,y,z = unscaled atom coordinates
- xs,ys,zs = scaled atom coordinates
- xu,yu,zu = unwrapped atom coordinates
- xsu,ysu,zsu = scaled unwrapped atom coordinates
- ix,iy,iz = box image that the atom is in
- vx,vy,vz = atom velocities
- fx,fy,fz = forces on atoms
- q = atom charge
- mux,muy,muz = orientation of dipole moment of atom
- mu = magnitude of dipole moment of atom
- radius,diameter = radius,diameter of spherical particle
- omegax,omegay,omegaz = angular velocity of spherical particle
- angmomx,angmomy,angmomz = angular momentum of aspherical particle
- tqx,tqy,tqz = torque on finite-size particles
- c_ID = per-atom vector calculated by a compute with ID
- c_ID\[I\] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
- f_ID = per-atom vector calculated by a fix with ID
- f_ID\[I\] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
- v_name = per-atom vector calculated by an atom-style variable with name
- d_name = per-atom floating point vector with name, managed by fix property/atom
- i_name = per-atom integer vector with name, managed by fix property/atom :pre
-:ule
-
-[Examples:]
-
-dump dmpvtk all custom/vtk 100 dump*.myforce.vtk id type vx fx
-dump dmpvtp flow custom/vtk 100 dump*.%.displace.vtp id type c_myD\[1\] c_myD\[2\] c_myD\[3\] v_ke :pre
-
-The style {custom/vtk} is similar to the "custom"_dump.html style but
-uses the VTK library to write data to VTK simple legacy or XML format
-depending on the filename extension specified. This can be either
-{*.vtk} for the legacy format or {*.vtp} and {*.vtu}, respectively,
-for the XML format; see the "VTK
-homepage"_http://www.vtk.org/VTK/img/file-formats.pdf for a detailed
-description of these formats. Since this naming convention conflicts
-with the way binary output is usually specified (see below),
-"dump_modify binary"_dump_modify.html allows to set the binary
-flag for this dump style explicitly.
-
-[Description:]
-
-Dump a snapshot of atom quantities to one or more files every N
-timesteps in a format readable by the "VTK visualization
-toolkit"_http://www.vtk.org or other visualization tools that use it,
-e.g. "ParaView"_http://www.paraview.org. The timesteps on which dump
-output is written can also be controlled by a variable; see the
-"dump_modify every"_dump_modify.html command for details.
-
-Only information for atoms in the specified group is dumped. The
-"dump_modify thresh and region"_dump_modify.html commands can also
-alter what atoms are included; see details below.
-
-As described below, special characters ("*", "%") in the filename
-determine the kind of output.
-
-IMPORTANT NOTE: Because periodic boundary conditions are enforced only
-on timesteps when neighbor lists are rebuilt, the coordinates of an
-atom written to a dump file may be slightly outside the simulation
-box.
-
-IMPORTANT NOTE: Unless the "dump_modify sort"_dump_modify.html
-option is invoked, the lines of atom information written to dump files
-will be in an indeterminate order for each snapshot. This is even
-true when running on a single processor, if the "atom_modify
-sort"_atom_modify.html option is on, which it is by default. In this
-case atoms are re-ordered periodically during a simulation, due to
-spatial sorting. It is also true when running in parallel, because
-data for a single snapshot is collected from multiple processors, each
-of which owns a subset of the atoms.
-
-For the {custom/vtk} style, sorting is off by default. See the
-"dump_modify"_dump_modify.html doc page for details.
-
-:line
-
-The dimensions of the simulation box are written to a separate file
-for each snapshot (either in legacy VTK or XML format depending on
-the format of the main dump file) with the suffix {_boundingBox}
-appended to the given dump filename.
-
-For an orthogonal simulation box this information is saved as a
-rectilinear grid (legacy .vtk or .vtr XML format).
-
-Triclinic simulation boxes (non-orthogonal) are saved as
-hexahedrons in either legacy .vtk or .vtu XML format.
-
-Style {custom/vtk} allows you to specify a list of atom attributes
-to be written to the dump file for each atom. Possible attributes
-are listed above. In contrast to the {custom} style, the attributes
-are rearranged to ensure correct ordering of vector components
-(except for computes and fixes - these have to be given in the right
-order) and duplicate entries are removed.
-
-You cannot specify a quantity that is not defined for a particular
-simulation - such as {q} for atom style {bond}, since that atom style
-doesn't assign charges. Dumps occur at the very end of a timestep,
-so atom attributes will include effects due to fixes that are applied
-during the timestep. An explanation of the possible dump custom/vtk attributes
-is given below. Since position data is required to write VTK files "x y z"
-do not have to be specified explicitly.
-
-The VTK format uses a single snapshot of the system per file, thus
-a wildcard "*" must be included in the filename, as discussed below.
-Otherwise the dump files will get overwritten with the new snapshot
-each time.
-
-:line
-
-Dumps are performed on timesteps that are a multiple of N (including
-timestep 0) and on the last timestep of a minimization if the
-minimization converges. Note that this means a dump will not be
-performed on the initial timestep after the dump command is invoked,
-if the current timestep is not a multiple of N. This behavior can be
-changed via the "dump_modify first"_dump_modify.html command, which
-can also be useful if the dump command is invoked after a minimization
-ended on an arbitrary timestep. N can be changed between runs by
-using the "dump_modify every"_dump_modify.html command.
-The "dump_modify every"_dump_modify.html command
-also allows a variable to be used to determine the sequence of
-timesteps on which dump files are written. In this mode a dump on the
-first timestep of a run will also not be written unless the
-"dump_modify first"_dump_modify.html command is used.
-
-Dump filenames can contain two wildcard characters. If a "*"
-character appears in the filename, then one file per snapshot is
-written and the "*" character is replaced with the timestep value.
-For example, tmp.dump*.vtk becomes tmp.dump0.vtk, tmp.dump10000.vtk,
-tmp.dump20000.vtk, etc. Note that the "dump_modify pad"_dump_modify.html
-command can be used to insure all timestep numbers are the same length
-(e.g. 00010), which can make it easier to read a series of dump files
-in order with some post-processing tools.
-
-If a "%" character appears in the filename, then each of P processors
-writes a portion of the dump file, and the "%" character is replaced
-with the processor ID from 0 to P-1 preceded by an underscore character.
-For example, tmp.dump%.vtp becomes tmp.dump_0.vtp, tmp.dump_1.vtp, ...
-tmp.dump_P-1.vtp, etc. This creates smaller files and can be a fast
-mode of output on parallel machines that support parallel I/O for output.
-
-By default, P = the number of processors meaning one file per
-processor, but P can be set to a smaller value via the {nfile} or
-{fileper} keywords of the "dump_modify"_dump_modify.html command.
-These options can be the most efficient way of writing out dump files
-when running on large numbers of processors.
-
-For the legacy VTK format "%" is ignored and P = 1, i.e., only
-processor 0 does write files.
-
-Note that using the "*" and "%" characters together can produce a
-large number of small dump files!
-
-If {dump_modify binary} is used, the dump file (or files, if "*" or
-"%" is also used) is written in binary format. A binary dump file
-will be about the same size as a text version, but will typically
-write out much faster.
-
-:line
-
-This section explains the atom attributes that can be specified as
-part of the {custom/vtk} style.
-
-The {id}, {mol}, {proc}, {procp1}, {type}, {element}, {mass}, {vx},
-{vy}, {vz}, {fx}, {fy}, {fz}, {q} attributes are self-explanatory.
-
-{Id} is the atom ID. {Mol} is the molecule ID, included in the data
-file for molecular systems. {Proc} is the ID of the processor (0 to
-Nprocs-1) that currently owns the atom. {Procp1} is the proc ID+1,
-which can be convenient in place of a {type} attribute (1 to Ntypes)
-for coloring atoms in a visualization program. {Type} is the atom
-type (1 to Ntypes). {Element} is typically the chemical name of an
-element, which you must assign to each type via the "dump_modify
-element"_dump_modify.html command. More generally, it can be any
-string you wish to associated with an atom type. {Mass} is the atom
-mass. {Vx}, {vy}, {vz}, {fx}, {fy}, {fz}, and {q} are components of
-atom velocity and force and atomic charge.
-
-There are several options for outputting atom coordinates. The {x},
-{y}, {z} attributes write atom coordinates "unscaled", in the
-appropriate distance "units"_units.html (Angstroms, sigma, etc). Use
-{xs}, {ys}, {zs} if you want the coordinates "scaled" to the box size,
-so that each value is 0.0 to 1.0. If the simulation box is triclinic
-(tilted), then all atom coords will still be between 0.0 and 1.0.
-I.e. actual unscaled (x,y,z) = xs*A + ys*B + zs*C, where (A,B,C) are
-the non-orthogonal vectors of the simulation box edges, as discussed
-in "Section 6.12"_Section_howto.html#howto_12.
-
-Use {xu}, {yu}, {zu} if you want the coordinates "unwrapped" by the
-image flags for each atom. Unwrapped means that if the atom has
-passed thru a periodic boundary one or more times, the value is
-printed for what the coordinate would be if it had not been wrapped
-back into the periodic box. Note that using {xu}, {yu}, {zu} means
-that the coordinate values may be far outside the box bounds printed
-with the snapshot. Using {xsu}, {ysu}, {zsu} is similar to using
-{xu}, {yu}, {zu}, except that the unwrapped coordinates are scaled by
-the box size. Atoms that have passed through a periodic boundary will
-have the corresponding coordinate increased or decreased by 1.0.
-
-The image flags can be printed directly using the {ix}, {iy}, {iz}
-attributes. For periodic dimensions, they specify which image of the
-simulation box the atom is considered to be in. An image of 0 means
-it is inside the box as defined. A value of 2 means add 2 box lengths
-to get the true value. A value of -1 means subtract 1 box length to
-get the true value. LAMMPS updates these flags as atoms cross
-periodic boundaries during the simulation.
-
-The {mux}, {muy}, {muz} attributes are specific to dipolar systems
-defined with an atom style of {dipole}. They give the orientation of
-the atom's point dipole moment. The {mu} attribute gives the
-magnitude of the atom's dipole moment.
-
-The {radius} and {diameter} attributes are specific to spherical
-particles that have a finite size, such as those defined with an atom
-style of {sphere}.
-
-The {omegax}, {omegay}, and {omegaz} attributes are specific to
-finite-size spherical particles that have an angular velocity. Only
-certain atom styles, such as {sphere} define this quantity.
-
-The {angmomx}, {angmomy}, and {angmomz} attributes are specific to
-finite-size aspherical particles that have an angular momentum. Only
-the {ellipsoid} atom style defines this quantity.
-
-The {tqx}, {tqy}, {tqz} attributes are for finite-size particles that
-can sustain a rotational torque due to interactions with other
-particles.
-
-The {c_ID} and {c_ID\[I\]} attributes allow per-atom vectors or arrays
-calculated by a "compute"_compute.html to be output. The ID in the
-attribute should be replaced by the actual ID of the compute that has
-been defined previously in the input script. See the
-"compute"_compute.html command for details. There are computes for
-calculating the per-atom energy, stress, centro-symmetry parameter,
-and coordination number of individual atoms.
-
-Note that computes which calculate global or local quantities, as
-opposed to per-atom quantities, cannot be output in a dump custom/vtk
-command. Instead, global quantities can be output by the
-"thermo_style custom"_thermo_style.html command, and local quantities
-can be output by the dump local command.
-
-If {c_ID} is used as a attribute, then the per-atom vector calculated
-by the compute is printed. If {c_ID\[I\]} is used, then I must be in
-the range from 1-M, which will print the Ith column of the per-atom
-array with M columns calculated by the compute. See the discussion
-above for how I can be specified with a wildcard asterisk to
-effectively specify multiple values.
-
-The {f_ID} and {f_ID\[I\]} attributes allow vector or array per-atom
-quantities calculated by a "fix"_fix.html to be output. The ID in the
-attribute should be replaced by the actual ID of the fix that has been
-defined previously in the input script. The "fix
-ave/atom"_fix_ave_atom.html command is one that calculates per-atom
-quantities. Since it can time-average per-atom quantities produced by
-any "compute"_compute.html, "fix"_fix.html, or atom-style
-"variable"_variable.html, this allows those time-averaged results to
-be written to a dump file.
-
-If {f_ID} is used as a attribute, then the per-atom vector calculated
-by the fix is printed. If {f_ID\[I\]} is used, then I must be in the
-range from 1-M, which will print the Ith column of the per-atom array
-with M columns calculated by the fix. See the discussion above for
-how I can be specified with a wildcard asterisk to effectively specify
-multiple values.
-
-The {v_name} attribute allows per-atom vectors calculated by a
-"variable"_variable.html to be output. The name in the attribute
-should be replaced by the actual name of the variable that has been
-defined previously in the input script. Only an atom-style variable
-can be referenced, since it is the only style that generates per-atom
-values. Variables of style {atom} can reference individual atom
-attributes, per-atom atom attributes, thermodynamic keywords, or
-invoke other computes, fixes, or variables when they are evaluated, so
-this is a very general means of creating quantities to output to a
-dump file.
-
-The {d_name} and {i_name} attributes allow to output custom per atom
-floating point or integer properties that are managed by
-"fix property/atom"_fix_property_atom.html.
-
-See "Section 10"_Section_modify.html of the manual for information
-on how to add new compute and fix styles to LAMMPS to calculate
-per-atom quantities which could then be output into dump files.
-
-:line
-
-[Restrictions:]
-
-The {custom/vtk} style does not support writing of gzipped dump files.
-
-The {custom/vtk} dump style is part of the USER-VTK package. It is
-only enabled if LAMMPS was built with that package. See the "Making
-LAMMPS"_Section_start.html#start_3 section for more info.
-
-To use this dump style, you also must link to the VTK library. See
-the info in lib/vtk/README and insure the Makefile.lammps file in that
-directory is appropriate for your machine.
-
-The {custom/vtk} dump style neither supports buffering nor custom
-format strings.
-
-[Related commands:]
-
-"dump"_dump.html, "dump image"_dump_image.html,
-"dump_modify"_dump_modify.html, "undump"_undump.html
-
-[Default:]
-
-By default, files are written in ASCII format. If the file extension
-is not one of .vtk, .vtp or .vtu, the legacy VTK file format is used.
-
diff --git a/doc/src/dump_h5md.txt b/doc/src/dump_h5md.txt
index d797e633e..93c87d85b 100644
--- a/doc/src/dump_h5md.txt
+++ b/doc/src/dump_h5md.txt
@@ -1,123 +1,123 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
dump h5md command :h3
[Syntax:]
dump ID group-ID h5md N file.h5 args :pre
ID = user-assigned name for the dump :ulb,l
group-ID = ID of the group of atoms to be imaged :l
h5md = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
N = dump every this many timesteps :l
file.h5 = name of file to write to :l
-args = list of data elements to dump, with their dump "subintervals".
-At least one element must be given and image may only be present if
-position is specified first. :l
+args = list of data elements to dump, with their dump "subintervals"
position options
image
velocity options
force options
species options
file_from ID: do not open a new file, re-use the already opened file from dump ID
box value = {yes} or {no}
create_group value = {yes} or {no}
author value = quoted string :pre
+:ule
-For the elements {position}, {velocity}, {force} and {species}, one
-may specify a sub-interval to write the data only every N_element
-iterations of the dump (i.e. every N*N_element time steps). This is
-specified by the option
+Note that at least one element must be specified and image may only be
+present if position is specified first.
- every N_element :pre
+For the elements {position}, {velocity}, {force} and {species}, a
+sub-interval may be specified to write the data only every N_element
+iterations of the dump (i.e. every N*N_element time steps). This is
+specified by this option directly following the element declaration:
-that follows directly the element declaration.
+every N_element :pre
:ule
[Examples:]
dump h5md1 all h5md 100 dump_h5md.h5 position image
dump h5md1 all h5md 100 dump_h5md.h5 position velocity every 10
dump h5md1 all h5md 100 dump_h5md.h5 velocity author "John Doe" :pre
[Description:]
Dump a snapshot of atom coordinates every N timesteps in the
"HDF5"_HDF5_ws based "H5MD"_h5md file format "(de Buyl)"_#h5md_cpc.
HDF5 files are binary, portable and self-describing. This dump style
will write only one file, on the root node.
Several dumps may write to the same file, by using file_from and
referring to a previously defined dump. Several groups may also be
stored within the same file by defining several dumps. A dump that
refers (via {file_from}) to an already open dump ID and that concerns
another particle group must specify {create_group yes}.
:link(h5md,http://nongnu.org/h5md/)
Each data element is written every N*N_element steps. For {image}, no
subinterval is needed as it must be present at the same interval as
{position}. {image} must be given after {position} in any case. The
box information (edges in each dimension) is stored at the same
interval as the {position} element, if present. Otherwise it is stored
every N steps.
NOTE: Because periodic boundary conditions are enforced only on
timesteps when neighbor lists are rebuilt, the coordinates of an atom
written to a dump file may be slightly outside the simulation box.
[Use from write_dump:]
It is possible to use this dump style with the
"write_dump"_write_dump.html command. In this case, the subintervals
must not be set at all. The write_dump command can be used either to
create a new file or to add current data to an existing dump file by
using the {file_from} keyword.
Typically, the {species} data is fixed. The following two commands
store the position data every 100 timesteps, along with the image
data, and store the species data once in the same file.
dump h5md1 all h5md 100 dump.h5 position image
write_dump all h5md dump.h5 file_from h5md1 species :pre
:line
[Restrictions:]
The number of atoms per snapshot cannot change with the h5md style.
The position data is stored wrapped (box boundaries not enforced, see
note above). Only orthogonal domains are currently supported. This is
a limitation of the present dump h5md command and not of H5MD itself.
The {h5md} dump style is part of the USER-H5MD package. It is only
enabled if LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info. It also
requires (i) building the ch5md library provided with LAMMPS (See the
"Making LAMMPS"_Section_start.html#start_3 section for more info.) and
(ii) having the "HDF5"_HDF5_ws library installed (C bindings are
sufficient) on your system. The library ch5md is compiled with the
h5cc wrapper provided by the HDF5 library.
:link(HDF5_ws,http://www.hdfgroup.org/HDF5/)
:line
[Related commands:]
"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
:line
:link(h5md_cpc)
[(de Buyl)] de Buyl, Colberg and Hofling, H5MD: A structured,
efficient, and portable file format for molecular data,
Comp. Phys. Comm. 185(6), 1546-1553 (2014) -
"\[arXiv:1308.6382\]"_http://arxiv.org/abs/1308.6382/.
diff --git a/doc/src/dump_nc.txt b/doc/src/dump_nc.txt
deleted file mode 100644
index 0b81ee6a3..000000000
--- a/doc/src/dump_nc.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
-
-:link(lws,http://lammps.sandia.gov)
-:link(ld,Manual.html)
-:link(lc,Section_commands.html#comm)
-
-:line
-
-dump nc command :h3
-dump nc/mpiio command :h3
-
-[Syntax:]
-
-dump ID group-ID nc N file.nc args
-dump ID group-ID nc/mpiio N file.nc args :pre
-
-ID = user-assigned name for the dump :ulb,l
-group-ID = ID of the group of atoms to be imaged :l
-{nc} or {nc/mpiio} = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
-N = dump every this many timesteps :l
-file.nc = name of file to write to :l
-args = list of per atom data elements to dump, same as for the 'custom' dump style. :l,ule
-
-[Examples:]
-
-dump 1 all nc 100 traj.nc type x y z vx vy vz
-dump_modify 1 append yes at -1 global c_thermo_pe c_thermo_temp c_thermo_press :pre
-
-dump 1 all nc/mpiio 1000 traj.nc id type x y z :pre
-
-[Description:]
-
-Dump a snapshot of atom coordinates every N timesteps in Amber-style
-NetCDF file format. NetCDF files are binary, portable and
-self-describing. This dump style will write only one file on the root
-node. The dump style {nc} uses the "standard NetCDF
-library"_netcdf-home all data is collected on one processor and then
-written to the dump file. Dump style {nc/mpiio} used the "parallel
-NetCDF library"_pnetcdf-home and MPI-IO; it has better performance on
-a larger number of processors. Note that 'nc' outputs all atoms sorted
-by atom tag while 'nc/mpiio' outputs in order of the MPI rank.
-
-In addition to per-atom data, also global (i.e. not per atom, but per
-frame) quantities can be included in the dump file. This can be
-variables, output from computes or fixes data prefixed with v_, c_ and
-f_, respectively. These properties are included via
-"dump_modify"_dump_modify.html {global}.
-
-:link(netcdf-home,http://www.unidata.ucar.edu/software/netcdf/)
-:link(pnetcdf-home,http://trac.mcs.anl.gov/projects/parallel-netcdf/)
-
-:line
-
-[Restrictions:]
-
-The {nc} and {nc/mpiio} dump styles are part of the USER-NC-DUMP
-package. It is only enabled if LAMMPS was built with that
-package. See the "Making LAMMPS"_Section_start.html#start_3 section
-for more info.
-
-:line
-
-[Related commands:]
-
-"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
-
diff --git a/doc/src/dump_netcdf.txt b/doc/src/dump_netcdf.txt
new file mode 100644
index 000000000..4e8265669
--- /dev/null
+++ b/doc/src/dump_netcdf.txt
@@ -0,0 +1,82 @@
+"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+dump netcdf command :h3
+dump netcdf/mpiio command :h3
+
+[Syntax:]
+
+dump ID group-ID netcdf N file args
+dump ID group-ID netcdf/mpiio N file args :pre
+
+ID = user-assigned name for the dump :ulb,l
+group-ID = ID of the group of atoms to be imaged :l
+{netcdf} or {netcdf/mpiio} = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
+N = dump every this many timesteps :l
+file = name of file to write dump info to :l
+args = list of atom attributes, same as for "dump_style custom"_dump.html :l,ule
+
+[Examples:]
+
+dump 1 all netcdf 100 traj.nc type x y z vx vy vz
+dump_modify 1 append yes at -1 global c_thermo_pe c_thermo_temp c_thermo_press
+dump 1 all netcdf/mpiio 1000 traj.nc id type x y z :pre
+
+[Description:]
+
+Dump a snapshot of atom coordinates every N timesteps in Amber-style
+NetCDF file format. NetCDF files are binary, portable and
+self-describing. This dump style will write only one file on the root
+node. The dump style {netcdf} uses the "standard NetCDF
+library"_netcdf-home. All data is collected on one processor and then
+written to the dump file. Dump style {netcdf/mpiio} uses the
+"parallel NetCDF library"_pnetcdf-home and MPI-IO to write to the dump
+file in parallel; it has better performance on a larger number of
+processors. Note that style {netcdf} outputs all atoms sorted by atom
+tag while style {netcdf/mpiio} outputs atoms in order of their MPI
+rank.
+
+NetCDF files can be directly visualized via the following tools:
+
+Ovito (http://www.ovito.org/). Ovito supports the AMBER convention and
+all of the above extensions. :ulb,l
+
+VMD (http://www.ks.uiuc.edu/Research/vmd/). :l
+
+AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye
+contains a NetCDF reader that is not present in the standard
+distribution of AtomEye. :l,ule
+
+In addition to per-atom data, global data can be included in the dump
+file, which are the kinds of values output by the
+"thermo_style"_thermo_style.html command . See "Section howto
+6.15"_Section_howto.html#howto_15 for an explanation of per-atom
+versus global data. The global output written into the dump file can
+be from computes, fixes, or variables, by prefixing the compute/fix ID
+or variable name with "c_" or "f_" or "v_" respectively, as in the
+example above. These global values are specified via the "dump_modify
+global"_dump_modify.html command.
+
+:link(netcdf-home,http://www.unidata.ucar.edu/software/netcdf/)
+:link(pnetcdf-home,http://trac.mcs.anl.gov/projects/parallel-netcdf/)
+
+:line
+
+[Restrictions:]
+
+The {netcdf} and {netcdf/mpiio} dump styles are part of the
+USER-NETCDF package. They are only enabled if LAMMPS was built with
+that package. See the "Making LAMMPS"_Section_start.html#start_3
+section for more info.
+
+:line
+
+[Related commands:]
+
+"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
+
diff --git a/doc/src/dump_vtk.txt b/doc/src/dump_vtk.txt
new file mode 100644
index 000000000..21502e7f4
--- /dev/null
+++ b/doc/src/dump_vtk.txt
@@ -0,0 +1,179 @@
+ "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+dump vtk command :h3
+
+[Syntax:]
+
+dump ID group-ID vtk N file args :pre
+
+ID = user-assigned name for the dump
+group-ID = ID of the group of atoms to be dumped
+vtk = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page)
+N = dump every this many timesteps
+file = name of file to write dump info to
+args = same as arguments for "dump_style custom"_dump.html :ul
+
+[Examples:]
+
+dump dmpvtk all vtk 100 dump*.myforce.vtk id type vx fx
+dump dmpvtp flow vtk 100 dump*.%.displace.vtp id type c_myD\[1\] c_myD\[2\] c_myD\[3\] v_ke :pre
+
+[Description:]
+
+Dump a snapshot of atom quantities to one or more files every N
+timesteps in a format readable by the "VTK visualization
+toolkit"_http://www.vtk.org or other visualization tools that use it,
+e.g. "ParaView"_http://www.paraview.org. The timesteps on which dump
+output is written can also be controlled by a variable; see the
+"dump_modify every"_dump_modify.html command for details.
+
+This dump style is similar to "dump_style custom"_dump.html but uses
+the VTK library to write data to VTK simple legacy or XML format
+depending on the filename extension specified for the dump file. This
+can be either {*.vtk} for the legacy format or {*.vtp} and {*.vtu},
+respectively, for XML format; see the "VTK
+homepage"_http://www.vtk.org/VTK/img/file-formats.pdf for a detailed
+description of these formats. Since this naming convention conflicts
+with the way binary output is usually specified (see below), the
+"dump_modify binary"_dump_modify.html command allows setting of a
+binary option for this dump style explicitly.
+
+Only information for atoms in the specified group is dumped. The
+"dump_modify thresh and region"_dump_modify.html commands can also
+alter what atoms are included; see details below.
+
+As described below, special characters ("*", "%") in the filename
+determine the kind of output.
+
+IMPORTANT NOTE: Because periodic boundary conditions are enforced only
+on timesteps when neighbor lists are rebuilt, the coordinates of an
+atom written to a dump file may be slightly outside the simulation
+box.
+
+IMPORTANT NOTE: Unless the "dump_modify sort"_dump_modify.html option
+is invoked, the lines of atom information written to dump files will
+be in an indeterminate order for each snapshot. This is even true
+when running on a single processor, if the "atom_modify
+sort"_atom_modify.html option is on, which it is by default. In this
+case atoms are re-ordered periodically during a simulation, due to
+spatial sorting. It is also true when running in parallel, because
+data for a single snapshot is collected from multiple processors, each
+of which owns a subset of the atoms.
+
+For the {vtk} style, sorting is off by default. See the
+"dump_modify"_dump_modify.html doc page for details.
+
+:line
+
+The dimensions of the simulation box are written to a separate file
+for each snapshot (either in legacy VTK or XML format depending on the
+format of the main dump file) with the suffix {_boundingBox} appended
+to the given dump filename.
+
+For an orthogonal simulation box this information is saved as a
+rectilinear grid (legacy .vtk or .vtr XML format).
+
+Triclinic simulation boxes (non-orthogonal) are saved as
+hexahedrons in either legacy .vtk or .vtu XML format.
+
+Style {vtk} allows you to specify a list of atom attributes to be
+written to the dump file for each atom. The list of possible attributes
+is the same as for the "dump_style custom"_dump.html command; see
+its doc page for a listing and an explanation of each attribute.
+
+NOTE: Since position data is required to write VTK files the atom
+attributes "x y z" do not have to be specified explicitly; they will
+be included in the dump file regardless. Also, in contrast to the
+{custom} style, the specified {vtk} attributes are rearranged to
+ensure correct ordering of vector components (except for computes and
+fixes - these have to be given in the right order) and duplicate
+entries are removed.
+
+The VTK format uses a single snapshot of the system per file, thus
+a wildcard "*" must be included in the filename, as discussed below.
+Otherwise the dump files will get overwritten with the new snapshot
+each time.
+
+:line
+
+Dumps are performed on timesteps that are a multiple of N (including
+timestep 0) and on the last timestep of a minimization if the
+minimization converges. Note that this means a dump will not be
+performed on the initial timestep after the dump command is invoked,
+if the current timestep is not a multiple of N. This behavior can be
+changed via the "dump_modify first"_dump_modify.html command, which
+can also be useful if the dump command is invoked after a minimization
+ended on an arbitrary timestep. N can be changed between runs by
+using the "dump_modify every"_dump_modify.html command.
+The "dump_modify every"_dump_modify.html command
+also allows a variable to be used to determine the sequence of
+timesteps on which dump files are written. In this mode a dump on the
+first timestep of a run will also not be written unless the
+"dump_modify first"_dump_modify.html command is used.
+
+Dump filenames can contain two wildcard characters. If a "*"
+character appears in the filename, then one file per snapshot is
+written and the "*" character is replaced with the timestep value.
+For example, tmp.dump*.vtk becomes tmp.dump0.vtk, tmp.dump10000.vtk,
+tmp.dump20000.vtk, etc. Note that the "dump_modify pad"_dump_modify.html
+command can be used to insure all timestep numbers are the same length
+(e.g. 00010), which can make it easier to read a series of dump files
+in order with some post-processing tools.
+
+If a "%" character appears in the filename, then each of P processors
+writes a portion of the dump file, and the "%" character is replaced
+with the processor ID from 0 to P-1 preceded by an underscore character.
+For example, tmp.dump%.vtp becomes tmp.dump_0.vtp, tmp.dump_1.vtp, ...
+tmp.dump_P-1.vtp, etc. This creates smaller files and can be a fast
+mode of output on parallel machines that support parallel I/O for output.
+
+By default, P = the number of processors meaning one file per
+processor, but P can be set to a smaller value via the {nfile} or
+{fileper} keywords of the "dump_modify"_dump_modify.html command.
+These options can be the most efficient way of writing out dump files
+when running on large numbers of processors.
+
+For the legacy VTK format "%" is ignored and P = 1, i.e., only
+processor 0 writes files.
+
+Note that using the "*" and "%" characters together can produce a
+large number of small dump files!
+
+If {dump_modify binary} is used, the dump file (or files, if "*" or
+"%" is also used) is written in binary format. A binary dump file
+will be about the same size as a text version, but will typically
+write out much faster.
+
+:line
+
+[Restrictions:]
+
+The {vtk} style does not support writing of gzipped dump files.
+
+The {vtk} dump style is part of the USER-VTK package. It is
+only enabled if LAMMPS was built with that package. See the "Making
+LAMMPS"_Section_start.html#start_3 section for more info.
+
+To use this dump style, you also must link to the VTK library. See
+the info in lib/vtk/README and ensure the Makefile.lammps file in that
+directory is appropriate for your machine.
+
+The {vtk} dump style supports neither buffering nor custom format
+strings.
+
+[Related commands:]
+
+"dump"_dump.html, "dump image"_dump_image.html,
+"dump_modify"_dump_modify.html, "undump"_undump.html
+
+[Default:]
+
+By default, files are written in ASCII format. If the file extension
+is not one of .vtk, .vtp or .vtu, the legacy VTK file format is used.
+
diff --git a/doc/src/fix_cmap.txt b/doc/src/fix_cmap.txt
index 5fcac589b..2b14a20c1 100644
--- a/doc/src/fix_cmap.txt
+++ b/doc/src/fix_cmap.txt
@@ -1,132 +1,135 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix cmap command :h3
[Syntax:]
fix ID group-ID cmap filename :pre
ID, group-ID are documented in "fix"_fix.html command
cmap = style name of this fix command
filename = force-field file with CMAP coefficients :ul
[Examples:]
fix myCMAP all cmap ../potentials/cmap36.data
read_data proteinX.data fix myCMAP crossterm CMAP
fix_modify myCMAP energy yes :pre
[Description:]
This command enables CMAP crossterms to be added to simulations which
use the CHARMM force field. These are relevant for any CHARMM model
of a peptide or protein sequence that is 3 or more amino-acid
residues long; see "(Buck)"_#Buck and "(Brooks)"_#Brooks2 for details,
including the analytic energy expressions for CMAP interactions. The
CMAP crossterms add additional potential energy contributions to pairs
of overlapping phi-psi dihedrals of amino-acids, which are important
to properly represent their conformational behavior.
The examples/cmap directory has a sample input script and data file
for a small peptide, which illustrate the use of the fix cmap command.
As in the example above, this fix should be used before reading a data
file that contains a listing of CMAP interactions. The {filename}
specified should contain the CMAP parameters for a particular version
of the CHARMM force field. Two such files are included in the
lammps/potentials directory: charmm22.cmap and charmm36.cmap.
The data file read by the "read_data"_read_data.html command must
contain the topology of all the CMAP interactions, similar to the
topology data for bonds, angles, dihedrals, etc. Specifically, it
should have a line like this in its header section:
N crossterms :pre
where N is the number of CMAP crossterms. It should also have a section
in the body of the data file like this with N lines:
CMAP :pre
1 1 8 10 12 18 20
2 5 18 20 22 25 27
\[...\]
N 3 314 315 317 318 330 :pre
The first column is an index from 1 to N to enumerate the CMAP terms;
it is ignored by LAMMPS. The 2nd column is the "type" of the
interaction; it is an index into the CMAP force field file. The
remaining 5 columns are the atom IDs of the atoms in the two 4-atom
dihedrals that overlap to create the CMAP 5-body interaction. Note
that the "crossterm" and "CMAP" keywords for the header and body
sections match those specified in the read_data command following the
data file name; see the "read_data"_read_data.html doc page for
more details.
A data file containing CMAP crossterms can be generated from a PDB
file using the charmm2lammps.pl script in the tools/ch2lmp directory
of the LAMMPS distribution. The script must be invoked with the
optional "-cmap" flag to do this; see the tools/ch2lmp/README file for
more information.
The potential energy associated with CMAP interactions can be output
as described below. It can also be included in the total potential
energy of the system, as output by the
"thermo_style"_thermo_style.html command, if the "fix_modify
energy"_fix_modify.html command is used, as in the example above. See
the note below about how to include the CMAP energy when performing an
"energy minimization"_minimize.html.
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
-No information about this fix is written to "binary restart
-files"_restart.html.
+This fix writes the list of CMAP crossterms to "binary restart
+files"_restart.html. See the "read_restart"_read_restart.html command
+for info on how to re-specify a fix in an input script that reads a
+restart file, so that the operation of the fix continues in an
+uninterrupted fashion.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to add the potential "energy" of the CMAP interactions system's
potential energy as part of "thermodynamic output"_thermo_style.html.
This fix computes a global scalar which can be accessed by various
"output commands"_Section_howto.html#howto_15. The scalar is the
potential energy discussed above. The scalar value calculated by this
fix is "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command.
The forces due to this fix are imposed during an energy minimization,
invoked by the "minimize"_minimize.html command.
NOTE: If you want the potential energy associated with the CMAP terms
to be included in the total potential energy of the system (the
quantity being minimized), you MUST enable the
"fix_modify"_fix_modify.html {energy} option for this fix.
[Restrictions:]
This fix can only be used if LAMMPS was built with the MOLECULE
package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info on packages.
[Related commands:]
"fix_modify"_fix_modify.html, "read_data"_read_data.html
[Default:] none
:line
:link(Buck)
[(Buck)] Buck, Bouguet-Bonnet, Pastor, MacKerell Jr., Biophys J, 90, L36
(2006).
:link(Brooks2)
[(Brooks)] Brooks, Brooks, MacKerell Jr., J Comput Chem, 30, 1545 (2009).
diff --git a/doc/src/fix_gcmc.txt b/doc/src/fix_gcmc.txt
index 53973cdfb..7ac607a2f 100644
--- a/doc/src/fix_gcmc.txt
+++ b/doc/src/fix_gcmc.txt
@@ -1,417 +1,417 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix gcmc command :h3
[Syntax:]
fix ID group-ID gcmc N X M type seed T mu displace keyword values ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
gcmc = style name of this fix command :l
N = invoke this fix every N steps :l
X = average number of GCMC exchanges to attempt every N steps :l
M = average number of MC moves to attempt every N steps :l
type = atom type for inserted atoms (must be 0 if mol keyword used) :l
seed = random # seed (positive integer) :l
T = temperature of the ideal gas reservoir (temperature units) :l
mu = chemical potential of the ideal gas reservoir (energy units) :l
displace = maximum Monte Carlo translation distance (length units) :l
zero or more keyword/value pairs may be appended to args :l
keyword = {mol}, {rigid}, {shake}, {region}, {maxangle}, {pressure}, {fugacity_coeff}, {full_energy}, {charge}, {group}, {grouptype}, {intra_energy}, {tfac_insert}, or {overlap_cutoff}
{mol} value = template-ID
template-ID = ID of molecule template specified in a separate "molecule"_molecule.html command
{rigid} value = fix-ID
fix-ID = ID of "fix rigid/small"_fix_rigid.html command
{shake} value = fix-ID
fix-ID = ID of "fix shake"_fix_shake.html command
{region} value = region-ID
region-ID = ID of region where MC moves are allowed
{maxangle} value = maximum molecular rotation angle (degrees)
{pressure} value = pressure of the gas reservoir (pressure units)
{fugacity_coeff} value = fugacity coefficient of the gas reservoir (unitless)
{full_energy} = compute the entire system energy when performing MC moves
{charge} value = charge of inserted atoms (charge units)
{group} value = group-ID
group-ID = group-ID for inserted atoms (string)
{grouptype} values = type group-ID
type = atom type (int)
group-ID = group-ID for inserted atoms (string)
{intra_energy} value = intramolecular energy (energy units)
{tfac_insert} value = scale up/down temperature of inserted atoms (unitless)
{overlap_cutoff} value = maximum pair distance for overlap rejection (distance units) :pre
:ule
[Examples:]
fix 2 gas gcmc 10 1000 1000 2 29494 298.0 -0.5 0.01
fix 3 water gcmc 10 100 100 0 3456543 3.0 -2.5 0.1 mol my_one_water maxangle 180 full_energy
fix 4 my_gas gcmc 1 10 10 1 123456543 300.0 -12.5 1.0 region disk :pre
[Description:]
This fix performs grand canonical Monte Carlo (GCMC) exchanges of
atoms or molecules of the given type with an imaginary ideal gas
reservoir at the specified T and chemical potential (mu) as discussed
in "(Frenkel)"_#Frenkel. If used with the "fix nvt"_fix_nh.html
command, simulations in the grand canonical ensemble (muVT, constant
chemical potential, constant volume, and constant temperature) can be
performed. Specific uses include computing isotherms in microporous
materials, or computing vapor-liquid coexistence curves.
Every N timesteps the fix attempts a number of GCMC exchanges
(insertions or deletions) of gas atoms or molecules of the given type
between the simulation cell and the imaginary reservoir. It also
attempts a number of Monte Carlo moves (translations and molecule
rotations) of gas of the given type within the simulation cell or
region. The average number of attempted GCMC exchanges is X. The
average number of attempted MC moves is M. M should typically be
chosen to be approximately equal to the expected number of gas atoms
or molecules of the given type within the simulation cell or region,
which will result in roughly one MC translation per atom or molecule
per MC cycle.
For MC moves of molecular gases, rotations and translations are each
attempted with 50% probability. For MC moves of atomic gases,
translations are attempted 100% of the time. For MC exchanges of
either molecular or atomic gases, deletions and insertions are each
attempted with 50% probability.
All inserted particles are always assigned to two groups: the default
group "all" and the group specified in the fix gcmc command (which can
also be "all"). In addition, particles are also added to any groups
specified by the {group} and {grouptype} keywords. If inserted
particles are individual atoms, they are assigned the atom type given
by the type argument. If they are molecules, the type argument has no
effect and must be set to zero. Instead, the type of each atom in the
inserted molecule is specified in the file read by the
"molecule"_molecule.html command.
This fix cannot be used to perform MC insertions of gas atoms or
molecules other than the exchanged type, but MC deletions,
translations, and rotations can be performed on any atom/molecule in
the fix group. All atoms in the simulation cell can be moved using
regular time integration translations, e.g. via "fix nvt"_fix_nh.html,
resulting in a hybrid GCMC+MD simulation. A smaller-than-usual
timestep size may be needed when running such a hybrid simulation,
especially if the inserted molecules are not well equilibrated.
This command may optionally use the {region} keyword to define an
exchange and move volume. The specified region must have been
previously defined with a "region"_region.html command. It must be
defined with side = {in}. Insertion attempts occur only within the
specified region. For non-rectangular regions, random trial points are
generated within the rectangular bounding box until a point is found
that lies inside the region. If no valid point is generated after 1000
trials, no insertion is performed, but it is counted as an attempted
insertion. Move and deletion attempt candidates are selected from gas
atoms or molecules within the region. If there are no candidates, no
move or deletion is performed, but it is counted as an attempted move
or deletion. If an attempted move places the atom or molecule
center-of-mass outside the specified region, a new attempted move is
generated. This process is repeated until the atom or molecule
center-of-mass is inside the specified region.
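As a hedged illustration only (the region ID, block bounds, group name,
and fix gcmc arguments below are placeholders, not recommended values):
region   mcbox block 0 20 0 20 10 30 side in
fix      gc gas gcmc 100 10 10 2 39082 300.0 -6.0 0.5 region mcbox :pre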
If used with "fix nvt"_fix_nh.html, the temperature of the imaginary
reservoir, T, should be set to be equivalent to the target temperature
used in fix nvt. Otherwise, the imaginary reservoir will not be in
thermal equilibrium with the simulation cell. Also, it is important
that the temperature used by fix nvt be dynamic/dof, which can be
achieved as follows:
compute mdtemp mdatoms temp
compute_modify mdtemp dynamic/dof yes
fix mdnvt mdatoms nvt temp 300.0 300.0 10.0
fix_modify mdnvt temp mdtemp :pre
Note that neighbor lists are re-built every timestep that this fix is
invoked, so you should not set N to be too small. However, periodic
rebuilds are necessary in order to avoid dangerous rebuilds and missed
interactions. Specifically, avoid performing so many MC translations
per timestep that atoms can move beyond the neighbor list skin
distance. See the "neighbor"_neighbor.html command for details.
When an atom or molecule is to be inserted, its coordinates are chosen
at a random position within the current simulation cell or region, and
new atom velocities are randomly chosen from the specified temperature
distribution given by T. The effective temperature for new atom
velocities can be increased or decreased using the optional keyword
{tfac_insert} (see below). Relative coordinates for atoms in a
molecule are taken from the template molecule provided by the
user. The center of mass of the molecule is placed at the insertion
point. The orientation of the molecule is chosen at random by rotating
about this point.
Individual atoms are inserted, unless the {mol} keyword is used. It
specifies a {template-ID} previously defined using the
"molecule"_molecule.html command, which reads a file that defines the
molecule. The coordinates, atom types, charges, etc, as well as any
bond/angle/etc and special neighbor information for the molecule can
be specified in the molecule file. See the "molecule"_molecule.html
command for details. The only settings required to be in this file
are the coordinates and types of atoms in the molecule.
When not using the {mol} keyword, you should ensure you do not delete
atoms that are bonded to other atoms, or LAMMPS will soon generate an
error when it tries to find bonded neighbors. LAMMPS will warn you if
any of the atoms eligible for deletion have a non-zero molecule ID,
but does not check for this at the time of deletion.
If you wish to insert molecules via the {mol} keyword that will be
treated as rigid bodies, use the {rigid} keyword, specifying as its
value the ID of a separate "fix rigid/small"_fix_rigid.html command
which also appears in your input script.
NOTE: If you wish the new rigid molecules (and other rigid molecules)
to be thermostatted correctly via "fix rigid/small/nvt"_fix_rigid.html
or "fix rigid/small/npt"_fix_rigid.html, then you need to use the
"fix_modify dynamic/dof yes" command for the rigid fix. This is to
inform that fix that the molecule count will vary dynamically.
If you wish to insert molecules via the {mol} keyword that will have
their bonds or angles constrained via SHAKE, use the {shake} keyword,
specifying as its value the ID of a separate "fix
shake"_fix_shake.html command which also appears in your input script.
Optionally, users may specify the maximum rotation angle for molecular
rotations using the {maxangle} keyword and specifying the angle in
degrees. Rotations are performed by generating a random point on the
unit sphere and a random rotation angle on the range
\[0,maxangle). The molecule is then rotated by that angle about an
axis passing through the molecule center of mass. The axis is parallel
to the unit vector defined by the point on the unit sphere. The same
procedure is used for randomly rotating molecules when they are
inserted, except that the maximum angle is 360 degrees.
Note that fix GCMC does not use configurational bias MC or any other
kind of sampling of intramolecular degrees of freedom. Inserted
molecules can have different orientations, but they will all have the
same intramolecular configuration, which was specified in the molecule
command input.
For atomic gases, inserted atoms have the specified atom type, but
deleted atoms are any atoms that have been inserted or that belong to
the user-specified fix group. For molecular gases, exchanged
molecules use the same atom types as in the template molecule supplied
by the user. In both cases, exchanged atoms/molecules are assigned to
two groups: the default group "all" and the group specified in the fix
gcmc command (which can also be "all").
The chemical potential is a user-specified input parameter defined
as:
:c,image(Eqs/fix_gcmc1.jpg)
The second term mu_ex is the excess chemical potential due to
energetic interactions and is formally zero for the fictitious gas
reservoir but is non-zero for interacting systems. So, while the
chemical potential of the reservoir and the simulation cell are equal,
mu_ex is not, and as a result, the densities of the two are generally
quite different. The first term mu_id is the ideal gas contribution
to the chemical potential. mu_id can be related to the density or
pressure of the fictitious gas reservoir by:
:c,image(Eqs/fix_gcmc2.jpg)
where k is Boltzmann's constant,
T is the user-specified temperature, rho is the number density,
P is the pressure, and phi is the fugacity coefficient.
The constant Lambda is required for dimensional consistency.
For all unit styles except {lj} it is defined as the thermal
de Broglie wavelength
:c,image(Eqs/fix_gcmc3.jpg)
where h is Planck's constant, and m is the mass of the exchanged atom
or molecule. For unit style {lj}, Lambda is simply set to
unity. Note that prior to March 2017, lambda for unit style {lj} was
calculated using the above formula with h set to the rather specific
value of 0.18292026. Chemical potential under the old definition can
be converted to an equivalent value under the new definition by
subtracting 3kTln(Lambda_old).
As an alternative to specifying mu directly, the ideal gas reservoir
can be defined by its pressure P using the {pressure} keyword, in
which case the user-specified chemical potential is ignored. The user
may also specify the fugacity coefficient phi using the
{fugacity_coeff} keyword, which defaults to unity.
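For example, a hedged sketch specifying the reservoir by its pressure
instead of mu (the mu value on the line is then ignored; all numbers
are placeholders):
fix gc gas gcmc 100 100 100 2 29494 298.0 0.0 0.5 pressure 1.0 fugacity_coeff 0.96 :pre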
The {full_energy} option means that fix GCMC will compute the total
potential energy of the entire simulated system. The total system
energy before and after the proposed GCMC move is then used in the
Metropolis criterion to determine whether or not to accept the
proposed GCMC move. By default, this option is off, in which case only
partial energies are computed to determine the difference in energy
that would be caused by the proposed GCMC move.
The {full_energy} option is needed for systems with complicated
potential energy calculations, including the following:
long-range electrostatics (kspace)
many-body pair styles
hybrid pair styles
eam pair styles
tail corrections
need to include potential energy contributions from other fixes :ul
In these cases, LAMMPS will automatically apply the {full_energy}
keyword and issue a warning message.
When the {mol} keyword is used, the {full_energy} option also includes
the intramolecular energy of inserted and deleted molecules. If this
is not desired, the {intra_energy} keyword can be used to define an
amount of energy that is subtracted from the final energy when a
molecule is inserted, and added to the initial energy when a molecule
is deleted. For molecules that have a non-zero intramolecular energy,
this will ensure roughly the same behavior whether or not the
{full_energy} option is used.
Inserted atoms and molecules are assigned random velocities based on
the specified temperature T. Because the relative velocity of all
atoms in the molecule is zero, this may result in inserted molecules
that are systematically too cold. In addition, the intramolecular
potential energy of the inserted molecule may cause the kinetic energy
of the molecule to quickly increase or decrease after insertion. The
{tfac_insert} keyword allows the user to counteract these effects by
changing the temperature used to assign velocities to inserted atoms
and molecules by a constant factor. For a particular application, some
experimentation may be required to find a value of {tfac_insert} that
results in inserted molecules that equilibrate quickly to the correct
temperature.
Some fixes have an associated potential energy. Examples of such fixes
include: "efield"_fix_efield.html, "gravity"_fix_gravity.html,
"addforce"_fix_addforce.html, "langevin"_fix_langevin.html,
"restrain"_fix_restrain.html,
"temp/berendsen"_fix_temp_berendsen.html,
"temp/rescale"_fix_temp_rescale.html, and "wall fixes"_fix_wall.html.
For that energy to be included in the total potential energy of the
system (the quantity used when performing GCMC moves), you MUST enable
the "fix_modify"_fix_modify.html {energy} option for that fix. The
doc pages for individual "fix"_fix.html commands specify if this
should be done.
Use the {charge} option to insert atoms with a user-specified point
charge. Note that doing so will cause the system to become
non-neutral. LAMMPS issues a warning when using long-range
electrostatics (kspace) with non-neutral systems. See the "compute
group/group"_compute_group_group.html documentation for more details
about simulating non-neutral systems with kspace on.
Use of this fix typically will cause the number of atoms to
fluctuate; therefore, you will want to use the
"compute_modify"_compute_modify.html command to ensure that the
current number of atoms is used as a normalizing factor each time
temperature is computed. Here is the necessary command:
compute_modify thermo_temp dynamic yes :pre
NOTE: If the density of the cell is initially very small or zero, and
increases to a much larger density after a period of equilibration,
then certain quantities that are only calculated once at the start
(kspace parameters, tail corrections) may no longer be accurate. The
solution is to start a new simulation after the equilibrium density
has been reached.
With some pair_styles, such as "Buckingham"_pair_buck.html,
-"Born-Mayer-Huggins"_pair_born.html and "ReaxFF"_pair_reax_c.html, two
+"Born-Mayer-Huggins"_pair_born.html and "ReaxFF"_pair_reaxc.html, two
atoms placed close to each other may have an arbitrary large, negative
potential energy due to the functional form of the potential. While
these unphysical configurations are inaccessible to typical dynamical
trajectories, they can be generated by Monte Carlo moves. The
{overlap_cutoff} keyword suppresses these moves by effectively
assigning an infinite positive energy to all new configurations that
place any pair of atoms closer than the specified overlap cutoff
distance.
If LJ units are used, note that a value of 0.18292026 is used by this
fix as the reduced value for Planck's constant. This value was
derived from LJ parameters for argon, where h* = h/sqrt(sigma^2 *
epsilon * mass), sigma = 3.429 angstroms, epsilon/k = 121.85 K, and
mass = 39.948 amu.
The {group} keyword assigns all inserted atoms to the
"group"_group.html of the group-ID value. The {grouptype} keyword
assigns all inserted atoms of the specified type to the
"group"_group.html of the group-ID value.
[Restart, fix_modify, output, run start/stop, minimize info:]
This fix writes the state of the fix to "binary restart
files"_restart.html. This includes information about the random
number generator seed, the next timestep for MC exchanges, etc. See
the "read_restart"_read_restart.html command for info on how to
re-specify a fix in an input script that reads a restart file, so that
the operation of the fix continues in an uninterrupted fashion.
None of the "fix_modify"_fix_modify.html options are relevant to this
fix.
This fix computes a global vector of length 8, which can be accessed
by various "output commands"_Section_howto.html#howto_15. The vector
values are the following global cumulative quantities:
1 = translation attempts
2 = translation successes
3 = insertion attempts
4 = insertion successes
5 = deletion attempts
6 = deletion successes
7 = rotation attempts
8 = rotation successes :ul
The vector values calculated by this fix are "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command. This fix is not invoked during "energy
minimization"_minimize.html.
[Restrictions:]
This fix is part of the MC package. It is only enabled if LAMMPS was
built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
Do not set "neigh_modify once yes" or else this fix will never be
called. Reneighboring is required.
This fix can be run in parallel, but aspects of the GCMC part will not
scale well in parallel. It can only be used for 3D simulations.
Note that very lengthy simulations involving insertions/deletions of
billions of gas molecules may run out of atom or molecule IDs and
trigger an error, so it is better to run multiple shorter-duration
simulations. Likewise, very large molecules have not been tested and
may turn out to be problematic.
Use of multiple fix gcmc commands in the same input script can be
problematic if using a template molecule. The issue is that the
user-referenced template molecule in the second fix gcmc command may
no longer exist since it might have been deleted by the first fix gcmc
command. An existing template molecule will need to be referenced by
the user for each subsequent fix gcmc command.
[Related commands:]
"fix atom/swap"_fix_atom_swap.html,
"fix nvt"_fix_nh.html, "neighbor"_neighbor.html,
"fix deposit"_fix_deposit.html, "fix evaporate"_fix_evaporate.html,
"delete_atoms"_delete_atoms.html
[Default:]
The option defaults are mol = no, maxangle = 10, overlap_cutoff = 0.0,
fugacity_coeff = 1, and full_energy = no,
except for the situations where full_energy is required, as
listed above.
:line
:link(Frenkel)
[(Frenkel)] Frenkel and Smit, Understanding Molecular Simulation,
Academic Press, London, 2002.
diff --git a/doc/src/fix_gle.txt b/doc/src/fix_gle.txt
index ca7625e2d..b8d3cc9b3 100644
--- a/doc/src/fix_gle.txt
+++ b/doc/src/fix_gle.txt
@@ -1,155 +1,156 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix gle command :h3
[Syntax:]
fix ID group-ID gle Ns Tstart Tstop seed Amatrix \[noneq Cmatrix\] \[every stride\] :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
gle = style name of this fix command :l
Ns = number of additional fictitious momenta :l
Tstart, Tstop = temperature ramp during the run :l
seed = random number seed to use for generating noise (positive integer) :l
Amatrix = file to read the drift matrix A from :l
zero or more keyword/value pairs may be appended :l
keyword = {noneq} or {every}
{noneq} Cmatrix = file to read the non-equilibrium covariance matrix from
{every} stride = apply the GLE once every {stride} time steps. Reduces the accuracy
of the integration of the GLE, but has *no effect* on the accuracy of equilibrium
sampling. It might change sampling properties when used together with {noneq}. :pre
:ule
[Examples:]
fix 3 boundary gle 6 300 300 31415 smart.A
fix 1 all gle 6 300 300 31415 qt-300k.A noneq qt-300k.C :pre
[Description:]
Apply a Generalized Langevin Equation (GLE) thermostat as described
in "(Ceriotti)"_#Ceriotti. The formalism allows one to obtain a number
of different effects ranging from efficient sampling of all
vibrational modes in the system to inexpensive (approximate)
modelling of nuclear quantum effects. Contrary to
"fix langevin"_fix_langevin.html, this fix performs both
thermostatting and evolution of the Hamiltonian equations of motion, so it
should not be used together with "fix nve"_fix_nve.html -- at least not
on the same atom groups.
Each degree of freedom in the thermostatted group is supplemented
with Ns additional degrees of freedom s, and the equations of motion
become
dq/dt=p/m
d(p,s)/dt=(F,0) - A(p,s) + B dW/dt :pre
where F is the physical force, A is the drift matrix (that generalizes
the friction in Langevin dynamics), B is the diffusion term, and dW/dt
are un-correlated Gaussian random forces. The A matrix couples the physical
(q,p) dynamics with that of the additional degrees of freedom,
and makes it possible to obtain effectively a history-dependent
noise and friction kernel.
The drift matrix should be given as an external file {Afile},
as a (Ns+1 x Ns+1) matrix in inverse time units. Matrices that are
optimal for a given application and the system of choice can be
obtained from "(GLE4MD)"_#GLE4MD.
Equilibrium sampling at a temperature T is obtained by specifying the
target value as the {Tstart} and {Tstop} arguments, so that the diffusion
matrix that gives canonical sampling for a given A is computed automatically.
However, the GLE framework also allows for non-equilibrium sampling, which
can be used for instance to inexpensively model zero-point energy
-effects "(Ceriotti2)"_#Ceriotti2. This is achieved specifying the
-{noneq} keyword followed by the name of the file that contains the
-static covariance matrix for the non-equilibrium dynamics.
+effects "(Ceriotti2)"_#Ceriotti2. This is achieved specifying the {noneq}
+ keyword followed by the name of the file that contains the static covariance
+matrix for the non-equilibrium dynamics. Please note, that the covariance
+matrix is expected to be given in [temperature units].
Since integrating GLE dynamics can be costly when used together with
simple potentials, one can use the {every} optional keyword to
apply the Langevin terms only once every several MD steps, in a
multiple time-step fashion. This should be used with care when doing
non-equilibrium sampling, but should have no effect on equilibrium
averages when using canonical sampling.
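For instance, a hedged sketch that applies the GLE terms only once
every 4 MD steps, re-using the A-matrix file from the examples above:
fix 3 boundary gle 6 300 300 31415 smart.A every 4 :pre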
The random number {seed} must be a positive integer. A Marsaglia random
number generator is used. Each processor uses the input seed to
generate its own unique seed and its own stream of random numbers.
Thus the dynamics of the system will not be identical on two runs on
different numbers of processors.
Note also that the Generalized Langevin Dynamics scheme that is
implemented by the "fix gld"_fix_gld.html scheme is closely related
to the present one. In fact, it should always be possible to cast the
Prony series form of the memory kernel used by GLD into an appropriate
input matrix for "fix gle"_fix_gle.html. While the GLE scheme is more
general, the form used by "fix gld"_fix_gld.html can be more directly
related to the representation of an implicit solvent environment.
[Restart, fix_modify, output, run start/stop, minimize info:]
The instantaneous values of the extended variables are written to
"binary restart files"_restart.html. Because the state of the random
number generator is not saved in restart files, this means you cannot
do "exact" restarts with this fix, where the simulation continues on
the same as if no restart had taken place. However, in a statistical
sense, a restarted simulation should produce the same behavior.
Note however that you should use a different seed each time you
restart, otherwise the same sequence of random numbers will be used
each time, which might lead to stochastic synchronization and
subtle artefacts in the sampling.
This fix can ramp its target temperature over multiple runs, using the
{start} and {stop} keywords of the "run"_run.html command. See the
"run"_run.html command for details of how to do this.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to add the energy change induced by Langevin thermostatting to the
system's potential energy as part of "thermodynamic
output"_thermo_style.html.
This fix computes a global scalar which can be accessed by various
"output commands"_Section_howto.html#howto_15. The scalar is the
cumulative energy change due to this fix. The scalar value
calculated by this fix is "extensive".
[Restrictions:]
The GLE thermostat in its current implementation should not be used
with rigid bodies, SHAKE or RATTLE. It is expected that all the
thermostatted degrees of freedom are fully flexible, and the sampled
ensemble will not be correct otherwise.
In order to perform constant-pressure simulations, please use
"fix press/berendsen"_fix_press_berendsen.html, rather than
"fix npt"_fix_nh.html, to avoid duplicate integration of the
equations of motion.
This fix is part of the USER-MISC package. It is only enabled if LAMMPS
was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"fix nvt"_fix_nh.html, "fix temp/rescale"_fix_temp_rescale.html, "fix
viscous"_fix_viscous.html, "fix nvt"_fix_nh.html, "pair_style
dpd/tstat"_pair_dpd.html, "fix gld"_fix_gld.html
:line
:link(Ceriotti)
[(Ceriotti)] Ceriotti, Bussi and Parrinello, J Chem Theory Comput 6,
1170-80 (2010)
:link(GLE4MD)
-[(GLE4MD)] "http://epfl-cosmo.github.io/gle4md/"_http://epfl-cosmo.github.io/gle4md/
+[(GLE4MD)] "http://gle4md.org/"_http://gle4md.org/
:link(Ceriotti2)
[(Ceriotti2)] Ceriotti, Bussi and Parrinello, Phys Rev Lett 103,
030603 (2009)
diff --git a/doc/src/fix_qeq.txt b/doc/src/fix_qeq.txt
index f9c8ecde6..22f476689 100644
--- a/doc/src/fix_qeq.txt
+++ b/doc/src/fix_qeq.txt
@@ -1,217 +1,217 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix qeq/point command :h3
fix qeq/shielded command :h3
fix qeq/slater command :h3
fix qeq/dynamic command :h3
fix qeq/fire command :h3
[Syntax:]
fix ID group-ID style Nevery cutoff tolerance maxiter qfile keyword ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
style = {qeq/point} or {qeq/shielded} or {qeq/slater} or {qeq/dynamic} or {qeq/fire} :l
Nevery = perform charge equilibration every this many steps :l
cutoff = global cutoff for charge-charge interactions (distance unit) :l
tolerance = precision to which charges will be equilibrated :l
maxiter = maximum iterations to perform charge equilibration :l
qfile = a filename with QEq parameters :l
zero or more keyword/value pairs may be appended :l
keyword = {alpha} or {qdamp} or {qstep} :l
{alpha} value = Slater type orbital exponent (qeq/slater only)
{qdamp} value = damping factor for damped dynamics charge solver (qeq/dynamic and qeq/fire only)
{qstep} value = time step size for damped dynamics charge solver (qeq/dynamic and qeq/fire only) :pre
:ule
[Examples:]
fix 1 all qeq/point 1 10 1.0e-6 200 param.qeq1
fix 1 qeq qeq/shielded 1 8 1.0e-6 100 param.qeq2
fix 1 all qeq/slater 5 10 1.0e-6 100 params alpha 0.2
fix 1 qeq qeq/dynamic 1 12 1.0e-3 100 my_qeq
fix 1 all qeq/fire 1 10 1.0e-3 100 my_qeq qdamp 0.2 qstep 0.1 :pre
[Description:]
Perform the charge equilibration (QEq) method as described in "(Rappe
and Goddard)"_#Rappe1 and formulated in "(Nakano)"_#Nakano1 (also known
as the matrix inversion method) and in "(Rick and Stuart)"_#Rick1 (also
known as the extended Lagrangian method) based on the
electronegativity equalization principle.
These fixes can be used with any "pair style"_pair_style.html in
LAMMPS, so long as per-atom charges are defined. The most typical
use-case is in conjunction with a "pair style"_pair_style.html that
performs charge equilibration periodically (e.g. every timestep), such
as the ReaxFF or Streitz-Mintmire potential.
But these fixes can also be used with
potentials that normally assume per-atom charges are fixed, e.g. a
"Buckingham"_pair_buck.html or "LJ/Coulombic"_pair_lj.html potential.
Because the charge equilibration calculation is effectively
independent of the pair style, these fixes can also be used to perform
a one-time assignment of charges to atoms. For example, you could
define the QEq fix, perform a zero-timestep run via the "run"_run.html
command without any pair style defined which would set per-atom
charges (based on the current atom configuration), then remove the fix
via the "unfix"_unfix.html command before performing further dynamics.
NOTE: Computing and using charge values different from published
values defined for a fixed-charge potential like Buckingham or CHARMM
or AMBER can have a strong effect on energies and forces, and
produces a different model than the published versions.
NOTE: The "fix qeq/comb"_fix_qeq_comb.html command must still be used
to perform charge equilibration with the "COMB
potential"_pair_comb.html. The "fix qeq/reax"_fix_qeq_reax.html
command can be used to perform charge equilibration with the "ReaxFF
-force field"_pair_reax_c.html, although fix qeq/shielded yields the
+force field"_pair_reaxc.html, although fix qeq/shielded yields the
same results as fix qeq/reax if {Nevery}, {cutoff}, and {tolerance}
are the same. Eventually the fix qeq/reax command will be deprecated.
The QEq method minimizes the electrostatic energy of the system (or
equalizes the derivative of energy with respect to charge of all the
atoms) by adjusting the partial charge on individual atoms based on
interactions with their neighbors within {cutoff}. It requires a few
parameters, in {metal} units, for each atom type, which are provided in
a file specified by {qfile}. The file has the following format:
1 chi eta gamma zeta qcore
2 chi eta gamma zeta qcore
...
Ntype chi eta gamma zeta qcore :pre
There is one line per atom type with the following parameters.
Only a subset of the parameters is used by each QEq style as described
below, thus the others can be set to 0.0 if desired.
{chi} = electronegativity in energy units
{eta} = self-Coulomb potential in energy units
{gamma} = shielded Coulomb constant defined by "ReaxFF force field"_#vanDuin in distance units
{zeta} = Slater type orbital exponent defined by the "Streitz-Mintmire"_#Streitz1 potential in reverse distance units
{qcore} = charge of the nucleus defined by the "Streitz-Mintmire"_#Streitz1 potential in charge units :ul
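For concreteness, a hedged sketch of a two-type {qfile}; the numbers
below are placeholders only, not validated QEq parameters:
1 5.25 10.00 0.80 1.00 0.00
2 8.74 13.40 1.00 1.20 0.00 :pre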
The {qeq/point} style describes partial charges on atoms as point
charges. Interaction between a pair of charged particles is 1/r,
which is the simplest description of the interaction between charges.
Only the {chi} and {eta} parameters from the {qfile} file are used.
Note that Coulomb catastrophe can occur if repulsion between the pair
of charged particles is too weak. This style solves partial charges
on atoms via the matrix inversion method. A tolerance of 1.0e-6 is
usually a good number.
The {qeq/shielded} style describes partial charges on atoms also as
point charges, but uses a shielded Coulomb potential to describe the
interaction between a pair of charged particles. Interaction through
the shielded Coulomb is given by equation (13) of the "ReaxFF force
field"_#vanDuin paper. The shielding accounts for charge overlap
between charged particles at small separation. This style is the same
as "fix qeq/reax"_fix_qeq_reax.html, and can be used with "pair_style
-reax/c"_pair_reax_c.html. Only the {chi}, {eta}, and {gamma}
+reax/c"_pair_reaxc.html. Only the {chi}, {eta}, and {gamma}
parameters from the {qfile} file are used. This style solves partial
charges on atoms via the matrix inversion method. A tolerance of
1.0e-6 is usually a good number.
The {qeq/slater} style describes partial charges on atoms as spherical
charge densities centered around atoms via the Slater 1{s} orbital, so
that the interaction between a pair of charged particles is the
product of two Slater 1{s} orbitals. The expression for the Slater
1{s} orbital is given under equation (6) of the
"Streitz-Mintmire"_#Streitz1 paper. Only the {chi}, {eta}, {zeta}, and
{qcore} parameters from the {qfile} file are used. This style solves
partial charges on atoms via the matrix inversion method. A tolerance
of 1.0e-6 is usually a good number. Keyword {alpha} can be used to
change the Slater type orbital exponent.
The {qeq/dynamic} style describes partial charges on atoms as point
charges that interact through 1/r, but the extended Lagrangian method
is used to solve partial charges on atoms. Only the {chi} and {eta}
parameters from the {qfile} file are used. Note that Coulomb
catastrophe can occur if repulsion between the pair of charged
particles is too weak. A tolerance of 1.0e-3 is usually a good
number. Keyword {qdamp} can be used to change the damping factor, while
keyword {qstep} can be used to change the time step size.
The "{qeq/fire}"_#Shan style describes the same charge model and charge
solver as the {qeq/dynamic} style, but employs a FIRE minimization
algorithm to solve for equilibrium charges.
Keyword {qdamp} can be used to change the damping factor, while
keyword {qstep} can be used to change the time step size.
Note that {qeq/point}, {qeq/shielded}, and {qeq/slater} describe
different charge models, whereas the matrix inversion method and the
extended Lagrangian method ({qeq/dynamic} and {qeq/fire}) are
different solvers.
Note that {qeq/point}, {qeq/dynamic} and {qeq/fire} styles all describe
charges as point charges that interact through a 1/r relationship, but
solve partial charges on atoms using different solvers. These three
styles should yield comparable results if
the QEq parameters and {Nevery}, {cutoff}, and {tolerance} are the
same. Style {qeq/point} is typically faster, {qeq/dynamic} scales
better on larger sizes, and {qeq/fire} is faster than {qeq/dynamic}.
NOTE: To avoid the evaluation of the derivative of charge with respect
to position, which is typically ill-defined, the system should have a
zero net charge.
NOTE: Developing QEq parameters (chi, eta, gamma, zeta, and qcore) is
non-trivial. Charges on atoms are not guaranteed to equilibrate with
arbitrary choices of these parameters. We do not develop these QEq
parameters. See the examples/qeq directory for some examples.
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about these fixes is written to "binary restart
files"_restart.html. No global scalar or vector or per-atom
quantities are stored by these fixes for access by various "output
commands"_Section_howto.html#howto_15. No parameter of these fixes
can be used with the {start/stop} keywords of the "run"_run.html
command.
These fixes are invoked during "energy minimization"_minimize.html.
[Restrictions:]
These fixes are part of the QEQ package. They are only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"fix qeq/reax"_fix_qeq_reax.html, "fix qeq/comb"_fix_qeq_comb.html
[Default:] none
:line
:link(Rappe1)
[(Rappe and Goddard)] A. K. Rappe and W. A. Goddard III, J Physical
Chemistry, 95, 3358-3363 (1991).
:link(Nakano1)
[(Nakano)] A. Nakano, Computer Physics Communications, 104, 59-69 (1997).
:link(Rick1)
[(Rick and Stuart)] S. W. Rick, S. J. Stuart, B. J. Berne, J Chemical Physics
101, 16141 (1994).
:link(Streitz1)
[(Streitz-Mintmire)] F. H. Streitz, J. W. Mintmire, Physical Review B, 50,
16, 11996 (1994)
:link(vanDuin)
[(ReaxFF)] A. C. T. van Duin, S. Dasgupta, F. Lorant, W. A. Goddard III, J
Physical Chemistry, 105, 9396-9409 (2001)
:link(Shan)
[(QEq/Fire)] T.-R. Shan, A. P. Thompson, S. J. Plimpton, in preparation
diff --git a/doc/src/fix_qeq_reax.txt b/doc/src/fix_qeq_reax.txt
index 76c95e111..aed043f6c 100644
--- a/doc/src/fix_qeq_reax.txt
+++ b/doc/src/fix_qeq_reax.txt
@@ -1,124 +1,124 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix qeq/reax command :h3
fix qeq/reax/kk command :h3
[Syntax:]
fix ID group-ID qeq/reax Nevery cutlo cuthi tolerance params :pre
ID, group-ID are documented in "fix"_fix.html command
qeq/reax = style name of this fix command
Nevery = perform QEq every this many steps
cutlo,cuthi = lo and hi cutoff for Taper radius
tolerance = precision to which charges will be equilibrated
params = reax/c or a filename :ul
[Examples:]
fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c
fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 param.qeq :pre
[Description:]
Perform the charge equilibration (QEq) method as described in "(Rappe
and Goddard)"_#Rappe2 and formulated in "(Nakano)"_#Nakano2. It is
typically used in conjunction with the ReaxFF force field model as
-implemented in the "pair_style reax/c"_pair_reax_c.html command, but
+implemented in the "pair_style reax/c"_pair_reaxc.html command, but
it can be used with any potential in LAMMPS, so long as it defines and
uses charges on each atom. The "fix qeq/comb"_fix_qeq_comb.html
command should be used to perform charge equilibration with the "COMB
potential"_pair_comb.html. For more technical details about the
charge equilibration performed by fix qeq/reax, see the
"(Aktulga)"_#qeq-Aktulga paper.
The QEq method minimizes the electrostatic energy of the system by
adjusting the partial charge on individual atoms based on interactions
with their neighbors. It requires some parameters for each atom type.
If the {params} setting above is the word "reax/c", then these are
-extracted from the "pair_style reax/c"_pair_reax_c.html command and
+extracted from the "pair_style reax/c"_pair_reaxc.html command and
the ReaxFF force field file it reads in. If a file name is specified
for {params}, then the parameters are taken from the specified file
and the file must contain one line for each atom type. The latter
form must be used when performing QEq with a non-ReaxFF potential.
Each line should be formatted as follows:
itype chi eta gamma :pre
where {itype} is the atom type from 1 to Ntypes, {chi} denotes the
electronegativity in eV, {eta} denotes the self-Coulomb
potential in eV, and {gamma} denotes the valence orbital
exponent. Note that these 3 quantities are also in the ReaxFF
potential file, except that eta is defined here as twice the eta value
in the ReaxFF file. Note that unlike the rest of LAMMPS, the units
of this fix are hard-coded to be A, eV, and electronic charge.
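When a file name is given for {params} instead of "reax/c", a hedged
sketch of its contents for a two-type system (the values are
placeholders, not validated parameters):
1 5.0 14.0 0.8
2 8.5 17.0 1.0 :pre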
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. No global scalar or vector or per-atom
quantities are stored by this fix for access by various "output
commands"_Section_howto.html#howto_15. No parameter of this fix can
be used with the {start/stop} keywords of the "run"_run.html command.
This fix is invoked during "energy minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This fix is part of the USER-REAXC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
This fix does not correctly handle interactions
involving multiple periodic images of the same atom. Hence, it should not
be used for periodic cell dimensions less than 10 angstroms.
[Related commands:]
-"pair_style reax/c"_pair_reax_c.html
+"pair_style reax/c"_pair_reaxc.html
[Default:] none
:line
:link(Rappe2)
[(Rappe)] Rappe and Goddard III, Journal of Physical Chemistry, 95,
3358-3363 (1991).
:link(Nakano2)
[(Nakano)] Nakano, Computer Physics Communications, 104, 59-69 (1997).
:link(qeq-Aktulga)
[(Aktulga)] Aktulga, Fogarty, Pandit, Grama, Parallel Computing, 38,
245-259 (2012).
diff --git a/doc/src/fix_reax_bonds.txt b/doc/src/fix_reax_bonds.txt
index 1fd1b3ca5..d3f108709 100644
--- a/doc/src/fix_reax_bonds.txt
+++ b/doc/src/fix_reax_bonds.txt
@@ -1,93 +1,93 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix reax/bonds command :h3
fix reax/c/bonds command :h3
fix reax/c/bonds/kk command :h3
[Syntax:]
fix ID group-ID reax/bonds Nevery filename :pre
ID, group-ID are documented in "fix"_fix.html command
reax/bonds = style name of this fix command
Nevery = output interval in timesteps
filename = name of output file :ul
[Examples:]
fix 1 all reax/bonds 100 bonds.tatb
fix 1 all reax/c/bonds 100 bonds.reaxc :pre
[Description:]
Write out the bond information computed by the ReaxFF potential
specified by "pair_style reax"_pair_reax.html or "pair_style
-reax/c"_pair_reax_c.html in the exact same format as the original
+reax/c"_pair_reaxc.html in the exact same format as the original
stand-alone ReaxFF code of Adri van Duin. The bond information is
written to {filename} on timesteps that are multiples of {Nevery},
including timestep 0. For time-averaged chemical species analysis,
please see the "fix reaxc/c/species"_fix_reaxc_species.html command.
The format of the output file should be self-explanatory.
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. None of the "fix_modify"_fix_modify.html options
are relevant to this fix. No global or per-atom quantities are stored
by this fix for access by various "output
commands"_Section_howto.html#howto_15. No parameter of this fix can
be used with the {start/stop} keywords of the "run"_run.html command.
This fix is not invoked during "energy minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section_accelerate"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section_accelerate"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
The fix reax/bonds command requires that the "pair_style
reax"_pair_reax.html be invoked. This fix is part of the REAX
package. It is only enabled if LAMMPS was built with that package,
which also requires the REAX library be built and linked with LAMMPS.
The fix reax/c/bonds command requires that the "pair_style
-reax/c"_pair_reax_c.html be invoked. This fix is part of the
+reax/c"_pair_reaxc.html be invoked. This fix is part of the
USER-REAXC package. It is only enabled if LAMMPS was built with that
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info.
[Related commands:]
"pair_style reax"_pair_reax.html, "pair_style
-reax/c"_pair_reax_c.html, "fix reax/c/species"_fix_reaxc_species.html
+reax/c"_pair_reaxc.html, "fix reax/c/species"_fix_reaxc_species.html
[Default:] none
diff --git a/doc/src/fix_reaxc_species.txt b/doc/src/fix_reaxc_species.txt
index 00db91900..d43a338a6 100644
--- a/doc/src/fix_reaxc_species.txt
+++ b/doc/src/fix_reaxc_species.txt
@@ -1,180 +1,180 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix reax/c/species command :h3
fix reax/c/species/kk command :h3
[Syntax:]
fix ID group-ID reax/c/species Nevery Nrepeat Nfreq filename keyword value ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
reax/c/species = style name of this command :l
Nevery = sample bond-order every this many timesteps :l
Nrepeat = # of bond-order samples used for calculating averages :l
Nfreq = calculate average bond-order every this many timesteps :l
filename = name of output file :l
zero or more keyword/value pairs may be appended :l
keyword = {cutoff} or {element} or {position} :l
{cutoff} value = I J Cutoff
I, J = atom types
Cutoff = Bond-order cutoff value for this pair of atom types
{element} value = Element1, Element2, ...
{position} value = posfreq filepos
posfreq = write position files every this many timesteps
filepos = name of position output file :pre
:ule
[Examples:]
fix 1 all reax/c/species 10 10 100 species.out
fix 1 all reax/c/species 1 2 20 species.out cutoff 1 1 0.40 cutoff 1 2 0.55
fix 1 all reax/c/species 1 100 100 species.out element Au O H position 1000 AuOH.pos :pre
[Description:]
Write out the chemical species information computed by the ReaxFF
-potential specified by "pair_style reax/c"_pair_reax_c.html.
+potential specified by "pair_style reax/c"_pair_reaxc.html.
Bond-order values (either averaged or instantaneous, depending on
value of {Nrepeat}) are used to determine chemical bonds. Every
{Nfreq} timesteps, chemical species information is written to
{filename} as a two line output. The first line is a header
containing labels. The second line consists of the following:
timestep, total number of molecules, total number of distinct species,
number of molecules of each species. In this context, "species" means
a unique molecule. The chemical formula of each species is given in
the first line.
Optional keyword {cutoff} can be assigned to change the minimum
bond-order values used in identifying chemical bonds between pairs of
atoms. Bond-order cutoffs should be carefully chosen, as bond-order
cutoffs that are too small may include too many bonds (which will
result in an error), while cutoffs that are too large will result in
fragmented molecules. The default cutoff of 0.3 usually gives good
results.
The optional keyword {element} can be used to specify the chemical
symbol printed for each LAMMPS atom type. The number of symbols must
match the number of LAMMPS atom types and each symbol must consist of
1 or 2 alphanumeric characters. Normally, these symbols should be
chosen to match the chemical identity of each LAMMPS atom type, as
-specified using the "reax/c pair_coeff"_pair_reax_c.html command and
+specified using the "reax/c pair_coeff"_pair_reaxc.html command and
the ReaxFF force field file.
The optional keyword {position} writes center-of-mass positions of
each identified molecule to file {filepos} every {posfreq} timesteps.
The first line contains information on timestep, total number of
molecules, total number of distinct species, and box dimensions. The
second line is a header containing labels. From the third line
downward, each molecule writes a line of output containing the
following information: molecule ID, number of atoms in this molecule,
chemical formula, total charge, and center-of-mass xyz positions of
this molecule. The xyz positions are in fractional coordinates
relative to the box dimensions.
For the keyword {position}, the {filepos} is the name of the output
file. It can contain the wildcard character "*". If the "*"
character appears in {filepos}, then one file per snapshot is written
at {posfreq} and the "*" character is replaced with the timestep
value. For example, AuO.pos.* becomes AuO.pos.0, AuO.pos.1000, etc.
:line
The {Nevery}, {Nrepeat}, and {Nfreq} arguments specify on what
timesteps the bond-order values are sampled to get the average bond
order. The species analysis is performed using the average bond-order
on timesteps that are a multiple of {Nfreq}. The average is over
{Nrepeat} bond-order samples, computed in the preceding portion of the
simulation every {Nevery} timesteps. {Nfreq} must be a multiple of
{Nevery} and {Nevery} must be non-zero even if {Nrepeat} is 1.
Also, the timesteps
contributing to the average bond-order cannot overlap,
i.e. Nrepeat*Nevery can not exceed Nfreq.
For example, if Nevery=2, Nrepeat=6, and Nfreq=100, then bond-order
values on timesteps 90,92,94,96,98,100 will be used to compute the
average bond-order for the species analysis output on timestep 100.
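Written as an input-script line, that sampling schedule would be (the
fix ID and output file name are placeholders):
fix 1 all reax/c/species 2 6 100 species.out :pre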
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. None of the "fix_modify"_fix_modify.html options
are relevant to this fix.
This fix computes both a global vector of length 2 and a per-atom
vector, either of which can be accessed by various "output
commands"_Section_howto.html#howto_15. The values in the global
vector are "intensive".
The 2 values in the global vector are as follows:
1 = total number of molecules
2 = total number of distinct species :ul
The per-atom vector stores the molecule ID for each atom as identified
by the fix. If an atom is not in a molecule, its ID will be 0.
For atoms in the same molecule, the molecule ID for all of them
will be the same and will be equal to the smallest atom ID of
any atom in the molecule.
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command. This fix is not invoked during "energy
minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section_accelerate"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section_accelerate"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
The fix reax/c/species command currently only works with
-"pair_style reax/c"_pair_reax_c.html and it requires that the "pair_style
-reax/c"_pair_reax_c.html be invoked. This fix is part of the
+"pair_style reax/c"_pair_reaxc.html and it requires that the "pair_style
+reax/c"_pair_reaxc.html be invoked. This fix is part of the
USER-REAXC package. It is only enabled if LAMMPS was built with that
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info.
It should be possible to extend it to other reactive pair_styles (such as
"rebo"_pair_airebo.html, "airebo"_pair_airebo.html,
"comb"_pair_comb.html, and "bop"_pair_bop.html), but this has not yet been done.
[Related commands:]
-"pair_style reax/c"_pair_reax_c.html, "fix
+"pair_style reax/c"_pair_reaxc.html, "fix
reax/bonds"_fix_reax_bonds.html
[Default:]
The default values for bond-order cutoffs are 0.3 for all I-J pairs. The
default element symbols are C, H, O, N. Position files are not written
by default.
diff --git a/doc/src/improper_cossq.txt b/doc/src/improper_cossq.txt
index 513f0b315..e238063a8 100644
--- a/doc/src/improper_cossq.txt
+++ b/doc/src/improper_cossq.txt
@@ -1,89 +1,86 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
improper_style cossq command :h3
improper_style cossq/omp command :h3
[Syntax:]
improper_style cossq :pre
[Examples:]
improper_style cossq
improper_coeff 1 4.0 0.0 :pre
[Description:]
The {cossq} improper style uses the potential
:c,image(Eqs/improper_cossq.jpg)
where x is the improper angle, x0 is its equilibrium value, and K is a
prefactor.
If the 4 atoms in an improper quadruplet (listed in the data file read
by the "read_data"_read_data.html command) are ordered I,J,K,L then X
is the angle between the plane of I,J,K and the plane of J,K,L.
Alternatively, you can think of atoms J,K,L as being in a plane, and
atom I above the plane, and X as a measure of how far out-of-plane I
is with respect to the other 3 atoms.
Note that defining 4 atoms to interact in this way does not mean that
bonds necessarily exist between I-J, J-K, or K-L, as they would in a
linear dihedral. Normally, the bonds I-J, I-K, I-L would exist for an
improper to be defined between the 4 atoms.
The following coefficients must be defined for each improper type via
the "improper_coeff"_improper_coeff.html command as in the example
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands:
-K (energy/radian^2)
+K (energy)
X0 (degrees) :ul
-X0 is specified in degrees, but LAMMPS converts it to radians
-internally; hence the units of K are in energy/radian^2.
-
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This improper style can only be used if LAMMPS was built with the
USER-MISC package. See the "Making LAMMPS"_Section_start.html#start_3
section for more info on packages.
[Related commands:]
"improper_coeff"_improper_coeff.html
[Default:] none
diff --git a/doc/src/improper_ring.txt b/doc/src/improper_ring.txt
index 705b1cf74..cba59399e 100644
--- a/doc/src/improper_ring.txt
+++ b/doc/src/improper_ring.txt
@@ -1,96 +1,93 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
improper_style ring command :h3
improper_style ring/omp command :h3
[Syntax:]
improper_style ring :pre
[Examples:]
improper_style ring
improper_coeff 1 8000 70.5 :pre
[Description:]
The {ring} improper style uses the potential
:c,image(Eqs/improper_ring.jpg)
where K is a prefactor, theta is the angle formed by the atoms
specified by the (i,j,k,l) indices and theta0 is its equilibrium value.
If the 4 atoms in an improper quadruplet (listed in the data file read
by the "read_data"_read_data.html command) are ordered i,j,k,l then
theta_{ijl} is the angle between atoms i,j and l, theta_{ijk} is the
angle between atoms i,j and k, theta_{kjl} is the angle between atoms
j,k, and l.
The "ring" improper style implements the improper potential introduced
by Destree et al., in Equation (9) of "(Destree)"_#Destree. This
potential does not affect small amplitude vibrations but is used in an
ad-hoc way to prevent the onset of accidentally large amplitude
fluctuations leading to the occurrence of a planar conformation of the
three bonds i-j, j-k and j-l, an intermediate conformation toward the
chiral inversion of a methine carbon. In the "Impropers" section of
the data file, four atoms i, j, k and l are specified, with i, j and l
lying on the backbone of the chain and k specifying the chirality of j.
The following coefficients must be defined for each improper type via
the "improper_coeff"_improper_coeff.html command as in the example
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands:
-K (energy/radian^2)
+K (energy)
theta0 (degrees) :ul
-theta0 is specified in degrees, but LAMMPS converts it to radians
-internally; hence the units of K are in energy/radian^2.
-
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This improper style can only be used if LAMMPS was built with the
USER-MISC package. See the "Making LAMMPS"_Section_start.html#start_3
section for more info on packages.
[Related commands:]
"improper_coeff"_improper_coeff.html
:link(Destree)
[(Destree)] M. Destree, F. Laupretre, A. Lyulin, and J.-P. Ryckaert,
J Chem Phys, 112, 9632 (2000).
diff --git a/doc/src/lammps.book b/doc/src/lammps.book
index 6c68955bc..b2b42aa7e 100644
--- a/doc/src/lammps.book
+++ b/doc/src/lammps.book
@@ -1,643 +1,643 @@
#HTMLDOC 1.8.27
-t pdf14 -f "../Manual.pdf" --book --toclevels 4 --no-numbered --toctitle "Table of Contents" --title --textcolor #000000 --linkcolor #0000ff --linkstyle plain --bodycolor #ffffff --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --header1 ... --footer ..1 --nup 1 --tocheader .t. --tocfooter ..i --portrait --color --no-pscommands --no-xrxcomments --compression=1 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont helvetica --bodyfont times --headfootsize 11.0 --headfootfont helvetica --charset iso-8859-1 --links --embedfonts --pagemode document --pagelayout single --firstpage c1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all --owner-password "" --user-password "" --browserwidth 680 --no-strict --no-overflow
Manual.html
Section_intro.html
Section_start.html
Section_commands.html
Section_packages.html
Section_accelerate.html
accelerate_gpu.html
accelerate_intel.html
accelerate_kokkos.html
accelerate_omp.html
accelerate_opt.html
Section_howto.html
Section_example.html
Section_perf.html
Section_tools.html
Section_modify.html
Section_python.html
Section_errors.html
Section_history.html
tutorial_drude.html
tutorial_github.html
tutorial_pylammps.html
body.html
manifolds.html
angle_coeff.html
angle_style.html
atom_modify.html
atom_style.html
balance.html
bond_coeff.html
bond_style.html
bond_write.html
boundary.html
box.html
change_box.html
clear.html
comm_modify.html
comm_style.html
compute.html
compute_modify.html
create_atoms.html
create_bonds.html
create_box.html
delete_atoms.html
delete_bonds.html
dielectric.html
dihedral_coeff.html
dihedral_style.html
dimension.html
displace_atoms.html
dump.html
dump_custom_vtk.html
dump_h5md.html
dump_image.html
dump_modify.html
dump_molfile.html
dump_nc.html
echo.html
fix.html
fix_modify.html
group.html
group2ndx.html
if.html
improper_coeff.html
improper_style.html
include.html
info.html
jump.html
kspace_modify.html
kspace_style.html
label.html
lattice.html
log.html
mass.html
min_modify.html
min_style.html
minimize.html
molecule.html
neb.html
neigh_modify.html
neighbor.html
newton.html
next.html
package.html
pair_coeff.html
pair_modify.html
pair_style.html
pair_write.html
partition.html
prd.html
print.html
processors.html
python.html
quit.html
read_data.html
read_dump.html
read_restart.html
region.html
replicate.html
rerun.html
reset_timestep.html
restart.html
run.html
run_style.html
set.html
shell.html
special_bonds.html
suffix.html
tad.html
temper.html
temper_grem.html
thermo.html
thermo_modify.html
thermo_style.html
timer.html
timestep.html
uncompute.html
undump.html
unfix.html
units.html
variable.html
velocity.html
write_coeff.html
write_data.html
write_dump.html
write_restart.html
fix_adapt.html
fix_adapt_fep.html
fix_addforce.html
fix_addtorque.html
fix_append_atoms.html
fix_atc.html
fix_atom_swap.html
fix_ave_atom.html
fix_ave_chunk.html
fix_ave_correlate.html
fix_ave_correlate_long.html
fix_ave_histo.html
fix_ave_time.html
fix_aveforce.html
fix_balance.html
fix_bond_break.html
fix_bond_create.html
fix_bond_swap.html
fix_box_relax.html
fix_cmap.html
fix_colvars.html
fix_controller.html
fix_deform.html
fix_deposit.html
fix_dpd_energy.html
fix_drag.html
fix_drude.html
fix_drude_transform.html
fix_dt_reset.html
fix_efield.html
fix_ehex.html
fix_enforce2d.html
fix_eos_cv.html
fix_eos_table.html
fix_eos_table_rx.html
fix_evaporate.html
fix_external.html
fix_filter_corotate.html
fix_flow_gauss.html
fix_freeze.html
fix_gcmc.html
fix_gld.html
fix_gle.html
fix_gravity.html
fix_grem.html
fix_halt.html
fix_heat.html
fix_imd.html
fix_indent.html
fix_ipi.html
fix_langevin.html
fix_langevin_drude.html
fix_langevin_eff.html
fix_lb_fluid.html
fix_lb_momentum.html
fix_lb_pc.html
fix_lb_rigid_pc_sphere.html
fix_lb_viscous.html
fix_lineforce.html
fix_manifoldforce.html
fix_meso.html
fix_meso_stationary.html
fix_momentum.html
fix_move.html
fix_mscg.html
fix_msst.html
fix_neb.html
fix_nh.html
fix_nh_eff.html
fix_nph_asphere.html
fix_nph_body.html
fix_nph_sphere.html
fix_nphug.html
fix_npt_asphere.html
fix_npt_body.html
fix_npt_sphere.html
fix_nve.html
fix_nve_asphere.html
fix_nve_asphere_noforce.html
fix_nve_body.html
fix_nve_dot.html
fix_nve_dotc_langevin.html
fix_nve_eff.html
fix_nve_limit.html
fix_nve_line.html
fix_nve_manifold_rattle.html
fix_nve_noforce.html
fix_nve_sphere.html
fix_nve_tri.html
fix_nvk.html
fix_nvt_asphere.html
fix_nvt_body.html
fix_nvt_manifold_rattle.html
fix_nvt_sllod.html
fix_nvt_sllod_eff.html
fix_nvt_sphere.html
fix_oneway.html
fix_orient.html
fix_phonon.html
fix_pimd.html
fix_planeforce.html
fix_poems.html
fix_pour.html
fix_press_berendsen.html
fix_print.html
fix_property_atom.html
fix_qbmsst.html
fix_qeq.html
fix_qeq_comb.html
fix_qeq_reax.html
fix_qmmm.html
fix_qtb.html
fix_reax_bonds.html
fix_reaxc_species.html
fix_recenter.html
fix_restrain.html
fix_rigid.html
fix_rx.html
fix_saed_vtk.html
fix_setforce.html
fix_shake.html
fix_shardlow.html
fix_smd.html
fix_smd_adjust_dt.html
fix_smd_integrate_tlsph.html
fix_smd_integrate_ulsph.html
fix_smd_move_triangulated_surface.html
fix_smd_setvel.html
fix_smd_wall_surface.html
fix_spring.html
fix_spring_chunk.html
fix_spring_rg.html
fix_spring_self.html
fix_srd.html
fix_store_force.html
fix_store_state.html
fix_temp_berendsen.html
fix_temp_csvr.html
fix_temp_rescale.html
fix_temp_rescale_eff.html
fix_tfmc.html
fix_thermal_conductivity.html
fix_ti_spring.html
fix_tmd.html
fix_ttm.html
fix_tune_kspace.html
fix_vector.html
fix_viscosity.html
fix_viscous.html
fix_wall.html
fix_wall_gran.html
fix_wall_gran_region.html
fix_wall_piston.html
fix_wall_reflect.html
fix_wall_region.html
fix_wall_srd.html
compute_ackland_atom.html
compute_angle.html
compute_angle_local.html
compute_angmom_chunk.html
compute_basal_atom.html
compute_body_local.html
compute_bond.html
compute_bond_local.html
compute_centro_atom.html
compute_chunk_atom.html
compute_cluster_atom.html
compute_cna_atom.html
compute_com.html
compute_com_chunk.html
compute_contact_atom.html
compute_coord_atom.html
compute_damage_atom.html
compute_dihedral.html
compute_dihedral_local.html
compute_dilatation_atom.html
compute_dipole_chunk.html
compute_displace_atom.html
compute_dpd.html
compute_dpd_atom.html
compute_erotate_asphere.html
compute_erotate_rigid.html
compute_erotate_sphere.html
compute_erotate_sphere_atom.html
compute_event_displace.html
compute_fep.html
compute_global_atom.html
compute_group_group.html
compute_gyration.html
compute_gyration_chunk.html
compute_heat_flux.html
compute_hexorder_atom.html
compute_improper.html
compute_improper_local.html
compute_inertia_chunk.html
compute_ke.html
compute_ke_atom.html
compute_ke_atom_eff.html
compute_ke_eff.html
compute_ke_rigid.html
compute_meso_e_atom.html
compute_meso_rho_atom.html
compute_meso_t_atom.html
compute_msd.html
compute_msd_chunk.html
compute_msd_nongauss.html
compute_omega_chunk.html
compute_orientorder_atom.html
compute_pair.html
compute_pair_local.html
compute_pe.html
compute_pe_atom.html
compute_plasticity_atom.html
compute_pressure.html
compute_property_atom.html
compute_property_chunk.html
compute_property_local.html
compute_rdf.html
compute_reduce.html
compute_rigid_local.html
compute_saed.html
compute_slice.html
compute_smd_contact_radius.html
compute_smd_damage.html
compute_smd_hourglass_error.html
compute_smd_internal_energy.html
compute_smd_plastic_strain.html
compute_smd_plastic_strain_rate.html
compute_smd_rho.html
compute_smd_tlsph_defgrad.html
compute_smd_tlsph_dt.html
compute_smd_tlsph_num_neighs.html
compute_smd_tlsph_shape.html
compute_smd_tlsph_strain.html
compute_smd_tlsph_strain_rate.html
compute_smd_tlsph_stress.html
compute_smd_triangle_mesh_vertices.html
compute_smd_ulsph_num_neighs.html
compute_smd_ulsph_strain.html
compute_smd_ulsph_strain_rate.html
compute_smd_ulsph_stress.html
compute_smd_vol.html
compute_sna_atom.html
compute_stress_atom.html
compute_tally.html
compute_temp.html
compute_temp_asphere.html
compute_temp_body.html
compute_temp_chunk.html
compute_temp_com.html
compute_temp_cs.html
compute_temp_deform.html
compute_temp_deform_eff.html
compute_temp_drude.html
compute_temp_eff.html
compute_temp_partial.html
compute_temp_profile.html
compute_temp_ramp.html
compute_temp_region.html
compute_temp_region_eff.html
compute_temp_rotate.html
compute_temp_sphere.html
compute_ti.html
compute_torque_chunk.html
compute_vacf.html
compute_vcm_chunk.html
compute_voronoi_atom.html
compute_xrd.html
pair_adp.html
pair_agni.html
pair_airebo.html
pair_awpmd.html
pair_beck.html
pair_body.html
pair_bop.html
pair_born.html
pair_brownian.html
pair_buck.html
pair_buck_long.html
pair_charmm.html
pair_class2.html
pair_colloid.html
pair_comb.html
pair_coul.html
pair_coul_diel.html
pair_cs.html
pair_dipole.html
pair_dpd.html
pair_dpd_fdt.html
pair_dsmc.html
pair_eam.html
pair_edip.html
pair_eff.html
pair_eim.html
pair_exp6_rx.html
pair_gauss.html
pair_gayberne.html
pair_gran.html
pair_gromacs.html
pair_hbond_dreiding.html
pair_hybrid.html
pair_kim.html
pair_kolmogorov_crespi_z.html
pair_lcbop.html
pair_line_lj.html
pair_list.html
pair_lj.html
pair_lj96.html
pair_lj_cubic.html
pair_lj_expand.html
pair_lj_long.html
pair_lj_sf.html
pair_lj_smooth.html
pair_lj_smooth_linear.html
pair_lj_soft.html
pair_lubricate.html
pair_lubricateU.html
pair_mdf.html
pair_meam.html
pair_meam_spline.html
pair_meam_sw_spline.html
pair_mgpt.html
pair_mie.html
pair_momb.html
pair_morse.html
pair_multi_lucy.html
pair_multi_lucy_rx.html
pair_nb3b_harmonic.html
pair_nm.html
pair_none.html
pair_oxdna.html
pair_oxdna2.html
pair_peri.html
pair_polymorphic.html
pair_quip.html
pair_reax.html
-pair_reax_c.html
+pair_reaxc.html
pair_resquared.html
pair_sdk.html
pair_smd_hertz.html
pair_smd_tlsph.html
pair_smd_triangulated_surface.html
pair_smd_ulsph.html
pair_smtbq.html
pair_snap.html
pair_soft.html
pair_sph_heatconduction.html
pair_sph_idealgas.html
pair_sph_lj.html
pair_sph_rhosum.html
pair_sph_taitwater.html
pair_sph_taitwater_morris.html
pair_srp.html
pair_sw.html
pair_table.html
pair_table_rx.html
pair_tersoff.html
pair_tersoff_mod.html
pair_tersoff_zbl.html
pair_thole.html
pair_tri_lj.html
pair_vashishta.html
pair_yukawa.html
pair_yukawa_colloid.html
pair_zbl.html
pair_zero.html
bond_class2.html
bond_fene.html
bond_fene_expand.html
bond_oxdna.html
bond_harmonic.html
bond_harmonic_shift.html
bond_harmonic_shift_cut.html
bond_hybrid.html
bond_morse.html
bond_none.html
bond_nonlinear.html
bond_quartic.html
bond_table.html
bond_zero.html
angle_charmm.html
angle_class2.html
angle_cosine.html
angle_cosine_delta.html
angle_cosine_periodic.html
angle_cosine_shift.html
angle_cosine_shift_exp.html
angle_cosine_squared.html
angle_dipole.html
angle_fourier.html
angle_fourier_simple.html
angle_harmonic.html
angle_hybrid.html
angle_none.html
angle_quartic.html
angle_sdk.html
angle_table.html
angle_zero.html
dihedral_charmm.html
dihedral_class2.html
dihedral_cosine_shift_exp.html
dihedral_fourier.html
dihedral_harmonic.html
dihedral_helix.html
dihedral_hybrid.html
dihedral_multi_harmonic.html
dihedral_nharmonic.html
dihedral_none.html
dihedral_opls.html
dihedral_quadratic.html
dihedral_spherical.html
dihedral_table.html
dihedral_zero.html
improper_class2.html
improper_cossq.html
improper_cvff.html
improper_distance.html
improper_fourier.html
improper_harmonic.html
improper_hybrid.html
improper_none.html
improper_ring.html
improper_umbrella.html
improper_zero.html
USER/atc/man_add_molecule.html
USER/atc/man_add_species.html
USER/atc/man_atom_element_map.html
USER/atc/man_atom_weight.html
USER/atc/man_atomic_charge.html
USER/atc/man_boundary.html
USER/atc/man_boundary_dynamics.html
USER/atc/man_boundary_faceset.html
USER/atc/man_boundary_integral.html
USER/atc/man_consistent_fe_initialization.html
USER/atc/man_contour_integral.html
USER/atc/man_control.html
USER/atc/man_control_momentum.html
USER/atc/man_control_thermal.html
USER/atc/man_control_thermal_correction_max_iterations.html
USER/atc/man_decomposition.html
USER/atc/man_electron_integration.html
USER/atc/man_equilibrium_start.html
USER/atc/man_extrinsic_exchange.html
USER/atc/man_fe_md_boundary.html
USER/atc/man_fem_mesh.html
USER/atc/man_filter_scale.html
USER/atc/man_filter_type.html
USER/atc/man_fix_atc.html
USER/atc/man_fix_flux.html
USER/atc/man_fix_nodes.html
USER/atc/man_hardy_computes.html
USER/atc/man_hardy_fields.html
USER/atc/man_hardy_gradients.html
USER/atc/man_hardy_kernel.html
USER/atc/man_hardy_on_the_fly.html
USER/atc/man_hardy_rates.html
USER/atc/man_initial.html
USER/atc/man_internal_atom_integrate.html
USER/atc/man_internal_element_set.html
USER/atc/man_internal_quadrature.html
USER/atc/man_kernel_function.html
USER/atc/man_localized_lambda.html
USER/atc/man_lumped_lambda_solve.html
USER/atc/man_mask_direction.html
USER/atc/man_mass_matrix.html
USER/atc/man_material.html
USER/atc/man_mesh_add_to_nodeset.html
USER/atc/man_mesh_create.html
USER/atc/man_mesh_create_elementset.html
USER/atc/man_mesh_create_faceset_box.html
USER/atc/man_mesh_create_faceset_plane.html
USER/atc/man_mesh_create_nodeset.html
USER/atc/man_mesh_delete_elements.html
USER/atc/man_mesh_nodeset_to_elementset.html
USER/atc/man_mesh_output.html
USER/atc/man_mesh_quadrature.html
USER/atc/man_mesh_read.html
USER/atc/man_mesh_write.html
USER/atc/man_momentum_time_integration.html
USER/atc/man_output.html
USER/atc/man_output_elementset.html
USER/atc/man_output_nodeset.html
USER/atc/man_pair_interactions.html
USER/atc/man_poisson_solver.html
USER/atc/man_read_restart.html
USER/atc/man_remove_molecule.html
USER/atc/man_remove_source.html
USER/atc/man_remove_species.html
USER/atc/man_reset_atomic_reference_positions.html
USER/atc/man_reset_time.html
USER/atc/man_sample_frequency.html
USER/atc/man_set.html
USER/atc/man_source.html
USER/atc/man_source_integration.html
USER/atc/man_temperature_definition.html
USER/atc/man_thermal_time_integration.html
USER/atc/man_time_filter.html
USER/atc/man_track_displacement.html
USER/atc/man_unfix_flux.html
USER/atc/man_unfix_nodes.html
USER/atc/man_write_atom_weights.html
USER/atc/man_write_restart.html
diff --git a/doc/src/pair_hybrid.txt b/doc/src/pair_hybrid.txt
index 7ef54e7f0..5166fe1f8 100644
--- a/doc/src/pair_hybrid.txt
+++ b/doc/src/pair_hybrid.txt
@@ -1,382 +1,388 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style hybrid command :h3
pair_style hybrid/omp command :h3
pair_style hybrid/overlay command :h3
pair_style hybrid/overlay/omp command :h3
[Syntax:]
pair_style hybrid style1 args style2 args ...
pair_style hybrid/overlay style1 args style2 args ... :pre
style1,style2 = list of one or more pair styles and their arguments :ul
[Examples:]
pair_style hybrid lj/cut/coul/cut 10.0 eam lj/cut 5.0
pair_coeff 1*2 1*2 eam niu3
pair_coeff 3 3 lj/cut/coul/cut 1.0 1.0
pair_coeff 1*2 3 lj/cut 0.5 1.2 :pre
pair_style hybrid/overlay lj/cut 2.5 coul/long 2.0
pair_coeff * * lj/cut 1.0 1.0
pair_coeff * * coul/long :pre
[Description:]
The {hybrid} and {hybrid/overlay} styles enable the use of multiple
pair styles in one simulation. With the {hybrid} style, exactly one
pair style is assigned to each pair of atom types. With the
{hybrid/overlay} style, one or more pair styles can be assigned to
each pair of atom types. The assignment of pair styles to type pairs
is made via the "pair_coeff"_pair_coeff.html command.
Here are two examples of hybrid simulations. The {hybrid} style could
be used for a simulation of a metal droplet on a LJ surface. The
metal atoms interact with each other via an {eam} potential, the
surface atoms interact with each other via a {lj/cut} potential, and
the metal/surface interaction is also computed via a {lj/cut}
potential. The {hybrid/overlay} style could be used as in the 2nd
example above, where multiple potentials are superposed in an additive
fashion to compute the interaction between atoms. In this example,
using {lj/cut} and {coul/long} together gives the same result as if
the {lj/cut/coul/long} potential were used by itself. In this case,
it would be more efficient to use the single combined potential, but
in general any combination of pair potentials can be used together
to produce an interaction that is not encoded in any single pair_style
file, e.g. adding Coulombic forces between granular particles.
All pair styles that will be used are listed as "sub-styles" following
the {hybrid} or {hybrid/overlay} keyword, in any order. Each
sub-style's name is followed by its usual arguments, as illustrated in
the example above. See the doc pages of individual pair styles for a
listing and explanation of the appropriate arguments.
Note that an individual pair style can be used multiple times as a
sub-style. For efficiency this should only be done if your model
requires it. E.g. if you have different regions of Si and C atoms and
wish to use a Tersoff potential for pure Si for one set of atoms, and
a Tersoff potential for pure C for the other set (presumably with some
3rd potential for Si-C interactions), then the sub-style {tersoff}
could be listed twice. But if you just want to use a Lennard-Jones or
other pairwise potential for several different atom type pairs in your
model, then you should just list the sub-style once and use the
pair_coeff command to assign parameters for the different type pairs.
NOTE: There are two exceptions to this option to list an individual
pair style multiple times. The first is for pair styles implemented
as Fortran libraries: "pair_style meam"_pair_meam.html and "pair_style
-reax"_pair_reax.html ("pair_style reax/c"_pair_reax_c.html is OK).
+reax"_pair_reax.html ("pair_style reax/c"_pair_reaxc.html is OK).
This is because, unlike a C++ class, they cannot be instantiated
multiple times, due to the manner in which they were coded in Fortran.
The second is for GPU-enabled pair styles in the GPU package. This is
because the GPU package also currently assumes that only one instance
of a pair style is being used.
In the pair_coeff commands, the name of a pair style must be added
after the I,J type specification, with the remaining coefficients
being those appropriate to that style. If the pair style is used
multiple times in the pair_style command, then an additional numeric
argument must also be specified which is a number from 1 to M where M
is the number of times the sub-style was listed in the pair style
command. The extra number indicates which instance of the sub-style
these coefficients apply to.
For example, consider a simulation with 3 atom types: types 1 and 2
are Ni atoms, type 3 are LJ atoms with charges. The following
commands would set up a hybrid simulation:
pair_style hybrid eam/alloy lj/cut/coul/cut 10.0 lj/cut 8.0
pair_coeff * * eam/alloy nialhjea Ni Ni NULL
pair_coeff 3 3 lj/cut/coul/cut 1.0 1.0
pair_coeff 1*2 3 lj/cut 0.8 1.3 :pre
As an example of using the same pair style multiple times, consider a
simulation with 2 atom types. Type 1 is Si, type 2 is C. The
following commands would model the Si atoms with Tersoff, the C atoms
with Tersoff, and the cross-interactions with Lennard-Jones:
pair_style hybrid lj/cut 2.5 tersoff tersoff
pair_coeff * * tersoff 1 Si.tersoff Si NULL
pair_coeff * * tersoff 2 C.tersoff NULL C
pair_coeff 1 2 lj/cut 1.0 1.5 :pre
If pair coefficients are specified in the data file read via the
"read_data"_read_data.html command, then the same rule applies.
E.g. "eam/alloy" or "lj/cut" must be added after the atom type, for
each line in the "Pair Coeffs" section, e.g.
Pair Coeffs :pre
1 lj/cut/coul/cut 1.0 1.0
... :pre
Note that the pair_coeff command for some potentials such as
"pair_style eam/alloy"_pair_eam.html includes a mapping specification
of elements to all atom types, which in the hybrid case, can include
atom types not assigned to the {eam/alloy} potential. The NULL
keyword is used by many such potentials (eam/alloy, Tersoff, AIREBO,
etc), to denote an atom type that will be assigned to a different
sub-style.
For the {hybrid} style, each atom type pair I,J is assigned to exactly
one sub-style. Just as with a simulation using a single pair style,
if you specify the same atom type pair in a second pair_coeff command,
the previous assignment will be overwritten.
For the {hybrid/overlay} style, each atom type pair I,J can be
assigned to one or more sub-styles. If you specify the same atom type
pair in a second pair_coeff command with a new sub-style, then the
second sub-style is added to the list of potentials that will be
calculated for two interacting atoms of those types. If you specify
the same atom type pair in a second pair_coeff command with a
sub-style that has already been defined for that pair of atoms, then
the new pair coefficients simply override the previous ones, as in the
normal usage of the pair_coeff command. E.g. these two sets of
commands are the same:
pair_style lj/cut 2.5
pair_coeff * * 1.0 1.0
pair_coeff 2 2 1.5 0.8 :pre
pair_style hybrid/overlay lj/cut 2.5
pair_coeff * * lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.5 0.8 :pre
Coefficients must be defined for each pair of atoms types via the
"pair_coeff"_pair_coeff.html command as described above, or in the
data file or restart files read by the "read_data"_read_data.html or
"read_restart"_read_restart.html commands, or by mixing as described
below.
For both the {hybrid} and {hybrid/overlay} styles, every atom type
pair I,J (where I <= J) must be assigned to at least one sub-style via
the "pair_coeff"_pair_coeff.html command as in the examples above, or
in the data file read by the "read_data"_read_data.html, or by mixing
as described below.
If you want there to be no interactions between a particular pair of
atom types, you have 3 choices. You can assign the type pair to some
sub-style and use the "neigh_modify exclude type"_neigh_modify.html
command. You can assign it to some sub-style and set the coefficients
so that there is effectively no interaction (e.g. epsilon = 0.0 in a
LJ potential). Or, for {hybrid} and {hybrid/overlay} simulations, you
can use this form of the pair_coeff command in your input script:
pair_coeff 2 3 none :pre
or this form in the "Pair Coeffs" section of the data file:
3 none :pre
If an assignment to {none} is made in a simulation with the
{hybrid/overlay} pair style, it wipes out all previous assignments of
that atom type pair to sub-styles.
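As a sketch of the first choice above (the atom types are
hypothetical), the 2,3 type pair would still be assigned to one of the
sub-styles via a pair_coeff command, and then excluded from the
neighbor list:
neigh_modify exclude type 2 3 :pre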
Note that you may need to use an "atom_style"_atom_style.html hybrid
command in your input script, if atoms in the simulation will need
attributes from several atom styles, due to using multiple pair
potentials.
:line
Different force fields (e.g. CHARMM vs AMBER) may have different rules
for applying weightings that change the strength of pairwise
interactions between pairs of atoms that are also 1-2, 1-3, and 1-4
neighbors in the molecular bond topology, as normally set by the
"special_bonds"_special_bonds.html command. Different weights can be
assigned to different pair hybrid sub-styles via the "pair_modify
special"_pair_modify.html command. This allows multiple force fields
to be used in a model of a hybrid system. However, since there is no
consistent approach for determining parameters automatically for the
interactions between the two force fields, this is only recommended
when particles described by the different force fields do not mix.
Here is an example for mixing CHARMM and AMBER: The global {amber}
setting sets the 1-4 interactions to non-zero scaling factors and
then overrides them with 0.0 only for CHARMM:
special_bonds amber
pair_hybrid lj/charmm/coul/long 8.0 10.0 lj/cut/coul/long 10.0
pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0 :pre
This input achieves the same effect:
special_bonds 0.0 0.0 0.1
pair_hybrid lj/charmm/coul/long 8.0 10.0 lj/cut/coul/long 10.0
pair_modify pair lj/cut/coul/long special lj 0.0 0.0 0.5
pair_modify pair lj/cut/coul/long special coul 0.0 0.0 0.83333333
pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0 :pre
Here is an example for mixing Tersoff with OPLS/AA based on
a data file that defines bonds for all atoms where for the
Tersoff part of the system the force constants for the bonded
interactions have been set to 0. Note the global settings are
effectively {lj/coul 0.0 0.0 0.5} as required for OPLS/AA:
special_bonds lj/coul 1e-20 1e-20 0.5
pair_hybrid tersoff lj/cut/coul/long 12.0
pair_modify pair tersoff special lj/coul 1.0 1.0 1.0 :pre
+For use with the various "compute */tally"_compute_tally.html
+computes, the "pair_modify compute/tally"_pair_modify.html
+command can be used to selectively turn off processing of
+the compute tally styles, for example, if those pair styles
+(e.g. manybody styles) do not support this feature.
+
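For example, a minimal sketch (assuming the hybrid list contains a
{tersoff} sub-style):
pair_modify pair tersoff compute/tally no :pre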
See the "pair_modify"_pair_modify.html doc page for details on
the specific syntax, requirements and restrictions.
:line
The potential energy contribution to the overall system due to an
individual sub-style can be accessed and output via the "compute
pair"_compute_pair.html command.
:line
NOTE: Several of the potentials defined via the pair_style command in
LAMMPS are really many-body potentials, such as Tersoff, AIREBO, MEAM,
ReaxFF, etc. The way to think about using these potentials in a
hybrid setting is as follows.
A subset of atom types is assigned to the many-body potential with a
single "pair_coeff"_pair_coeff.html command, using "* *" to include
all types and the NULL keywords described above to exclude specific
types not assigned to that potential. If types 1,3,4 were assigned in
that way (but not type 2), this means that all many-body interactions
between all atoms of types 1,3,4 will be computed by that potential.
Pair_style hybrid allows interactions between type pairs 2-2, 1-2,
2-3, 2-4 to be specified for computation by other pair styles. You
could even add a second interaction for 1-1 to be computed by another
pair style, assuming pair_style hybrid/overlay is used.
But you should not, as a general rule, attempt to exclude the
many-body interactions for some subset of the type pairs within the
set of 1,3,4 interactions, e.g. exclude 1-1 or 1-3 interactions. That
is not conceptually well-defined for many-body interactions, since the
potential will typically calculate energies and forces for small groups
of atoms, e.g. 3 or 4 atoms, using the neighbor lists of the atoms to
find the additional atoms in the group. It is typically non-physical
to think of excluding an interaction between a particular pair of
atoms when the potential computes 3-body or 4-body interactions.
However, you can still use the pair_coeff none setting or the
"neigh_modify exclude"_neigh_modify.html command to exclude certain
type pairs from the neighbor list that will be passed to a manybody
sub-style. This will alter the calculations made by a many-body
potential, since it builds its list of 3-body, 4-body, etc
interactions from the pair list. You will need to think carefully as
to whether it produces a physically meaningful result for your model.
For example, imagine you have two atom types in your model, type 1 for
atoms in one surface, and type 2 for atoms in the other, and you wish
to use a Tersoff potential to compute interactions within each
surface, but not between surfaces. Then either of these two command
sequences would implement that model:
pair_style hybrid tersoff
pair_coeff * * tersoff SiC.tersoff C C
pair_coeff 1 2 none :pre
pair_style tersoff
pair_coeff * * SiC.tersoff C C
neigh_modify exclude type 1 2 :pre
Either way, only neighbor lists with 1-1 or 2-2 interactions would be
passed to the Tersoff potential, which means it would compute no
3-body interactions containing both type 1 and 2 atoms.
Here is another example, using hybrid/overlay, to use 2 many-body
potentials together, in an overlapping manner. Imagine you have CNT
(C atoms) on a Si surface. You want to use Tersoff for Si/Si and Si/C
interactions, and AIREBO for C/C interactions. Si atoms are type 1; C
atoms are type 2. Something like this will work:
pair_style hybrid/overlay tersoff airebo 3.0
pair_coeff * * tersoff SiC.tersoff.custom Si C
pair_coeff * * airebo CH.airebo NULL C :pre
Note that to prevent the Tersoff potential from computing C/C
interactions, you would need to modify the SiC.tersoff file to turn
off C/C interaction, i.e. by setting the appropriate coefficients to
0.0.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual.
Since the {hybrid} and {hybrid/overlay} styles delegate computation to
the individual sub-styles, the suffix versions of the {hybrid} and
{hybrid/overlay} styles are used to propagate the corresponding suffix
to all sub-styles, if those versions exist. Otherwise the
non-accelerated version will be used.
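For example, a minimal sketch (assuming a build with the USER-OMP
package installed), where the {omp} suffix is propagated to both
sub-styles:
pair_style hybrid/overlay/omp lj/cut 2.5 coul/long 2.0 :pre
This is equivalent to using the plain {hybrid/overlay} style together
with the "suffix omp"_suffix.html command.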
The individual accelerated sub-styles are part of the GPU,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the
"Making LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
Any pair potential settings made via the
"pair_modify"_pair_modify.html command are passed along to all
sub-styles of the hybrid potential.
For atom type pairs I,J and I != J, if the sub-style assigned to I,I
and J,J is the same, and if the sub-style allows for mixing, then the
coefficients for I,J can be mixed. This means you do not have to
specify a pair_coeff command for I,J since the I,J type pair will be
assigned automatically to the sub-style defined for both I,I and J,J
and its coefficients generated by the mixing rule used by that
sub-style. For the {hybrid/overlay} style, there is an additional
requirement that both the I,I and J,J pairs are assigned to a single
sub-style. See the "pair_modify" command for details of mixing rules.
See the doc page for the sub-style to see if it supports mixing.
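For illustration, a minimal sketch (the coefficients are hypothetical)
in which no pair_coeff command is needed for the 1,2 pair, because
types 1 and 2 are both assigned to the {lj/cut} sub-style and its
mixing rule generates the 1,2 coefficients:
pair_style hybrid lj/cut 10.0 morse 3.0
pair_coeff 1 1 lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.5 0.9
pair_coeff 3 3 morse 1.0 2.0 1.5
pair_coeff 1 3 morse 1.0 2.0 2.0
pair_coeff 2 3 morse 1.0 2.0 2.0 :pre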
The hybrid pair styles support the "pair_modify"_pair_modify.html
shift, table, and tail options for an I,J pair interaction, if the
associated sub-style supports it.
For the hybrid pair styles, the list of sub-styles and their
respective settings are written to "binary restart
files"_restart.html, so a "pair_style"_pair_style.html command does
not need to be specified in an input script that reads a restart file.
However, the coefficient information is not stored in the restart
file. Thus, pair_coeff commands need to be re-specified in the
restart input script.
These pair styles support the use of the {inner}, {middle}, and
{outer} keywords of the "run_style respa"_run_style.html command, if
their sub-styles do.
[Restrictions:]
When using a long-range Coulombic solver (via the
"kspace_style"_kspace_style.html command) with a hybrid pair_style,
one or more sub-styles will be of the "long" variety,
e.g. {lj/cut/coul/long} or {buck/coul/long}. You must ensure that the
short-range Coulombic cutoff used by each of these long pair styles is
the same or else LAMMPS will generate an error.
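For example, a hypothetical sketch in which both "long" sub-styles use
the same 10.0 short-range Coulombic cutoff, so a single Kspace solver
can serve both:
pair_style hybrid/overlay lj/cut/coul/long 12.0 10.0 buck/coul/long 10.0
kspace_style pppm 1.0e-4 :pre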
[Related commands:]
"pair_coeff"_pair_coeff.html
[Default:] none
diff --git a/doc/src/pair_modify.txt b/doc/src/pair_modify.txt
index 03fb80ae5..34dbb5bc3 100644
--- a/doc/src/pair_modify.txt
+++ b/doc/src/pair_modify.txt
@@ -1,260 +1,275 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_modify command :h3
[Syntax:]
pair_modify keyword values ... :pre
one or more keyword/value pairs may be listed :ulb,l
keyword = {pair} or {shift} or {mix} or {table} or {table/disp} or {tabinner} or {tabinner/disp} or {tail} or {compute} :l
{pair} values = sub-style N {special} which wt1 wt2 wt3
+ or sub-style N {compute/tally} flag
sub-style = sub-style of "pair hybrid"_pair_hybrid.html
N = which instance of sub-style (only if sub-style is used multiple times)
- {special} which wt1 wt2 wt3 = override {special_bonds} settings (optional)
- which = {lj/coul} or {lj} or {coul}
- w1,w2,w3 = 1-2, 1-3, and 1-4 weights from 0.0 to 1.0 inclusive
+ {special} which wt1 wt2 wt3 = override {special_bonds} settings (optional)
+ which = {lj/coul} or {lj} or {coul}
+ w1,w2,w3 = 1-2, 1-3, and 1-4 weights from 0.0 to 1.0 inclusive
+ {compute/tally} flag = {yes} or {no}
{mix} value = {geometric} or {arithmetic} or {sixthpower}
{shift} value = {yes} or {no}
{table} value = N
2^N = # of values in table
{table/disp} value = N
2^N = # of values in table
{tabinner} value = cutoff
cutoff = inner cutoff at which to begin table (distance units)
{tabinner/disp} value = cutoff
cutoff = inner cutoff at which to begin table (distance units)
{tail} value = {yes} or {no}
{compute} value = {yes} or {no} :pre
:ule
[Examples:]
pair_modify shift yes mix geometric
pair_modify tail yes
pair_modify table 12
pair_modify pair lj/cut compute no
+pair_modify pair tersoff compute/tally no
pair_modify pair lj/cut/coul/long 1 special lj/coul 0.0 0.0 0.0 :pre
[Description:]
Modify the parameters of the currently defined pair style. Not all
parameters are relevant to all pair styles.
If used, the {pair} keyword must appear first in the list of keywords.
It can only be used with the "hybrid and
hybrid/overlay"_pair_hybrid.html pair styles. It means that all the
following parameters will only be modified for the specified
sub-style. If the sub-style is defined multiple times, then an
additional numeric argument {N} must also be specified, which is a
number from 1 to M where M is the number of times the sub-style was
listed in the "pair_style hybrid"_pair_hybrid.html command. The extra
number indicates which instance of the sub-style the remaining
keywords will be applied to. Note that if the {pair} keyword is not
used, and the pair style is {hybrid} or {hybrid/overlay}, then all the
specified keywords will be applied to all sub-styles.
-The {special} keyword can only be used in conjunction with the {pair}
-keyword and must directly follow it. It allows to override the
+The {special} and {compute/tally} keywords can [only] be used in
+conjunction with the {pair} keyword and must directly follow it.
+{special} allows you to override the
"special_bonds"_special_bonds.html settings for the specified sub-style.
+{compute/tally} allows you to disable or enable the registration of
+"compute */tally"_compute_tally.html computes for a given sub-style.
More details are given below.
The {mix} keyword affects pair coefficients for interactions between
atoms of type I and J, when I != J and the coefficients are not
explicitly set in the input script. Note that coefficients for I = J
must be set explicitly, either in the input script via the
"pair_coeff" command or in the "Pair Coeffs" section of the "data
file"_read_data.html. For some pair styles it is not necessary to
specify coefficients when I != J, since a "mixing" rule will create
them from the I,I and J,J settings. The pair_modify {mix} value
determines what formulas are used to compute the mixed coefficients.
In each case, the cutoff distance is mixed the same way as sigma.
Note that not all pair styles support mixing. Also, some mix options
are not available for certain pair styles. See the doc page for
individual pair styles for those restrictions. Note also that the
"pair_coeff"_pair_coeff.html command also can be to directly set
coefficients for a specific I != J pairing, in which case no mixing is
performed.
mix {geometric}
epsilon_ij = sqrt(epsilon_i * epsilon_j)
sigma_ij = sqrt(sigma_i * sigma_j) :pre
mix {arithmetic}
epsilon_ij = sqrt(epsilon_i * epsilon_j)
sigma_ij = (sigma_i + sigma_j) / 2 :pre
mix {sixthpower}
epsilon_ij = (2 * sqrt(epsilon_i*epsilon_j) * sigma_i^3 * sigma_j^3) /
(sigma_i^6 + sigma_j^6)
sigma_ij = ((sigma_i^6 + sigma_j^6) / 2) ^ (1/6) :pre
The {shift} keyword determines whether a Lennard-Jones potential is
shifted at its cutoff to 0.0. If so, this adds an energy term to each
pairwise interaction which will be included in the thermodynamic
output, but does not affect pair forces or atom trajectories. See the
doc page for individual pair styles to see which ones support this
option.
The {table} and {table/disp} keywords apply to pair styles with a
long-range Coulombic term or long-range dispersion term respectively;
see the doc page for individual styles to see which potentials support
these options. If N is non-zero, a table of length 2^N is
pre-computed for forces and energies, which can shrink their
computational cost by up to a factor of 2. The table is indexed via a
bit-mapping technique "(Wolff)"_#Wolff1 and a linear interpolation is
performed between adjacent table values. In our experiments with
different table styles (lookup, linear, spline), this method typically
gave the best performance in terms of speed and accuracy.
The choice of table length is a tradeoff in accuracy versus speed. A
larger N yields more accurate force computations, but requires more
memory which can slow down the computation due to cache misses. A
reasonable value of N is between 8 and 16. The default value of 12
(table of length 4096) gives approximately the same accuracy as the
no-table (N = 0) option. For N = 0, forces and energies are computed
directly, using a polynomial fit for the needed erfc() function
evaluation, which is what earlier versions of LAMMPS did. Values
greater than 16 typically slow down the simulation and will not
improve accuracy; values from 1 to 8 give unreliable results.
The {tabinner} and {tabinner/disp} keywords set an inner cutoff above
which the pairwise computation is done by table lookup (if tables are
invoked), for the corresponding Coulombic and dispersion tables
discussed with the {table} and {table/disp} keywords. The smaller the
cutoff is set, the less accurate the table becomes (for a given number
of table values), which can require use of larger tables. The default
cutoff value is sqrt(2.0) distance units which means nearly all
pairwise interactions are computed via table lookup for simulations
with "real" units, but some close pairs may be computed directly
(non-table) for simulations with "lj" units.
When the {tail} keyword is set to {yes}, certain pair styles will add
a long-range van der Waals tail "correction" to the energy and pressure.
These corrections are bookkeeping terms which do not affect dynamics,
unless a constant-pressure simulation is being performed. See the doc
page for individual styles to see which support this option. These
corrections are included in the calculation and printing of
thermodynamic quantities (see the "thermo_style"_thermo_style.html
command). Their effect will also be included in constant NPT or NPH
simulations where the pressure influences the simulation box
dimensions (e.g. the "fix npt"_fix_nh.html and "fix nph"_fix_nh.html
commands). The formulas used for the long-range corrections come from
equation 5 of "(Sun)"_#Sun.
NOTE: The tail correction terms are computed at the beginning of each
run, using the current atom counts of each atom type. If atoms are
deleted (or lost) or created during a simulation, e.g. via the "fix
gcmc"_fix_gcmc.html command, the correction factors are not
re-computed. If you expect the counts to change dramatically, you can
break a run into a series of shorter runs so that the correction
factors are re-computed more frequently.
Several additional assumptions are inherent in using tail corrections,
including the following:
The simulated system is a 3d bulk homogeneous liquid. This option
should not be used for systems that are non-liquid, 2d, have a slab
geometry (only 2d periodic), or inhomogeneous. :ulb,l
G(r), the radial distribution function (rdf), is unity beyond the
cutoff, so a fairly large cutoff should be used (i.e. 2.5 sigma for an
LJ fluid), and it is probably a good idea to verify this assumption by
checking the rdf. The rdf is not exactly unity beyond the cutoff for
each pair of interaction types, so the tail correction is necessarily
an approximation. :l
The tail corrections are computed at the beginning of each simulation
run. If the number of atoms changes during the run, e.g. due to atoms
leaving the simulation domain, or use of the "fix gcmc"_fix_gcmc.html
command, then the corrections are not updated to reflect the changed
atom count. If this is a large effect in your simulation, you should
break the long run into several short runs, so that the correction
factors are re-computed multiple times. :l
Thermophysical properties obtained from calculations with this option
enabled will not be thermodynamically consistent with the truncated
force-field that was used. In other words, atoms do not feel any LJ
pair interactions beyond the cutoff, but the energy and pressure
reported by the simulation include an estimated contribution from
those interactions. :l
:ule
The {compute} keyword allows pairwise computations to be turned off,
even though a "pair_style"_pair_style.html is defined. This is not
useful for running a real simulation, but can be useful for debugging
purposes or for performing a "rerun"_rerun.html simulation, when you
only wish to compute partial forces that do not include the pairwise
contribution.
Two examples are as follows. First, this option allows you to perform
a simulation with "pair_style hybrid"_pair_hybrid.html with only a
subset of the hybrid sub-styles enabled. Second, this option allows
you to perform a simulation with only long-range interactions but no
short-range pairwise interactions. Doing this by simply not defining
a pair style will not work, because the
"kspace_style"_kspace_style.html command requires a Kspace-compatible
pair style be defined.
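A minimal sketch of the "rerun"_rerun.html usage mentioned above (the
dump file name and column list are illustrative):
pair_modify compute no
rerun dump.lammpstrj dump x y z :pre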
:line
The {special} keyword allows you to override the 1-2, 1-3, and 1-4
exclusion settings for individual sub-styles of a
"hybrid pair style"_pair_hybrid.html. It requires 4 arguments similar
to the "special_bonds"_special_bonds.html command, {which} and
wt1,wt2,wt3. The {which} argument can be {lj} to change the
Lennard-Jones settings, {coul} to change the Coulombic settings,
or {lj/coul} to change both to the same set of 3 values. The wt1,wt2,wt3
values are numeric weights from 0.0 to 1.0 inclusive, for the 1-2,
1-3, and 1-4 bond topology neighbors, respectively. The {special}
keyword can only be used in conjunction with the {pair} keyword
and has to directly follow it.
NOTE: The global settings specified by the
"special_bonds"_special_bonds.html command affect the construction of
neighbor lists. Weights of 0.0 (for 1-2, 1-3, or 1-4 neighbors)
exclude those pairs from the neighbor list entirely. Weights of 1.0
store the neighbor with no weighting applied. Thus only global values
different from exactly 0.0 or 1.0 can be overridden and an error is
generated if the requested setting is not compatible with the global
setting. Substituting 1.0e-10 for 0.0 and 0.9999999999 for 1.0 is
usually a sufficient workaround in this case without causing a
significant error.
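A sketch of this workaround (assuming a hybrid list that contains
{lj/cut/coul/long}; the weights are illustrative):
special_bonds lj/coul 1.0e-10 1.0e-10 0.5
pair_modify pair lj/cut/coul/long special lj/coul 0.0 0.0 0.5 :pre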
+The {compute/tally} keyword takes exactly 1 argument ({no} or {yes}),
+and allows you to selectively disable or enable processing of the various
+"compute */tally"_compute_tally.html styles for a given
+"pair hybrid or hybrid/overlay"_pair_hybrid.html sub-style.
+
+NOTE: Any "pair_modify pair compute/tally" command must be issued
+[before] the corresponding compute style is defined.
+
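A sketch of the required ordering (assuming the USER-TALLY package is
installed; the sub-styles, potential file, and IDs are illustrative):
pair_style hybrid/overlay tersoff lj/cut 10.0
pair_coeff * * tersoff SiC.tersoff Si C
pair_coeff 1 2 lj/cut 1.0 3.0
pair_modify pair tersoff compute/tally no
compute 1 all pe/tally all :pre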
:line
[Restrictions:]
You cannot use {shift} yes with {tail} yes, since those are
conflicting options. You cannot use {tail} yes with 2d simulations.
[Related commands:]
-"pair_style"_pair_style.html, "pair_coeff"_pair_coeff.html,
-"thermo_style"_thermo_style.html
+"pair_style"_pair_style.html, "pair_style hybrid"_pair_hybrid.html,
+pair_coeff"_pair_coeff.html, "thermo_style"_thermo_style.html,
+"compute */tally"_compute_tally.html
[Default:]
The option defaults are mix = geometric, shift = no, table = 12,
tabinner = sqrt(2.0), tail = no, and compute = yes.
Note that some pair styles perform mixing, but only a certain style of
mixing. See the doc pages for individual pair styles for details.
:line
:link(Wolff1)
[(Wolff)] Wolff and Rudd, Comp Phys Comm, 120, 200-32 (1999).
:link(Sun)
[(Sun)] Sun, J Phys Chem B, 102, 7338-7364 (1998).
diff --git a/doc/src/pair_reax.txt b/doc/src/pair_reax.txt
index 7215c12ce..1d13f9370 100644
--- a/doc/src/pair_reax.txt
+++ b/doc/src/pair_reax.txt
@@ -1,216 +1,216 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style reax command :h3
[Syntax:]
pair_style reax hbcut hbnewflag tripflag precision :pre
hbcut = hydrogen-bond cutoff (optional) (distance units)
hbnewflag = use old or new hbond function style (0 or 1) (optional)
tripflag = apply stabilization to all triple bonds (0 or 1) (optional)
precision = precision for charge equilibration (optional) :ul
[Examples:]
pair_style reax
pair_style reax 10.0 0 1 1.0e-5
pair_coeff * * ffield.reax 3 1 2 2
pair_coeff * * ffield.reax 3 NULL NULL 3 :pre
[Description:]
Style {reax} computes the ReaxFF potential of van Duin, Goddard and
co-workers. ReaxFF uses distance-dependent bond-order functions to
represent the contributions of chemical bonding to the potential
energy. There is more than one version of ReaxFF. The version
implemented in LAMMPS uses the functional forms documented in the
supplemental information of the following paper:
"(Chenoweth)"_#Chenoweth_20081. The version integrated into LAMMPS matches
the most up-to-date version of ReaxFF as of summer 2010.
WARNING: pair style reax is now deprecated and will soon be retired. Users
-should switch to "pair_style reax/c"_pair_reax_c.html. The {reax} style
+should switch to "pair_style reax/c"_pair_reaxc.html. The {reax} style
differs from the {reax/c} style in the low-level implementation details.
The {reax} style is a
Fortran library, linked to LAMMPS. The {reax/c} style was initially
implemented as stand-alone C code and is now integrated into LAMMPS as
a package.
LAMMPS requires that a file called ffield.reax be provided, containing
the ReaxFF parameters for each atom type, bond type, etc. The format
is identical to the ffield file used by van Duin and co-workers. The
filename is required as an argument in the pair_coeff command. Any
value other than "ffield.reax" will be rejected (see below).
LAMMPS provides several different versions of ffield.reax in its
potentials dir, each called potentials/ffield.reax.label. These are
documented in potentials/README.reax. The default ffield.reax
contains parameterizations for the following elements: C, H, O, N.
NOTE: We do not distribute a wide variety of ReaxFF force field files
with LAMMPS. Adri van Duin's group at PSU is the central repository
for this kind of data as they are continuously deriving and updating
parameterizations for different classes of materials. You can submit
a contact request at the Materials Computation Center (MCC) website
"https://www.mri.psu.edu/materials-computation-center/connect-mcc"_https://www.mri.psu.edu/materials-computation-center/connect-mcc,
describing the material(s) you are interested in modeling with ReaxFF.
They can tell
you what is currently available or what it would take to create a
suitable ReaxFF parameterization.
The format of these files is identical to that used originally by van
Duin. We have tested the accuracy of {pair_style reax} potential
against the original ReaxFF code for the systems mentioned above. You
can use other ffield files for specific chemical systems that may be
available elsewhere (but note that their accuracy may not have been
tested).
The {hbcut}, {hbnewflag}, {tripflag}, and {precision} settings are
optional arguments. If none are provided, default settings are used:
{hbcut} = 6 (which is Angstroms in real units), {hbnewflag} = 1 (use
new hbond function style), {tripflag} = 1 (apply stabilization to all
triple bonds), and {precision} = 1.0e-6 (one part in 10^6). If you
wish to override any of these defaults, then all of the settings must
be specified.
Two examples using {pair_style reax} are provided in the examples/reax
sub-directory, along with corresponding examples for
-"pair_style reax/c"_pair_reax_c.html. Note that while the energy and force
+"pair_style reax/c"_pair_reaxc.html. Note that while the energy and force
calculated by both of these pair styles match very closely, the
contributions due to the valence angles differ slightly due to
the fact that with {pair_style reax/c} the default value of {thb_cutoff_sq}
is 0.00001, while for {pair_style reax} it is hard-coded to be 0.001.
Use of this pair style requires that a charge be defined for every
atom since the {reax} pair style performs a charge equilibration (QEq)
calculation. See the "atom_style"_atom_style.html and
"read_data"_read_data.html commands for details on how to specify
charges.
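A minimal setup sketch (the data file name is illustrative; the 4 atom
types are mapped to the default C, H, O, N parameterization):
units real
atom_style charge
read_data data.RDX
pair_style reax
pair_coeff * * ffield.reax 1 2 3 4 :pre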
The thermo variable {evdwl} stores the sum of all the ReaxFF potential
energy contributions, with the exception of the Coulombic and charge
equilibration contributions which are stored in the thermo variable
{ecoul}. The output of these quantities is controlled by the
"thermo"_thermo.html command.
This pair style tallies a breakdown of the total ReaxFF potential
energy into sub-categories, which can be accessed via the "compute
pair"_compute_pair.html command as a vector of values of length 14.
The 14 values correspond to the following sub-categories (the variable
names in italics match those used in the ReaxFF FORTRAN library):
{eb} = bond energy
{ea} = atom energy
{elp} = lone-pair energy
{emol} = molecule energy (always 0.0)
{ev} = valence angle energy
{epen} = double-bond valence angle penalty
{ecoa} = valence angle conjugation energy
{ehb} = hydrogen bond energy
{et} = torsion energy
{eco} = conjugation energy
{ew} = van der Waals energy
{ep} = Coulomb energy
{efi} = electric field energy (always 0.0)
{eqeq} = charge equilibration energy :ol
To print these quantities to the log file (with descriptive column
headings) the following commands could be included in an input script:
compute reax all pair reax
variable eb equal c_reax\[1\]
variable ea equal c_reax\[2\]
...
variable eqeq equal c_reax\[14\]
thermo_style custom step temp epair v_eb v_ea ... v_eqeq :pre
Only a single pair_coeff command is used with the {reax} style which
specifies a ReaxFF potential file with parameters for all needed
elements. These are mapped to LAMMPS atom types by specifying N
additional arguments after the filename in the pair_coeff command,
where N is the number of LAMMPS atom types:
filename
N indices = mapping of ReaxFF elements to atom types :ul
The specification of the filename and the mapping of LAMMPS atom types
recognized by the ReaxFF is done differently than for other LAMMPS
potentials, due to the difficulty of portably passing character
strings (e.g. filenames, element names) between C++ and Fortran.
The filename has to be "ffield.reax" and it has to exist in the
directory you are running LAMMPS in. This means you cannot prepend a
path to the file in the potentials dir. Rather, you should copy that
file into the directory you are running from. If you wish to use
another ReaxFF potential file, then name it "ffield.reax" and put it
in the directory you run from.
In the ReaxFF potential file, near the top, after the general
parameters, is the atomic parameters section that contains element
names, each with a couple dozen numeric parameters. If there are M
elements specified in the {ffield} file, think of these as numbered 1
to M. Each of the N indices you specify for the N atom types of LAMMPS
atoms must be an integer from 1 to M. Atoms with LAMMPS type 1 will
be mapped to whatever element you specify as the first index value,
etc. If a mapping value is specified as NULL, the mapping is not
performed. This can be used when a ReaxFF potential is used as part
of the {hybrid} pair style. The NULL values are placeholders for atom
types that will be used with other potentials.
NOTE: Currently the reax pair style cannot be used as part of the
{hybrid} pair style. Some additional changes still need to be made to
enable this.
As an example, say your LAMMPS simulation has 4 atom types and the
elements are ordered as C, H, O, N in the {ffield} file. If you want
the LAMMPS atom type 1 and 2 to be C, type 3 to be N, and type 4 to be
H, you would use the following pair_coeff command:
pair_coeff * * ffield.reax 1 1 4 2 :pre
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support the "pair_modify"_pair_modify.html
mix, shift, table, and tail options.
This pair style does not write its information to "binary restart
files"_restart.html, since it is stored in potential files. Thus, you
need to re-specify the pair_style and pair_coeff commands in an input
script that reads a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
[Restrictions:]
The ReaxFF potential files provided with LAMMPS in the potentials
directory are parameterized for real "units"_units.html. You can use
the ReaxFF potential with any LAMMPS units, but you would need to
create your own potential file with coefficients listed in the
appropriate units if your simulation doesn't use "real" units.
[Related commands:]
-"pair_coeff"_pair_coeff.html, "pair_style reax/c"_pair_reax_c.html,
+"pair_coeff"_pair_coeff.html, "pair_style reax/c"_pair_reaxc.html,
"fix_reax_bonds"_fix_reax_bonds.html
[Default:]
The keyword defaults are {hbcut} = 6, {hbnewflag} = 1, {tripflag} = 1,
{precision} = 1.0e-6.
:line
:link(Chenoweth_20081)
[(Chenoweth_2008)] Chenoweth, van Duin and Goddard,
Journal of Physical Chemistry A, 112, 1040-1053 (2008).
diff --git a/doc/src/pair_reax_c.txt b/doc/src/pair_reaxc.txt
similarity index 96%
rename from doc/src/pair_reax_c.txt
rename to doc/src/pair_reaxc.txt
index c1d719d22..76a8e6fd5 100644
--- a/doc/src/pair_reax_c.txt
+++ b/doc/src/pair_reaxc.txt
@@ -1,349 +1,357 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style reax/c command :h3
pair_style reax/c/kk command :h3
[Syntax:]
pair_style reax/c cfile keyword value :pre
cfile = NULL or name of a control file :ulb,l
zero or more keyword/value pairs may be appended :l
keyword = {checkqeq} or {enobonds} or {lgvdw} or {safezone} or {mincap}
{checkqeq} value = {yes} or {no} = whether or not to require qeq/reax fix
+ {enobonds} value = {yes} or {no} = whether or not to tally energy of atoms with no bonds
{lgvdw} value = {yes} or {no} = whether or not to use a low gradient vdW correction
{safezone} = factor used for array allocation
{mincap} = minimum size for array allocation :pre
:ule
[Examples:]
pair_style reax/c NULL
pair_style reax/c controlfile checkqeq no
pair_style reax/c NULL lgvdw yes
pair_style reax/c NULL safezone 1.6 mincap 100
pair_coeff * * ffield.reax C H O N :pre
[Description:]
Style {reax/c} computes the ReaxFF potential of van Duin, Goddard and
co-workers. ReaxFF uses distance-dependent bond-order functions to
represent the contributions of chemical bonding to the potential
energy. There is more than one version of ReaxFF. The version
implemented in LAMMPS uses the functional forms documented in the
supplemental information of the following paper: "(Chenoweth et al.,
2008)"_#Chenoweth_20082. The version integrated into LAMMPS matches
the most up-to-date version of ReaxFF as of summer 2010. For more
technical details about the pair reax/c implementation of ReaxFF, see
the "(Aktulga)"_#Aktulga paper. The {reax/c} style was initially
implemented as a stand-alone C code and is now integrated into LAMMPS
as a package.
The {reax/c/kk} style is a Kokkos version of the ReaxFF potential that is
derived from the {reax/c} style. The Kokkos version can run on GPUs and
can also use OpenMP multithreading. For more information about the Kokkos package,
see "Section 4"_Section_packages.html#kokkos and "Section 5.3.3"_accelerate_kokkos.html.
One important consideration when using the {reax/c/kk} style is the choice of either
half or full neighbor lists. This setting can be changed using the Kokkos "package"_package.html
command.
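For example, half neighbor lists for the Kokkos style could be requested
as follows; this is a sketch that assumes LAMMPS was built with the
KOKKOS package (see the "package"_package.html doc page for all options):
package kokkos neigh half
pair_style reax/c/kk NULL
pair_coeff * * ffield.reax C H O N :pre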
The {reax/c} style differs from the "pair_style reax"_pair_reax.html
command in the low-level implementation details. The {reax} style is a
Fortran library, linked to LAMMPS. The {reax/c} style was initially
implemented as stand-alone C code and is now integrated into LAMMPS as
a package.
LAMMPS provides several different versions of ffield.reax in its
potentials dir, each called potentials/ffield.reax.label. These are
documented in potentials/README.reax. The default ffield.reax
contains parameterizations for the following elements: C, H, O, N.
The format of these files is identical to that used originally by van
Duin. We have tested the accuracy of the {pair_style reax/c} potential
against the original ReaxFF code for the systems mentioned above. You
can use other ffield files for specific chemical systems that may be
available elsewhere (but note that their accuracy may not have been
tested).
NOTE: We do not distribute a wide variety of ReaxFF force field files
with LAMMPS. Adri van Duin's group at PSU is the central repository
for this kind of data as they are continuously deriving and updating
parameterizations for different classes of materials. You can submit
a contact request at the Materials Computation Center (MCC) website
"https://www.mri.psu.edu/materials-computation-center/connect-mcc"_https://www.mri.psu.edu/materials-computation-center/connect-mcc,
describing the material(s) you are interested in modeling with ReaxFF.
They can tell
you what is currently available or what it would take to create a
suitable ReaxFF parameterization.
The {cfile} setting can be specified as NULL, in which case default
settings are used. A control file can be specified which defines
values of control variables. Some control variables are
global parameters for the ReaxFF potential. Others define certain
performance and output settings.
Each line in the control file specifies the value for
a control variable. The format of the control file is described
below.
NOTE: The LAMMPS default values for the ReaxFF global parameters
correspond to those used by Adri van Duin's stand-alone serial
code. If these are changed by setting control variables in the control
file, the results from LAMMPS and the serial code will not agree.
Two examples using {pair_style reax/c} are provided in the examples/reax
sub-directory, along with corresponding examples for
"pair_style reax"_pair_reax.html.
Use of this pair style requires that a charge be defined for every
atom. See the "atom_style"_atom_style.html and
"read_data"_read_data.html commands for details on how to specify
charges.
The ReaxFF parameter files provided were created using a charge
equilibration (QEq) model for handling the electrostatic interactions.
Therefore, by default, LAMMPS requires that the "fix
qeq/reax"_fix_qeq_reax.html command be used with {pair_style reax/c}
when simulating a ReaxFF model, to equilibrate charge each timestep.
Using the keyword {checkqeq} with the value {no}
turns off the check for {fix qeq/reax},
allowing a simulation to be run without charge equilibration.
In this case, the static charges you
assign to each atom will be used for computing the electrostatic
interactions in the system.
See the "fix qeq/reax"_fix_qeq_reax.html command for details.
Using the optional keyword {lgvdw} with the value {yes} turns on
the low-gradient correction of {reax/c} for long-range
London dispersion, as described in the "(Liu)"_#Liu_2011 paper. The force
field file {ffield.reax.lg} is designed for this correction, and is
trained for several energetic materials (see "Liu"). When using the
lg-correction, the recommended value for the parameter {thb} is 0.01,
which can be set in the control file. Note: Force field files differ
between the original and lg-corrected pair styles; using the wrong
ffield file generates an error message.
+Using the optional keyword {enobonds} with the value {yes}, the energy
+of atoms with no bonds (i.e. isolated atoms) is included in the total
+potential energy and the per-atom energy of that atom. If the value
+{no} is specified then the energy of atoms with no bonds is set to zero.
+The latter behavior is usually not desired, as it causes discontinuities
+in the potential energy when the bonding of an atom drops to zero.
+
Optional keywords {safezone} and {mincap} are used for allocating
reax/c arrays. Increasing these values can avoid memory problems, such
as segmentation faults and bondchk failed errors, that could occur under
certain conditions. These keywords aren't used by the Kokkos version, which
instead uses a more robust memory allocation scheme that checks if the sizes of
the arrays have been exceeded and automatically allocates more memory.
The thermo variable {evdwl} stores the sum of all the ReaxFF potential
energy contributions, with the exception of the Coulombic and charge
equilibration contributions which are stored in the thermo variable
{ecoul}. The output of these quantities is controlled by the
"thermo"_thermo.html command.
This pair style tallies a breakdown of the total ReaxFF potential
energy into sub-categories, which can be accessed via the "compute
pair"_compute_pair.html command as a vector of values of length 14.
The 14 values correspond to the following sub-categories (the variable
names in italics match those used in the original FORTRAN ReaxFF code):
{eb} = bond energy
{ea} = atom energy
{elp} = lone-pair energy
{emol} = molecule energy (always 0.0)
{ev} = valence angle energy
{epen} = double-bond valence angle penalty
{ecoa} = valence angle conjugation energy
{ehb} = hydrogen bond energy
{et} = torsion energy
{eco} = conjugation energy
{ew} = van der Waals energy
{ep} = Coulomb energy
{efi} = electric field energy (always 0.0)
{eqeq} = charge equilibration energy :ol
To print these quantities to the log file (with descriptive column
headings) the following commands could be included in an input script:
compute reax all pair reax/c
variable eb equal c_reax\[1\]
variable ea equal c_reax\[2\]
\[...\]
variable eqeq equal c_reax\[14\]
thermo_style custom step temp epair v_eb v_ea \[...\] v_eqeq :pre
Only a single pair_coeff command is used with the {reax/c} style which
specifies a ReaxFF potential file with parameters for all needed
elements. These are mapped to LAMMPS atom types by specifying N
additional arguments after the filename in the pair_coeff command,
where N is the number of LAMMPS atom types:
filename
N indices = ReaxFF elements :ul
The filename is the ReaxFF potential file. Unlike for the {reax}
pair style, any filename can be used.
In the ReaxFF potential file, near the top, after the general
parameters, is the atomic parameters section that contains element
names, each with a couple dozen numeric parameters. If there are M
elements specified in the {ffield} file, think of these as numbered 1
to M. Each of the N indices you specify for the N atom types of LAMMPS
atoms must be an integer from 1 to M. Atoms with LAMMPS type 1 will
be mapped to whatever element you specify as the first index value,
etc. If a mapping value is specified as NULL, the mapping is not
performed. This can be used when the {reax/c} style is used as part
of the {hybrid} pair style. The NULL values are placeholders for atom
types that will be used with other potentials.
As an example, say your LAMMPS simulation has 4 atom types and the
elements are ordered as C, H, O, N in the {ffield} file. If you want
the LAMMPS atom type 1 and 2 to be C, type 3 to be N, and type 4 to be
H, you would use the following pair_coeff command:
pair_coeff * * ffield.reax C C N H :pre
:line
The format of a line in the control file is as follows:
variable_name value :pre
and it may be followed by an "!" character and a trailing comment.
If the value of a control variable is not specified, then default
values are used. What follows is the list of variables along with a
brief description of their use and default values.
simulation_name: Output files produced by {pair_style reax/c} carry
this name + extensions specific to their contents. Partial energies
are reported with a ".pot" extension, while the trajectory file has
a ".trj" extension.
tabulate_long_range: To improve performance, long-range interactions
can optionally be tabulated (0 means no tabulation). The value of this
variable is the number of entries in the long-range interaction table.
The range from 0 to the long-range cutoff (defined in the {ffield}
file) is divided into {tabulate_long_range} points. At the start of
the simulation the entries of the table are filled in by computing the
energies and forces resulting from van der Waals and Coulomb
interactions between every possible pair of atom types present in the
input system. During the simulation the table is consulted to estimate
the energy and forces between a pair of atoms, using linear
interpolation. (default value = 0)
energy_update_freq: Denotes the frequency (in number of steps) of
writes into the partial energies file. (default value = 0)
nbrhood_cutoff: Denotes the near neighbors cutoff (in Angstroms)
regarding the bonded interactions. (default value = 5.0)
hbond_cutoff: Denotes the cutoff distance (in Angstroms) for hydrogen
bond interactions. (default value = 7.5; a value of 0.0 turns off
hydrogen bonds)
bond_graph_cutoff: Denotes the threshold used in determining what is
and is not a physical bond. Bonds and angles reported in the
trajectory file rely on this cutoff. (default value = 0.3)
thb_cutoff: cutoff value for the strength of bonds to be considered in
three body interactions. (default value = 0.001)
thb_cutoff_sq: cutoff value for the strength of bond order products
to be considered in three body interactions. (default value = 0.00001)
write_freq: Frequency of writes into the trajectory file. (default
value = 0)
traj_title: Title of the trajectory - not the name of the trajectory
file.
atom_info: 1 means print only atomic positions + charge (default = 0)
atom_forces: 1 adds net forces to atom lines in the trajectory file
(default = 0)
atom_velocities: 1 adds atomic velocities to atom lines in the trajectory file (default = 0)
bond_info: 1 prints bonds in the trajectory file (default = 0)
angle_info: 1 prints angles in the trajectory file (default = 0)
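Putting the format and the variables above together, a small control
file might look like this (the values are illustrative only; any
variable not listed keeps its default):
simulation_name reaxc_run ! prefix for the .pot and .trj output files
tabulate_long_range 10000 ! 10000 table entries, 0 = no tabulation
energy_update_freq 10
nbrhood_cutoff 5.0
hbond_cutoff 7.5
bond_graph_cutoff 0.3
thb_cutoff 0.001
write_freq 100
traj_title reaxc_run_trajectory
atom_info 1
bond_info 1 :pre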
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support the "pair_modify"_pair_modify.html
mix, shift, table, and tail options.
This pair style does not write its information to "binary restart
files"_restart.html, since it is stored in potential files. Thus, you
need to re-specify the pair_style and pair_coeff commands in an input
script that reads a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This pair style is part of the USER-REAXC package. It is only enabled
if LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The ReaxFF potential files provided with LAMMPS in the potentials
directory are parameterized for real "units"_units.html. You can use
the ReaxFF potential with any LAMMPS units, but you would need to
create your own potential file with coefficients listed in the
appropriate units if your simulation doesn't use "real" units.
[Related commands:]
"pair_coeff"_pair_coeff.html, "fix qeq/reax"_fix_qeq_reax.html, "fix
reax/c/bonds"_fix_reax_bonds.html, "fix
reax/c/species"_fix_reaxc_species.html, "pair_style
reax"_pair_reax.html
[Default:]
-The keyword defaults are checkqeq = yes, lgvdw = no, safezone = 1.2,
+The keyword defaults are checkqeq = yes, enobonds = yes, lgvdw = no, safezone = 1.2,
mincap = 50.
:line
:link(Chenoweth_20082)
[(Chenoweth_2008)] Chenoweth, van Duin and Goddard,
Journal of Physical Chemistry A, 112, 1040-1053 (2008).
:link(Aktulga)
(Aktulga) Aktulga, Fogarty, Pandit, Grama, Parallel Computing, 38,
245-259 (2012).
:link(Liu_2011)
[(Liu)] L. Liu, Y. Liu, S. V. Zybin, H. Sun and W. A. Goddard, Journal
of Physical Chemistry A, 115, 11016-11022 (2011).
diff --git a/doc/src/pair_sdk.txt b/doc/src/pair_sdk.txt
index 212760e03..1c348eaaf 100644
--- a/doc/src/pair_sdk.txt
+++ b/doc/src/pair_sdk.txt
@@ -1,156 +1,156 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style lj/sdk command :h3
pair_style lj/sdk/gpu command :h3
pair_style lj/sdk/kk command :h3
pair_style lj/sdk/omp command :h3
pair_style lj/sdk/coul/long command :h3
pair_style lj/sdk/coul/long/gpu command :h3
pair_style lj/sdk/coul/long/omp command :h3
[Syntax:]
pair_style style args :pre
style = {lj/sdk} or {lj/sdk/coul/long}
args = list of arguments for a particular style :ul
{lj/sdk} args = cutoff
cutoff = global cutoff for Lennard Jones interactions (distance units)
{lj/sdk/coul/long} args = cutoff (cutoff2)
cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
cutoff2 = global cutoff for Coulombic (optional) (distance units) :pre
[Examples:]
pair_style lj/sdk 2.5
pair_coeff 1 1 lj12_6 1 1.1 2.8 :pre
pair_style lj/sdk/coul/long 10.0
pair_style lj/sdk/coul/long 10.0 12.0
pair_coeff 1 1 lj9_6 100.0 3.5 12.0 :pre
[Description:]
The {lj/sdk} styles compute a 9/6, 12/4, or 12/6 Lennard-Jones potential,
given by
:c,image(Eqs/pair_cmm.jpg)
as required for the SDK Coarse-grained MD parametrization discussed in
"(Shinoda)"_#Shinoda3 and "(DeVane)"_#DeVane. Rc is the cutoff.
Style {lj/sdk/coul/long} computes the same Lennard-Jones interactions
and adds Coulombic interactions with an additional damping factor
applied, so it can be used in conjunction with the
"kspace_style"_kspace_style.html command and its {ewald} or {pppm} or
{pppm/cg} option. The Coulombic cutoff specified for this style means
that pairwise interactions within this distance are computed directly;
interactions outside that distance are computed in reciprocal space.
The following coefficients must be defined for each pair of atom
types via the "pair_coeff"_pair_coeff.html command as in the examples
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands, or by mixing as described below:
cg_type (lj9_6, lj12_4, or lj12_6)
epsilon (energy units)
sigma (distance units)
cutoff1 (distance units) :ul
Note that sigma is defined in the LJ formula as the zero-crossing
distance for the potential, not as the energy minimum. The prefactors
are chosen so that the potential minimum is at -epsilon.
The latter 2 coefficients are optional. If not specified, the global
LJ and Coulombic cutoffs specified in the pair_style command are used.
If only one cutoff is specified, it is used as the cutoff for both LJ
and Coulombic interactions for this type pair. If both coefficients
are specified, they are used as the LJ and Coulombic cutoffs for this
type pair.
For {lj/sdk/coul/long} only the LJ cutoff can be specified since a
Coulombic cutoff cannot be specified for an individual I,J type pair.
All type pairs use the same global Coulombic cutoff specified in the
pair_style command.
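As an illustration of per-pair settings, the following is a sketch
only; the epsilon and sigma values are made up and not taken from an
actual SDK parameter set:
pair_style lj/sdk/coul/long 12.0
pair_coeff 1 1 lj9_6 0.40 4.5          # uses the global 12.0 cutoff
pair_coeff 1 2 lj12_4 0.35 4.1 10.0    # overrides only the LJ cutoff for this pair
pair_coeff 2 2 lj12_6 0.45 4.7 :pre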
:line
Styles with a {gpu}, {intel}, {kk}, {omp} or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP, and OPT packages respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Mixing, shift, table, tail correction, restart, and rRESPA info]:
For atom type pairs I,J and I != J, the epsilon and sigma coefficients
and cutoff distance for all of the lj/sdk pair styles {cannot} be mixed,
since different pairs may have different exponents. So all parameters
for all pairs have to be specified explicitly through the "pair_coeff"
command. Defining them in a data file is also not supported, due to
limitations of that file format.
All of the lj/sdk pair styles support the
"pair_modify"_pair_modify.html shift option for the energy of the
Lennard-Jones portion of the pair interaction.
The {lj/sdk/coul/long} pair styles support the
"pair_modify"_pair_modify.html table option since they can tabulate
the short-range portion of the long-range Coulombic interaction.
All of the lj/sdk pair styles write their information to "binary
restart files"_restart.html, so pair_style and pair_coeff commands do
not need to be specified in an input script that reads a restart file.
The lj/sdk and lj/sdk/coul/long pair styles do not support
the use of the {inner}, {middle}, and {outer} keywords of the "run_style
respa"_run_style.html command.
:line
[Restrictions:]
-All of the lj/sdk pair styles are part of the USER-CG-CMM package.
+All of the lj/sdk pair styles are part of the USER-CGSDK package.
The {lj/sdk/coul/long} style also requires the KSPACE package to be
built (which is enabled by default). They are only enabled if LAMMPS
was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"pair_coeff"_pair_coeff.html, "angle_style sdk"_angle_sdk.html
[Default:] none
:line
:link(Shinoda3)
[(Shinoda)] Shinoda, DeVane, Klein, Mol Sim, 33, 27 (2007).
:link(DeVane)
[(DeVane)] Shinoda, DeVane, Klein, Soft Matter, 4, 2453-2462 (2008).
diff --git a/doc/src/pair_srp.txt b/doc/src/pair_srp.txt
index 3f54445ba..e7f1e00d1 100644
--- a/doc/src/pair_srp.txt
+++ b/doc/src/pair_srp.txt
@@ -1,166 +1,168 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style srp command :h3
[Syntax:]
pair_style srp cutoff btype dist keyword value ...
cutoff = global cutoff for SRP interactions (distance units) :ulb,l
btype = bond type to apply SRP interactions to (can be wildcard, see below) :l
dist = {min} or {mid} :l
zero or more keyword/value pairs may be appended :l
keyword = {bptype} or {exclude} :l
{bptype} value = atom type for bond particles
{exclude} value = {yes} or {no} :pre
:ule
[Examples:]
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 1 mid exclude yes
pair_coeff 1 1 dpd 60.0 4.5 1.0
pair_coeff 1 2 none
pair_coeff 2 2 srp 100.0 0.8 :pre
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 * min exclude yes
pair_coeff 1 1 dpd 60.0 50 1.0
pair_coeff 1 2 none
pair_coeff 2 2 srp 40.0 :pre
pair_style hybrid srp 0.8 2 mid
pair_coeff 1 1 none
pair_coeff 1 2 none
pair_coeff 2 2 srp 100.0 0.8 :pre
[Description:]
Style {srp} computes a soft segmental repulsive potential (SRP) that
acts between pairs of bonds. This potential is useful for preventing
bonds from passing through one another when a soft non-bonded
potential acts between beads in, for example, DPD polymer chains. An
example input script that uses this command is provided in
examples/USER/srp.
Bonds of specified type {btype} interact with one another through a
bond-pairwise potential, such that the force on bond {i} due to bond
{j} is as follows
:c,image(Eqs/pair_srp1.jpg)
where {r} and {rij} are the distance and unit vector between the two
bonds. Note that {btype} can be specified as an asterisk "*", in which
case the interaction is applied to all bond types. The {mid} option
computes {r} and {rij} from the midpoint distance between bonds. The
{min} option computes {r} and {rij} from the minimum distance between
bonds. The force acting on a bond is mapped onto the two bond atoms
according to the lever rule,
:c,image(Eqs/pair_srp2.jpg)
where {L} is the normalized distance from the atom to the point of
closest approach of bond {i} and {j}. The {mid} option takes {L} as
0.5 for each interaction as described in "(Sirk)"_#Sirk2.
The following coefficients must be defined via the
"pair_coeff"_pair_coeff.html command as in the examples above, or in
the data file or restart file read by the "read_data"_read_data.html
or "read_restart"_read_restart.html commands:
{C} (force units)
{rc} (distance units) :ul
The last coefficient is optional. If not specified, the global cutoff
is used.
NOTE: Pair style srp considers each bond of type {btype} to be a
fictitious "particle" of type {bptype}, where {bptype} is either the
largest atom type in the system, or the type set by the {bptype} flag.
Any actual existing particles with this atom type will be deleted at
the beginning of a run. This means you must specify the number of
atom types in your system accordingly, usually one larger than it
would normally be, e.g. via the "create_box"_create_box.html command
or by changing the header in your "data file"_read_data.html. The
fictitious "bond particles" are inserted at the beginning of the run,
and serve as placeholders that define the position of the bonds. This
allows neighbor lists to be constructed and pairwise interactions to
be computed in almost the same way as is done for actual particles.
Because bonds interact only with other bonds, "pair_style
hybrid"_pair_hybrid.html should be used to turn off interactions
between atom type {bptype} and all other types of atoms. An error
will be flagged if "pair_style hybrid"_pair_hybrid.html is not used.
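A hypothetical setup with one real bead type plus one extra type
reserved for the bond particles might look like this (the DPD and SRP
coefficients are illustrative only):
create_box 2 simbox                 # simbox = a previously defined region
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 1 mid
pair_coeff 1 1 dpd 60.0 4.5 1.0
pair_coeff 1 2 none                 # no interactions with the bond particles
pair_coeff 2 2 srp 100.0 0.8 :pre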
The optional {exclude} keyword determines if forces are computed
between first neighbor (directly connected) bonds. For a setting of
{no}, first neighbor forces are computed; for {yes} they are not
computed. A setting of {no} cannot be used with the {min} option for
distance calculation because the minimum distance between directly
connected bonds is zero.
Pair style {srp} turns off normalization of thermodynamic properties
by particle number, as if the command "thermo_modify norm
no"_thermo_modify.html had been issued.
The pairwise energy associated with style {srp} is shifted to be zero
at the cutoff distance {rc}.
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support mixing.
This pair style does not support the "pair_modify"_pair_modify.html
shift option for the energy of the pair interaction. Note that as
discussed above, the energy term is already shifted to be 0.0 at the
cutoff distance {rc}.
The "pair_modify"_pair_modify.html table option is not relevant for
this pair style.
This pair style does not support the "pair_modify"_pair_modify.html
tail option for adding long-range tail corrections to energy and
pressure.
This pair style writes global and per-atom information to "binary
restart files"_restart.html. Pair srp should be used with "pair_style
hybrid"_pair_hybrid.html, thus the pair_coeff commands need to be
specified in the input script when reading a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
:line
[Restrictions:]
This pair style is part of the USER-MISC package. It is only enabled
if LAMMPS was built with that package. See the Making LAMMPS section
for more info.
This pair style must be used with "pair_style
hybrid"_pair_hybrid.html.
This pair style requires the "newton"_newton.html command to be {on}
for non-bonded interactions.
+This pair style is not compatible with "rigid body integrators"_fix_rigid.html.
+
[Related commands:]
"pair_style hybrid"_pair_hybrid.html, "pair_coeff"_pair_coeff.html,
"pair dpd"_pair_dpd.html
[Default:]
The default keyword value is exclude = yes.
:line
:link(Sirk2)
[(Sirk)] Sirk TW, Sliozberg YR, Brennan JK, Lisal M, Andzelm JW, J
Chem Phys, 136 (13) 134903, 2012.
diff --git a/doc/src/pairs.txt b/doc/src/pairs.txt
index 8694747da..0898906e7 100644
--- a/doc/src/pairs.txt
+++ b/doc/src/pairs.txt
@@ -1,107 +1,107 @@
Pair Styles :h1
<!-- RST
.. toctree::
:maxdepth: 1
pair_adp
pair_agni
pair_airebo
pair_awpmd
pair_beck
pair_body
pair_bop
pair_born
pair_brownian
pair_buck
pair_buck_long
pair_charmm
pair_class2
pair_colloid
pair_comb
pair_coul
pair_coul_diel
pair_cs
pair_dipole
pair_dpd
pair_dpd_fdt
pair_dsmc
pair_eam
pair_edip
pair_eff
pair_eim
pair_exp6_rx
pair_gauss
pair_gayberne
pair_gran
pair_gromacs
pair_hbond_dreiding
pair_hybrid
pair_kim
pair_kolmogorov_crespi_z
pair_lcbop
pair_line_lj
pair_list
pair_lj
pair_lj96
pair_lj_cubic
pair_lj_expand
pair_lj_long
pair_lj_sf
pair_lj_smooth
pair_lj_smooth_linear
pair_lj_soft
pair_lubricate
pair_lubricateU
pair_mdf
pair_meam
pair_meam_spline
pair_meam_sw_spline
pair_mgpt
pair_mie
pair_momb
pair_morse
pair_multi_lucy
pair_multi_lucy_rx
pair_nb3b_harmonic
pair_nm
pair_none
pair_oxdna
pair_oxdna2
pair_peri
pair_polymorphic
pair_quip
pair_reax
- pair_reax_c
+ pair_reaxc
pair_resquared
pair_sdk
pair_smd_hertz
pair_smd_tlsph
pair_smd_triangulated_surface
pair_smd_ulsph
pair_smtbq
pair_snap
pair_soft
pair_sph_heatconduction
pair_sph_idealgas
pair_sph_lj
pair_sph_rhosum
pair_sph_taitwater
pair_sph_taitwater_morris
pair_srp
pair_sw
pair_table
pair_table_rx
pair_tersoff
pair_tersoff_mod
pair_tersoff_zbl
pair_thole
pair_tri_lj
pair_vashishta
pair_yukawa
pair_yukawa_colloid
pair_zbl
pair_zero
END_RST -->
diff --git a/doc/src/python.txt b/doc/src/python.txt
index a5003be54..e8a76c0e3 100644
--- a/doc/src/python.txt
+++ b/doc/src/python.txt
@@ -1,481 +1,481 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
python command :h3
[Syntax:]
python func keyword args ... :pre
func = name of Python function :ulb,l
one or more keyword/args pairs must be appended :l
keyword = {invoke} or {input} or {return} or {format} or {length} or {file} or {here} or {exists}
{invoke} arg = none = invoke the previously defined Python function
{input} args = N i1 i2 ... iN
N = # of inputs to function
i1,...,iN = value, SELF, or LAMMPS variable name
value = integer number, floating point number, or string
SELF = reference to LAMMPS itself which can be accessed by Python function
variable = v_name, where name = name of LAMMPS variable, e.g. v_abc
{return} arg = varReturn
varReturn = v_name = LAMMPS variable name which return value of function will be assigned to
{format} arg = fstring with M characters
M = N if no return value, where N = # of inputs
M = N+1 if there is a return value
fstring = each character (i,f,s,p) corresponds in order to an input or return value
'i' = integer, 'f' = floating point, 's' = string, 'p' = SELF
{length} arg = Nlen
Nlen = max length of string returned from Python function
{file} arg = filename
filename = file of Python code, which defines func
{here} arg = inline
inline = one or more lines of Python code which defines func
must be a single argument, typically enclosed between triple quotes
{exists} arg = none = Python code has been loaded by previous python command :pre
:ule
[Examples:]
python pForce input 2 v_x 20.0 return v_f format fff file force.py
python pForce invoke :pre
python factorial input 1 myN return v_fac format ii here """
def factorial(n):
  if n == 1: return n
  return n * factorial(n-1)
""" :pre
python loop input 1 SELF return v_value format pf here """
def loop(lmpptr,N,cut0):
  from lammps import lammps
  lmp = lammps(ptr=lmpptr) :pre
  # loop N times, increasing cutoff each time :pre
  for i in range(N):
    cut = cut0 + i*0.1
    lmp.set_variable("cut",cut)                # set a variable in LAMMPS
    lmp.command("pair_style lj/cut $\{cut\}")  # LAMMPS commands
    lmp.command("pair_coeff * * 1.0 1.0")
    lmp.command("run 100")
""" :pre
[Description:]
Define a Python function or execute a previously defined function.
Arguments, including LAMMPS variables, can be passed to the function
from the LAMMPS input script and a value returned by the Python
function to a LAMMPS variable. The Python code for the function can
be included directly in the input script or in a separate Python file.
The function can be standard Python code or it can make "callbacks" to
LAMMPS through its library interface to query or set internal values
within LAMMPS. This is a powerful mechanism for performing complex
operations in a LAMMPS input script that are not possible with the
simple input script and variable syntax which LAMMPS defines. Thus
your input script can operate more like a true programming language.
Use of this command requires building LAMMPS with the PYTHON package
which links to the Python library so that the Python interpreter is
embedded in LAMMPS. More details about this process are given below.
There are two ways to invoke a Python function once it has been
defined. One is using the {invoke} keyword. The other is to assign
the function to a "python-style variable"_variable.html defined in
your input script. Whenever the variable is evaluated, it will
execute the Python function to assign a value to the variable. Note
that variables can be evaluated in many different ways within LAMMPS.
They can be substituted for directly in an input script. Or they can
be passed to various commands as arguments, so that the variable is
evaluated during a simulation run.
A broader overview of how Python can be used with LAMMPS is
given in "Section 11"_Section_python.html. There is an
examples/python directory which illustrates use of the python
command.
:line
The {func} setting specifies the name of the Python function. The
code for the function is defined using the {file} or {here} keywords
as explained below.
If the {invoke} keyword is used, no other keywords can be used, and a
previous python command must have defined the Python function
referenced by this command. This invokes the Python function with the
previously defined arguments and return value processed as explained
below. You can invoke the function as many times as you wish in your
input script.
The {input} keyword defines how many arguments {N} the Python function
expects. If it takes no arguments, then the {input} keyword should
not be used. Each argument can be specified directly as a value,
e.g. 6 or 3.14159 or abc (a string of characters). The type of each
argument is specified by the {format} keyword as explained below, so
that Python will know how to interpret the value. If the word SELF is
used for an argument it has a special meaning. A pointer is passed to
the Python function which it converts into a reference to LAMMPS
itself. This enables the function to call back to LAMMPS through its
library interface as explained below. This allows the Python function
to query or set values internal to LAMMPS which can affect the
subsequent execution of the input script. A LAMMPS variable can also
be used as an argument, specified as v_name, where "name" is the name
of the variable. Any style of LAMMPS variable can be used, as defined
by the "variable"_variable.html command. Each time the Python
function is invoked, the LAMMPS variable is evaluated and its value is
passed to the Python function.
The {return} keyword is only needed if the Python function returns a
value. The specified {varReturn} must be of the form v_name, where
"name" is the name of a python-style LAMMPS variable, defined by the
"variable"_variable.html command. The Python function can return a
numeric or string value, as specified by the {format} keyword.
As explained on the "variable"_variable.html doc page, the definition
of a python-style variable associates a Python function name with the
variable. This must match the {func} setting for this command. For
example these two commands would be self-consistent:
variable foo python myMultiply
python myMultiply return v_foo format f file funcs.py :pre
The two commands can appear in either order in the input script so
long as both are specified before the Python function is invoked for
the first time.
The {format} keyword must be used if the {input} or {return} keyword
is used. It defines an {fstring} with M characters, where M = sum of
number of inputs and outputs. The order of characters corresponds to
the N inputs, followed by the return value (if it exists). Each
character must be one of the following: "i" for integer, "f" for
floating point, "s" for string, or "p" for SELF. Each character
defines the type of the corresponding input or output value of the
Python function and affects the type conversion that is performed
internally as data is passed back and forth between LAMMPS and Python.
Note that it is permissible to use a "python-style
variable"_variable.html in a LAMMPS command that allows for an
equal-style variable as an argument, but only if the output of the
Python function is flagged as a numeric value ("i" or "f") via the
{format} keyword.
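For example, a python-style variable whose function returns a
floating-point value could be used in thermo output just like an
equal-style variable; here myScale and funcs.py are hypothetical names:
variable natoms equal atoms
variable scale python myScale
python myScale input 1 v_natoms return v_scale format ff file funcs.py
thermo_style custom step temp pe v_scale :pre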
If the {return} keyword is used and the {format} keyword specifies the
output as a string, then the default maximum length of that string is
63 characters (64-1 for the string terminator). If you want to return
a longer string, the {length} keyword can be specified with its {Nlen}
value set to a larger number (the code allocates space for Nlen+1 to
include the string terminator). If the Python function generates a
string longer than the default 63 or the specified {Nlen}, it will be
truncated.
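For example, to allow a string of up to 128 characters to be returned
(longString and funcs.py are hypothetical names):
variable msg python longString
python longString return v_msg format s length 128 file funcs.py
print "$\{msg\}" :pre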
:line
Either the {file}, {here}, or {exists} keyword must be used, but only
one of them. These keywords specify what Python code to load into the
Python interpreter. The {file} keyword gives the name of a file,
which should end with a ".py" suffix, which contains Python code. The
code will be immediately loaded into and run in the "main" module of
the Python interpreter. Note that Python code which contains a
function definition does not "execute" the function when it is run; it
simply defines the function so that it can be invoked later.
The {here} keyword does the same thing, except that the Python code
follows as a single argument to the {here} keyword. This can be done
using triple quotes as delimiters, as in the examples above. This
allows Python code to be listed verbatim in your input script, with
proper indentation, blank lines, and comments, as desired. See
"Section 3.2"_Section_commands.html#cmd_2, for an explanation of how
triple quotes can be used as part of input script syntax.
The {exists} keyword takes no argument. It means that Python code
containing the required Python function defined by the {func} setting,
is assumed to have been previously loaded by another python command.
Note that the Python code that is loaded and run must contain a
function with the specified {func} name. To operate properly when
later invoked, the function code must match the {input} and
{return} and {format} keywords specified by the python command.
Otherwise Python will generate an error.
:line
This section describes how Python code can be written to work with
LAMMPS.
Whether you load Python code from a file or directly from your input
script, via the {file} and {here} keywords, the code can be identical.
It must be indented properly as Python requires. It can contain
comments or blank lines. If the code is in your input script, it
cannot however contain triple-quoted Python strings, since that will
conflict with the triple-quote parsing that the LAMMPS input script
performs.
All the Python code you specify via one or more python commands is
loaded into the Python "main" module, i.e. __main__. The code can
define global variables or statements that are outside of function
definitions. It can contain multiple functions, only one of which
matches the {func} setting in the python command. This means you can
use the {file} keyword once to load several functions, and the
{exists} keyword thereafter in subsequent python commands to access
the other functions previously loaded.
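For example, if a single file defines two functions, the first python
command can load the file and the second can reference it with the
{exists} keyword (first, second, and funcs.py are hypothetical names):
variable a equal 1.0
variable b equal 2.0
variable x python first
variable y python second
python first input 1 v_a return v_x format ff file funcs.py
python second input 1 v_b return v_y format ff exists :pre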
A Python function you define (or more generally, the code you load)
can import other Python modules or classes, it can make calls to other
system functions or functions you define, and it can access or modify
global variables (in the "main" module) which will persist between
successive function calls. The latter can be useful, for example, to
prevent a function from being invoked multiple times per timestep by
different commands in a LAMMPS input script that access the returned
python-style variable associated with the function. For example,
consider this function loaded with two global variables defined
outside the function:
nsteplast = -1
nvaluelast = 0 :pre
def expensive(nstep):
  global nsteplast,nvaluelast
  if nstep == nsteplast: return nvaluelast
  nsteplast = nstep
  # perform complicated calculation
  nvalue = ...
  nvaluelast = nvalue
  return nvalue :pre
Nsteplast stores the previous timestep the function was invoked
(passed as an argument to the function). Nvaluelast stores the return
value computed on the last function invocation. If the function is
invoked again on the same timestep, the previous value is simply
returned, without re-computing it. The "global" statement inside the
Python function allows it to overwrite the global variables.
Note that if you load Python code multiple times (via multiple python
commands), you can overwrite previously loaded variables and functions
if you are not careful. E.g. if the code above were loaded twice, the
global variables would be re-initialized, which might not be what you
want. Likewise, if a function with the same name exists in two chunks
of Python code you load, the function loaded second will override the
function loaded first.
It's important to realize that if you are running LAMMPS in parallel,
each MPI task will load the Python interpreter and execute a local
copy of the Python function(s) you define. There is no connection
between the Python interpreters running on different processors.
This implies three important things.
First, if you put a print statement in your Python function, you will
see P copies of the output, when running on P processors. If the
prints occur at (nearly) the same time, the P copies of the output may
be mixed together. Welcome to the world of parallel programming and
debugging.
Second, if your Python code loads modules that are not pre-loaded by
the Python library, then it will load the module from disk. This may
be a bottleneck if 1000s of processors try to load a module at the
same time. On some large supercomputers, loading of modules from disk
by Python may be disabled. In this case you would need to pre-build a
Python library that has the required modules pre-loaded and link
LAMMPS with that library.
Third, if your Python code calls back to LAMMPS (discussed in the
next section) and causes LAMMPS to perform an MPI operation that
requires global communication (e.g. via MPI_Allreduce), such as
computing the global temperature of the system, then you must ensure
all your Python functions (running independently on different
processors) call back to LAMMPS. Otherwise the code may hang.
:line
Your Python function can "call back" to LAMMPS through its
library interface, if you use the SELF input to pass Python
a pointer to LAMMPS. The mechanism for doing this in your
Python function is as follows:
def foo(lmpptr,...):
  from lammps import lammps
  lmp = lammps(ptr=lmpptr)
  lmp.command('print "Hello from inside Python"')
  ... :pre
The function definition must include a variable (lmpptr in this case)
which corresponds to SELF in the python command. The first line of
the function imports the Python module lammps.py in the python dir of
the distribution. The second line creates a Python object "lmp" which
wraps the instance of LAMMPS that called the function. The
-"ptr=lmpptr" argument is what makes that happen. The thrid line
+"ptr=lmpptr" argument is what makes that happen. The third line
invokes the command() function in the LAMMPS library interface. It
takes a single string argument which is a LAMMPS input script command
for LAMMPS to execute, the same as if it appeared in your input
script. In this case, LAMMPS should output
Hello from inside Python :pre
to the screen and log file. Note that since the LAMMPS print command
itself takes a string in quotes as its argument, the Python string
must be delimited with a different style of quotes.
"Section 11.7"_Section_python.html#py_7 describes the syntax for how
Python wraps the various functions included in the LAMMPS library
interface.
A more interesting example is in the examples/python/in.python script
which loads and runs the following function from examples/python/funcs.py:
def loop(N,cut0,thresh,lmpptr):
print "LOOP ARGS",N,cut0,thresh,lmpptr
from lammps import lammps
lmp = lammps(ptr=lmpptr)
natoms = lmp.get_natoms() :pre
for i in range(N):
cut = cut0 + i*0.1 :pre
lmp.set_variable("cut",cut) # set a variable in LAMMPS
lmp.command("pair_style lj/cut $\{cut\}") # LAMMPS command
#lmp.command("pair_style lj/cut %d" % cut) # LAMMPS command option :pre
lmp.command("pair_coeff * * 1.0 1.0") # ditto
lmp.command("run 10") # ditto
pe = lmp.extract_compute("thermo_pe",0,0) # extract total PE from LAMMPS
print "PE",pe/natoms,thresh
if pe/natoms < thresh: return :pre
with these input script commands:
python loop input 4 10 1.0 -4.0 SELF format iffp file funcs.py
python loop invoke :pre
This has the effect of looping over a series of 10 short runs (10
timesteps each) where the pair style cutoff is increased from a value
of 1.0 in distance units, in increments of 0.1. The looping stops
when the per-atom potential energy falls below a threshold of -4.0 in
energy units. More generally, Python can be used to implement a loop
with complex logic, much more so than can be created using the LAMMPS
"jump"_jump.html and "if"_if.html commands.
Several LAMMPS library functions are called from the loop function.
Get_natoms() returns the number of atoms in the simulation, so that it
can be used to normalize the potential energy that is returned by
extract_compute() for the "thermo_pe" compute that is defined by
default for LAMMPS thermodynamic output. Set_variable() sets the
value of a string variable defined in LAMMPS. This library function
is a useful way for a Python function to return multiple values to
LAMMPS, more than the single value that can be passed back via a
return statement. The cutoff value stored in the "cut" variable is then
substituted (by LAMMPS) into the pair_style command that is executed
next. Alternatively, the "LAMMPS command option" line could be used
in place of the 2 preceding lines, to have Python insert the value
into the LAMMPS command string.
NOTE: When using the callback mechanism just described, recognize that
there are some operations you should not attempt because LAMMPS cannot
execute them correctly. If the Python function is invoked between
runs in the LAMMPS input script, then it should be OK to invoke any
LAMMPS input script command via the library interface command() or
file() functions, so long as the command would work if it were
executed in the LAMMPS input script directly at the same point.
However, a Python function can also be invoked during a run, whenever
the LAMMPS variable it is assigned to is evaluated. If the
variable is an input argument to another LAMMPS command (e.g. "fix
setforce"_fix_setforce.html), then the Python function will be invoked
inside the class for that command, in one of its methods that is
invoked in the middle of a timestep. You cannot execute arbitrary
input script commands from the Python function (again, via the
command() or file() functions) at that point in the run and expect it
to work. Other library functions such as those that invoke computes
or other variables may have hidden side effects as well. In these
cases, LAMMPS has no simple way to check that something illogical is
being attempted.
:line
If you run Python code directly on your workstation, either
interactively or by using Python to launch a Python script stored in a
file, and your code has an error, you will typically see informative
error messages. That is not the case when you run Python code from
LAMMPS using an embedded Python interpreter. The code will typically
fail silently. LAMMPS will catch some errors but cannot tell you
where in the Python code the problem occurred. For example, if the
Python code cannot be loaded and run because it has syntax or other
logic errors, you may get an error from Python pointing to the
offending line, or you may get one of these generic errors from
LAMMPS:
Could not process Python file
Could not process Python string :pre
When the Python function is invoked, if it does not return properly,
you will typically get this generic error from LAMMPS:
Python function evaluation failed :pre
Here are three suggestions for debugging your Python code while
running it under LAMMPS.
First, don't run it under LAMMPS, at least to start with! Debug it
using plain Python. Load and invoke your function, pass it arguments,
check return values, etc.
Second, add Python print statements to the function to check how far
it gets and intermediate values it calculates. See the discussion
above about printing from Python when running in parallel.
Third, use Python exception handling. For example, say this statement
in your Python function is failing, because you have not initialized the
variable foo:
foo += 1 :pre
If you put one (or more) statements inside a "try" statement,
like this:
import exceptions
print "Inside simple function"
try:
  foo += 1      # one or more statements here
except Exception, e:
  print "FOO error:",e :pre
then you will get this message printed to the screen:
FOO error: local variable 'foo' referenced before assignment :pre
If there is no error in the try statements, then nothing is printed.
Either way the function continues on (unless you put a return or
sys.exit() in the except clause).
:line
[Restrictions:]
This command is part of the PYTHON package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
Building LAMMPS with the PYTHON package will link LAMMPS with the
Python library on your system. Settings to enable this are in the
lib/python/Makefile.lammps file. See the lib/python/README file for
information on those settings.
If you use Python code which calls back to LAMMPS, via the SELF input
argument explained above, there is an extra step required when
building LAMMPS. LAMMPS must also be built as a shared library and
your Python function must be able to load the Python module in
python/lammps.py that wraps the LAMMPS library interface. These are
the same steps required to use Python by itself to wrap LAMMPS.
Details on these steps are explained in "Section
python"_Section_python.html. Note that it is important that the
stand-alone LAMMPS executable and the LAMMPS shared library be
consistent (built from the same source code files) in order for this
to work. If the two have been built at different times using
different source files, problems may occur.
[Related commands:]
"shell"_shell.html, "variable"_variable.html
[Default:] none
diff --git a/examples/USER/cg-cmm/README b/examples/USER/cgsdk/README
similarity index 95%
rename from examples/USER/cg-cmm/README
rename to examples/USER/cgsdk/README
index 6a283114b..5d3a49377 100644
--- a/examples/USER/cg-cmm/README
+++ b/examples/USER/cgsdk/README
@@ -1,19 +1,19 @@
-LAMMPS USER-CMM-CG example problems
+LAMMPS USER-CGSDK example problems
Each of these sub-directories contains a sample problem for the SDK
coarse grained MD potentials that you can run with LAMMPS.
These are the two sample systems
peg-verlet: coarse grained PEG surfactant/water mixture lamella
verlet version
this example uses the plain LJ term only, no charges.
two variants are provided: regular harmonic angles and
the SDK variant that includes 1-3 LJ repulsion.
sds-monolayer: coarse grained SDS surfactant monolayers at water/vapor
interface.
this example uses the SDK LJ term with coulomb and shows
how to use the combined coulomb style vs. hybrid/overlay
with possible optimizations due to the small number of
charged particles in this system
diff --git a/examples/USER/cg-cmm/peg-verlet/data.pegc12e8.gz b/examples/USER/cgsdk/peg-verlet/data.pegc12e8.gz
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/data.pegc12e8.gz
rename to examples/USER/cgsdk/peg-verlet/data.pegc12e8.gz
diff --git a/examples/USER/cg-cmm/peg-verlet/in.pegc12e8 b/examples/USER/cgsdk/peg-verlet/in.pegc12e8
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/in.pegc12e8
rename to examples/USER/cgsdk/peg-verlet/in.pegc12e8
diff --git a/examples/USER/cg-cmm/peg-verlet/in.pegc12e8-angle b/examples/USER/cgsdk/peg-verlet/in.pegc12e8-angle
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/in.pegc12e8-angle
rename to examples/USER/cgsdk/peg-verlet/in.pegc12e8-angle
diff --git a/examples/USER/cg-cmm/peg-verlet/log.pegc12e8 b/examples/USER/cgsdk/peg-verlet/log.pegc12e8
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/log.pegc12e8
rename to examples/USER/cgsdk/peg-verlet/log.pegc12e8
diff --git a/examples/USER/cg-cmm/peg-verlet/log.pegc12e8-angle b/examples/USER/cgsdk/peg-verlet/log.pegc12e8-angle
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/log.pegc12e8-angle
rename to examples/USER/cgsdk/peg-verlet/log.pegc12e8-angle
diff --git a/examples/USER/cg-cmm/sds-monolayer/data.sds.gz b/examples/USER/cgsdk/sds-monolayer/data.sds.gz
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/data.sds.gz
rename to examples/USER/cgsdk/sds-monolayer/data.sds.gz
diff --git a/examples/USER/cg-cmm/sds-monolayer/in.sds-hybrid b/examples/USER/cgsdk/sds-monolayer/in.sds-hybrid
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/in.sds-hybrid
rename to examples/USER/cgsdk/sds-monolayer/in.sds-hybrid
diff --git a/examples/USER/cg-cmm/sds-monolayer/in.sds-regular b/examples/USER/cgsdk/sds-monolayer/in.sds-regular
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/in.sds-regular
rename to examples/USER/cgsdk/sds-monolayer/in.sds-regular
diff --git a/examples/USER/cg-cmm/sds-monolayer/log.sds-hybrid b/examples/USER/cgsdk/sds-monolayer/log.sds-hybrid
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/log.sds-hybrid
rename to examples/USER/cgsdk/sds-monolayer/log.sds-hybrid
diff --git a/examples/USER/cg-cmm/sds-monolayer/log.sds-regular b/examples/USER/cgsdk/sds-monolayer/log.sds-regular
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/log.sds-regular
rename to examples/USER/cgsdk/sds-monolayer/log.sds-regular
diff --git a/examples/USER/flow_gauss/README b/examples/USER/misc/flow_gauss/README
similarity index 100%
rename from examples/USER/flow_gauss/README
rename to examples/USER/misc/flow_gauss/README
diff --git a/examples/USER/flow_gauss/in.GD b/examples/USER/misc/flow_gauss/in.GD
similarity index 100%
rename from examples/USER/flow_gauss/in.GD
rename to examples/USER/misc/flow_gauss/in.GD
diff --git a/examples/cmap/log.11Apr17.cmap.g++.1 b/examples/cmap/log.11Apr17.cmap.g++.1
new file mode 100644
index 000000000..9b4fc2999
--- /dev/null
+++ b/examples/cmap/log.11Apr17.cmap.g++.1
@@ -0,0 +1,205 @@
+LAMMPS (31 Mar 2017)
+# Created by charmm2lammps v1.8.2.6 beta on Thu Mar 3 20:56:57 EST 2016
+
+units real
+neigh_modify delay 2 every 1
+#newton off
+
+boundary p p p
+
+atom_style full
+bond_style harmonic
+angle_style charmm
+dihedral_style charmmfsw
+improper_style harmonic
+
+pair_style lj/charmmfsw/coul/charmmfsh 8 12
+pair_modify mix arithmetic
+
+fix cmap all cmap charmm22.cmap
+Reading potential file charmm22.cmap with DATE: 2016-09-26
+fix_modify cmap energy yes
+
+read_data gagg.data fix cmap crossterm CMAP
+ orthogonal box = (-34.4147 -36.1348 -39.3491) to (45.5853 43.8652 40.6509)
+ 1 by 1 by 1 MPI processor grid
+ reading atoms ...
+ 34 atoms
+ scanning bonds ...
+ 4 = max bonds/atom
+ scanning angles ...
+ 6 = max angles/atom
+ scanning dihedrals ...
+ 12 = max dihedrals/atom
+ scanning impropers ...
+ 1 = max impropers/atom
+ reading bonds ...
+ 33 bonds
+ reading angles ...
+ 57 angles
+ reading dihedrals ...
+ 75 dihedrals
+ reading impropers ...
+ 7 impropers
+ 4 = max # of 1-2 neighbors
+ 7 = max # of 1-3 neighbors
+ 13 = max # of 1-4 neighbors
+ 16 = max # of special neighbors
+
+special_bonds charmm
+fix 1 all nve
+
+#fix 1 all nvt temp 300 300 100.0
+#fix 2 all shake 1e-9 500 0 m 1.0
+
+velocity all create 0.0 12345678 dist uniform
+
+thermo 1000
+thermo_style custom step ecoul evdwl ebond eangle edihed f_cmap eimp
+timestep 2.0
+
+run 100000
+Neighbor list info ...
+ update every 1 steps, delay 2 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 14
+ ghost atom cutoff = 14
+ binsize = 7, bins = 12 12 12
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair lj/charmmfsw/coul/charmmfsh, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 14.96 | 14.96 | 14.96 Mbytes
+Step E_coul E_vdwl E_bond E_angle E_dihed f_cmap E_impro
+ 0 16.287573 -0.85933785 1.2470497 4.8441789 4.5432816 -1.473352 0.10453023
+ 1000 18.816462 -0.84379243 0.78931817 2.7554247 4.4371421 -2.7762038 0.12697656
+ 2000 18.091571 -1.045888 0.72306589 3.0951524 4.6725102 -2.3580092 0.22712496
+ 3000 17.835596 -1.2171641 0.72666403 2.6696491 5.4373798 -2.0737041 0.075101693
+ 4000 16.211232 -0.42713611 0.99472642 3.8961462 5.2009895 -2.5626866 0.17356243
+ 5000 17.72183 -0.57081189 0.90733068 3.4376382 4.5457582 -2.3727543 0.12354518
+ 6000 18.753977 -1.5772499 0.81468321 2.9236782 4.6033216 -2.3380859 0.12835782
+ 7000 18.186024 -0.84205608 0.58996182 3.0329585 4.7221473 -2.5733243 0.10047631
+ 8000 18.214306 -1.1360938 0.72597611 3.7493028 4.7319958 -2.8957969 0.2006046
+ 9000 17.248408 -0.48641993 0.90266229 2.9721743 4.7651056 -2.1473354 0.1302043
+ 10000 17.760655 -1.2968444 0.92384663 3.7007455 4.7378947 -2.2147779 0.06940579
+ 11000 17.633929 -0.57368413 0.84872849 3.4277114 4.285393 -2.236944 0.17204973
+ 12000 18.305835 -1.0675148 0.75879532 2.8853173 4.685027 -2.409087 0.087538866
+ 13000 17.391558 -0.9975291 0.66671947 3.8065638 5.2285578 -2.4198822 0.06253594
+ 14000 17.483387 -0.67727643 0.91966477 3.7317031 4.7770445 -2.6080027 0.11487095
+ 15000 18.131749 -1.1918751 1.0025684 3.1238131 4.789742 -2.2546745 0.13782813
+ 16000 16.972343 -0.43926531 0.60644597 3.7551592 4.8658618 -2.2627659 0.12353145
+ 17000 18.080785 -1.2073565 0.7867072 3.5671106 4.43754 -2.5092904 0.17429146
+ 18000 17.474576 -0.97836065 0.8678524 3.7961537 4.3409032 -1.8922572 0.134048
+ 19000 17.000911 -1.2286864 0.83615834 3.9322908 4.9319492 -2.3281576 0.056689619
+ 20000 17.043286 -0.8506561 0.80966589 3.5087339 4.8603878 -2.3365263 0.096794824
+ 21000 17.314495 -1.1430889 0.95363892 4.2446032 4.2756745 -2.1829483 0.17119518
+ 22000 18.954881 -0.998673 0.58688334 2.71536 4.6634319 -2.6862804 0.20328442
+ 23000 17.160427 -0.97803282 0.86894041 4.0897736 4.3146238 -2.1962289 0.075339092
+ 24000 17.602026 -1.0833323 0.94888776 3.7341878 4.3084335 -2.1640414 0.081493681
+ 25000 17.845584 -1.3432612 0.93497086 3.8911043 4.468032 -2.3475883 0.093204333
+ 26000 17.833261 -1.1020534 0.77931087 3.7628141 4.512381 -2.3134761 0.15568465
+ 27000 17.68607 -1.3222026 1.1985872 3.5817624 4.6360755 -2.3492774 0.08427906
+ 28000 18.326649 -1.2669291 0.74809075 3.2624429 4.4698564 -2.3679076 0.14677293
+ 29000 17.720933 -1.0773886 0.83099482 3.7652834 4.6584594 -2.8255303 0.23092596
+ 30000 18.201999 -1.0168706 1.0637455 3.453095 4.3738593 -2.8063214 0.18658217
+ 31000 17.823502 -1.2685768 0.84805585 3.8600661 4.2195821 -2.1169716 0.12517101
+ 32000 16.883133 -0.62062648 0.84434922 3.5042683 5.1264906 -2.2674699 0.030138165
+ 33000 17.805715 -1.679553 1.2430372 4.314677 4.2523894 -2.3008321 0.18591872
+ 34000 16.723767 -0.54189072 1.1282827 3.8542159 4.3026559 -2.2186336 0.05392425
+ 35000 17.976909 -0.72092075 0.5876319 2.9726396 5.0881439 -2.491692 0.17356291
+ 36000 18.782492 -1.514246 0.63237955 3.2777164 4.6077164 -2.502574 0.082537318
+ 37000 17.247716 -0.6344626 0.79885976 3.452491 4.7618281 -2.3902444 0.11450271
+ 38000 17.996494 -1.6712877 1.0111769 4.1689136 4.46963 -2.4076725 0.11875756
+ 39000 17.586857 -0.74508086 0.95970486 3.7395038 4.6011357 -2.9854953 0.30143284
+ 40000 17.494879 -0.30772446 0.72047991 3.2604877 4.7283734 -2.3812495 0.16399034
+ 41000 15.855772 -0.49642605 0.82496448 4.5139653 4.76884 -2.214141 0.10899661
+ 42000 17.898568 -1.3078863 1.1505144 4.0429873 4.3889581 -2.8696559 0.23336417
+ 43000 19.014372 -1.6325979 1.1553166 3.5660772 4.4047997 -2.9302044 0.13672127
+ 44000 18.250782 -0.97211613 0.72714301 3.2258362 4.7257298 -2.5533613 0.11968073
+ 45000 17.335174 0.24746331 1.0415866 3.3220992 4.5251095 -3.0415216 0.24453084
+ 46000 17.72846 -0.9541418 0.88153841 3.7893452 4.5251883 -2.4003613 0.051809816
+ 47000 18.226762 -0.67057787 0.84352989 3.0609522 4.5449078 -2.4694254 0.073703949
+ 48000 17.838074 -0.88768441 1.3812262 3.5890492 4.5827868 -3.0137515 0.21417113
+ 49000 17.973733 -0.75118705 0.69667886 3.3989025 4.7058886 -2.8243945 0.26665792
+ 50000 17.461583 -0.65040016 0.68943524 2.9374743 5.6971777 -2.4438011 0.1697603
+ 51000 16.79766 -0.010684434 0.89795555 3.959039 4.56763 -2.5101098 0.15048853
+ 52000 17.566543 -0.7262764 0.74354418 3.3423185 4.8426523 -2.4187649 0.16908776
+ 53000 17.964274 -0.9270914 1.065952 3.0397181 4.4682262 -2.2179503 0.07873406
+ 54000 17.941256 -0.5807578 0.76516121 3.7262371 4.6975126 -3.179899 0.24433708
+ 55000 17.079478 -0.48559832 0.95364453 3.0414645 5.2811414 -2.7064882 0.30102814
+ 56000 17.632179 -0.75403299 0.97577942 3.3672363 4.4851336 -2.3683659 0.051117638
+ 57000 16.17128 -0.44699325 0.76341543 4.267716 5.0881056 -2.4122329 0.16671692
+ 58000 16.899276 -0.76481024 1.0400825 3.973493 4.8823309 -2.4270284 0.048716383
+ 59000 18.145412 -0.84968335 0.71698306 3.2024358 4.6115739 -2.2520353 0.19466966
+ 60000 17.578258 -1.0067331 0.72822527 3.5375208 4.9110255 -2.2319607 0.11922362
+ 61000 17.434762 -1.0244393 0.90593099 3.8446915 4.8571191 -2.6228357 0.23259208
+ 62000 17.580489 -1.1135917 0.79577432 3.7043524 4.6058114 -2.351492 0.042904152
+ 63000 18.207335 -1.1512268 0.82684507 3.4114738 4.351069 -2.1878441 0.082922105
+ 64000 18.333083 -1.1182287 0.74058959 3.6905164 4.3226172 -2.7110393 0.14721704
+ 65000 16.271579 -0.7122151 1.0200168 4.6983643 4.3681131 -2.194921 0.12831024
+ 66000 17.316444 -0.5729385 0.85254108 3.5769963 4.5526705 -2.3321328 0.040452643
+ 67000 17.19011 -0.8814312 1.1381258 3.8605789 4.4183813 -2.299607 0.091527355
+ 68000 18.223367 -1.362189 0.74472056 3.259165 4.486512 -2.2181134 0.048952796
+ 69000 17.646348 -0.91647162 0.73990335 3.9313692 5.2663097 -3.3816778 0.27769877
+ 70000 18.173493 -1.3107718 0.96484426 3.219728 4.5045124 -2.3349534 0.082327407
+ 71000 17.0627 -0.58509083 0.85964129 3.8490884 4.437895 -2.1673348 0.24151404
+ 72000 17.809764 -0.35128902 0.65479258 3.3945008 4.6160508 -2.5486166 0.10829531
+ 73000 18.27769 -1.0739758 0.80890957 3.6070901 4.6256762 -2.4576547 0.080025736
+ 74000 18.109437 -1.0691837 0.66679323 3.5923203 4.4825716 -2.5048169 0.21372319
+ 75000 17.914569 -1.3500765 1.2993494 3.362421 4.4160377 -2.1278163 0.19397641
+ 76000 16.563928 -0.16539261 1.0067302 3.5742755 4.8581915 -2.1362429 0.059822408
+ 77000 18.130477 -0.38361279 0.43406954 3.4725995 4.7005855 -2.8836242 0.11958174
+ 78000 16.746204 -1.1732959 0.7455507 3.6296638 5.6344113 -2.459208 0.16099803
+ 79000 18.243999 -1.5850155 1.0108545 3.4727867 4.3367411 -2.316686 0.070480814
+ 80000 16.960715 -0.84100929 0.91604996 3.862215 4.780949 -2.3711596 0.073916605
+ 81000 17.697722 -1.1126605 0.952804 3.7114455 4.4216316 -2.2770085 0.091372066
+ 82000 17.835901 -1.3091474 0.71867629 3.8168122 5.0150205 -2.4730634 0.062592852
+ 83000 19.168418 -1.476938 0.75592316 3.2304519 4.3946471 -2.2991395 0.13083324
+ 84000 17.945778 -1.5223622 1.0859941 3.4334011 5.0286682 -2.7550892 0.2476269
+ 85000 17.950251 -0.85843846 0.86888218 3.3101287 4.5511879 -2.3640013 0.12080834
+ 86000 17.480699 -0.97493649 0.85049761 3.4973085 4.6344922 -2.343121 0.2009677
+ 87000 17.980244 -1.114983 0.88796989 3.4113329 4.3535853 -2.2535412 0.14494917
+ 88000 18.023866 -1.226683 0.62339706 3.7649269 4.5923973 -2.3923523 0.10464375
+ 89000 16.362829 -0.311462 1.0265375 4.0101723 4.4184777 -2.0314129 0.056570704
+ 90000 17.533149 -0.41526788 1.0362029 3.4247412 4.2734431 -2.4776658 0.16960663
+ 91000 17.719099 -1.1956801 1.0069945 3.2380672 4.8982805 -2.2154906 0.12950936
+ 92000 17.762654 -1.170027 0.95814525 3.5217717 4.5405343 -2.5983677 0.15037754
+ 93000 17.393958 -0.45641026 0.6579069 3.6002204 4.5942053 -2.5559641 0.12026544
+ 94000 16.8182 -0.92962066 0.86801362 4.2914398 4.659848 -2.5251987 0.18000415
+ 95000 17.642086 -0.7994896 0.7003756 3.8036697 4.5252487 -2.4166307 0.15686517
+ 96000 18.114292 -1.5102104 1.2635908 3.2764427 5.0659496 -2.2777806 0.054309645
+ 97000 18.575765 -1.6015311 0.69500699 3.1649317 4.9945742 -2.4012125 0.067373724
+ 98000 16.578893 -0.78030229 0.91524222 4.4429655 4.4622392 -2.4052655 0.15355705
+ 99000 17.26063 -0.57832833 0.7098846 3.9000046 4.5576484 -2.5333026 0.25517222
+ 100000 18.377235 -0.89109577 0.68988617 2.8751751 4.4115591 -2.3560731 0.12185212
+Loop time of 2.96043 on 1 procs for 100000 steps with 34 atoms
+
+Performance: 5836.990 ns/day, 0.004 hours/ns, 33778.875 timesteps/s
+99.9% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 1.074 | 1.074 | 1.074 | 0.0 | 36.28
+Bond | 1.6497 | 1.6497 | 1.6497 | 0.0 | 55.72
+Neigh | 0.007576 | 0.007576 | 0.007576 | 0.0 | 0.26
+Comm | 0.012847 | 0.012847 | 0.012847 | 0.0 | 0.43
+Output | 0.0010746 | 0.0010746 | 0.0010746 | 0.0 | 0.04
+Modify | 0.16485 | 0.16485 | 0.16485 | 0.0 | 5.57
+Other | | 0.05037 | | | 1.70
+
+Nlocal: 34 ave 34 max 34 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 0 ave 0 max 0 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 395 ave 395 max 395 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 395
+Ave neighs/atom = 11.6176
+Ave special neighs/atom = 9.52941
+Neighbor list builds = 253
+Dangerous builds = 0
+Total wall time: 0:00:02
diff --git a/examples/cmap/log.11Apr17.cmap.g++.4 b/examples/cmap/log.11Apr17.cmap.g++.4
new file mode 100644
index 000000000..ec471d5a7
--- /dev/null
+++ b/examples/cmap/log.11Apr17.cmap.g++.4
@@ -0,0 +1,205 @@
+LAMMPS (31 Mar 2017)
+# Created by charmm2lammps v1.8.2.6 beta on Thu Mar 3 20:56:57 EST 2016
+
+units real
+neigh_modify delay 2 every 1
+#newton off
+
+boundary p p p
+
+atom_style full
+bond_style harmonic
+angle_style charmm
+dihedral_style charmmfsw
+improper_style harmonic
+
+pair_style lj/charmmfsw/coul/charmmfsh 8 12
+pair_modify mix arithmetic
+
+fix cmap all cmap charmm22.cmap
+Reading potential file charmm22.cmap with DATE: 2016-09-26
+fix_modify cmap energy yes
+
+read_data gagg.data fix cmap crossterm CMAP
+ orthogonal box = (-34.4147 -36.1348 -39.3491) to (45.5853 43.8652 40.6509)
+ 1 by 2 by 2 MPI processor grid
+ reading atoms ...
+ 34 atoms
+ scanning bonds ...
+ 4 = max bonds/atom
+ scanning angles ...
+ 6 = max angles/atom
+ scanning dihedrals ...
+ 12 = max dihedrals/atom
+ scanning impropers ...
+ 1 = max impropers/atom
+ reading bonds ...
+ 33 bonds
+ reading angles ...
+ 57 angles
+ reading dihedrals ...
+ 75 dihedrals
+ reading impropers ...
+ 7 impropers
+ 4 = max # of 1-2 neighbors
+ 7 = max # of 1-3 neighbors
+ 13 = max # of 1-4 neighbors
+ 16 = max # of special neighbors
+
+special_bonds charmm
+fix 1 all nve
+
+#fix 1 all nvt temp 300 300 100.0
+#fix 2 all shake 1e-9 500 0 m 1.0
+
+velocity all create 0.0 12345678 dist uniform
+
+thermo 1000
+thermo_style custom step ecoul evdwl ebond eangle edihed f_cmap eimp
+timestep 2.0
+
+run 100000
+Neighbor list info ...
+ update every 1 steps, delay 2 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 14
+ ghost atom cutoff = 14
+ binsize = 7, bins = 12 12 12
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair lj/charmmfsw/coul/charmmfsh, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 14.94 | 15.57 | 16.2 Mbytes
+Step E_coul E_vdwl E_bond E_angle E_dihed f_cmap E_impro
+ 0 16.287573 -0.85933785 1.2470497 4.8441789 4.5432816 -1.473352 0.10453023
+ 1000 18.816462 -0.84379243 0.78931817 2.7554247 4.4371421 -2.7762038 0.12697656
+ 2000 18.091571 -1.045888 0.72306589 3.0951524 4.6725102 -2.3580092 0.22712496
+ 3000 17.835596 -1.2171641 0.72666403 2.6696491 5.4373798 -2.0737041 0.075101693
+ 4000 16.211232 -0.42713611 0.99472642 3.8961462 5.2009895 -2.5626866 0.17356243
+ 5000 17.72183 -0.57081189 0.90733068 3.4376382 4.5457582 -2.3727543 0.12354518
+ 6000 18.753977 -1.5772499 0.81468321 2.9236782 4.6033216 -2.3380859 0.12835782
+ 7000 18.186024 -0.84205609 0.58996181 3.0329584 4.7221473 -2.5733244 0.10047631
+ 8000 18.214306 -1.1360934 0.72597583 3.7493032 4.7319959 -2.8957975 0.20060467
+ 9000 17.248415 -0.48642024 0.90266262 2.9721744 4.7651003 -2.1473349 0.13020438
+ 10000 17.760663 -1.2968458 0.92384687 3.7007432 4.7378917 -2.2147799 0.06940514
+ 11000 17.63395 -0.57366075 0.84871737 3.4276851 4.2853865 -2.2369491 0.17205075
+ 12000 18.305713 -1.0672299 0.75876262 2.8852171 4.6850229 -2.4090072 0.087568888
+ 13000 17.383367 -0.99678627 0.66712651 3.8060954 5.233865 -2.4180629 0.062014239
+ 14000 17.510901 -0.68723297 0.92448551 3.7550867 4.7321218 -2.6059088 0.11504409
+ 15000 18.080165 -1.13316 0.99982253 3.09947 4.8171402 -2.2713372 0.14580371
+ 16000 17.383245 -0.4535296 0.57826268 3.6453593 4.6541138 -2.2434512 0.13285609
+ 17000 17.111153 -0.3414839 0.73667584 3.7485311 4.6262965 -2.6166049 0.12635815
+ 18000 16.862046 -1.3592061 1.2371142 4.4878937 4.2937117 -2.2112584 0.066145125
+ 19000 18.313891 -1.654238 0.90644101 3.3934089 4.550735 -2.1862171 0.081267736
+ 20000 19.083561 -1.3081747 0.56257812 2.7633848 4.6211438 -2.5196707 0.13763071
+ 21000 18.23741 -1.051353 0.64408722 3.1735565 4.6912533 -2.2491947 0.099394904
+ 22000 17.914515 -0.89769621 0.61793801 3.1224992 4.8683543 -2.282475 0.14524537
+ 23000 16.756122 -0.98277883 1.2554905 3.7916115 4.7301443 -2.3094994 0.10226772
+ 24000 16.109857 -0.54593177 0.86934462 4.4293574 4.926985 -2.2652264 0.11414331
+ 25000 18.590559 -1.497327 1.1898361 2.9134403 4.7854107 -2.4437918 0.067416154
+ 26000 18.493391 -1.0533797 0.4889578 3.6563013 4.6171721 -2.3240835 0.11607829
+ 27000 18.646522 -1.1229601 0.67956815 2.7937638 4.8991207 -2.4068997 0.10109147
+ 28000 18.545103 -1.7237438 0.72488022 3.8041665 4.6459974 -2.4339333 0.21943258
+ 29000 17.840505 -1.0909667 0.88133248 3.3698456 5.0311644 -2.5116617 0.08102693
+ 30000 17.649527 -0.65409177 0.86781692 3.24112 4.9903073 -2.6234925 0.14799777
+ 31000 18.156812 -0.77476556 0.83192789 2.9620784 4.9160635 -2.8571635 0.22283201
+ 32000 18.251583 -1.3384075 0.8059007 3.2588176 4.4365328 -2.1875071 0.087883637
+ 33000 17.702785 -0.88311587 0.98573641 3.4645713 4.2650091 -2.0909158 0.14233004
+ 34000 17.123413 -1.4873429 1.0419563 4.2628178 4.6318762 -2.2292095 0.105354
+ 35000 18.162061 -1.0136007 0.82436129 3.6365024 4.5801677 -2.6856989 0.28648222
+ 36000 17.65618 -1.094718 0.8872444 3.5075241 4.6382423 -2.3895134 0.18116961
+ 37000 17.336475 -1.0657995 0.98869254 3.9252927 4.4383632 -2.2048244 0.22285949
+ 38000 17.369467 -0.97623132 0.6712095 4.1349304 4.597754 -2.4088341 0.14608514
+ 39000 18.170206 -1.2344285 0.77546195 3.6451049 4.7482287 -2.9895286 0.25768859
+ 40000 16.210866 -0.81407781 0.99246271 4.2676233 5.0253763 -2.2929865 0.13348624
+ 41000 17.641798 -1.0868157 0.80119513 3.4302526 5.280872 -2.4025406 0.22747391
+ 42000 18.349848 -1.613759 1.1497004 3.7800682 4.3237683 -2.8676401 0.2120425
+ 43000 19.130245 -1.196778 0.71845659 2.9325758 4.3684415 -2.433424 0.12240982
+ 44000 18.061321 -1.2410101 1.0329373 3.0751569 4.7138313 -2.2880904 0.075814461
+ 45000 18.162713 -1.4414622 1.009159 4.2298758 4.589593 -2.8502298 0.21606844
+ 46000 18.591574 -0.99730412 1.0955215 3.3965004 4.359466 -3.1049731 0.17322629
+ 47000 18.380259 -1.2717381 0.72291269 3.3958016 4.6099628 -2.4605065 0.19825185
+ 48000 18.130478 -1.5051279 1.2087492 3.2488529 4.6690881 -2.2518174 0.05633061
+ 49000 16.419912 -0.89320635 0.98926144 4.0388252 4.9919488 -2.1699511 0.15646479
+ 50000 16.453196 -1.0433497 0.778346 4.6078069 4.7320614 -2.3760788 0.17161976
+ 51000 18.245221 -0.89550444 0.9310446 3.0758194 4.3944595 -2.3082379 0.19983428
+ 52000 17.839632 -1.0221781 0.76425017 3.3331547 4.5368437 -2.0988773 0.21098435
+ 53000 18.693035 -1.4231915 0.76333082 3.1612761 4.583242 -2.4485762 0.089191206
+ 54000 16.334672 -0.36309884 1.0200365 4.6700448 4.1628702 -2.1713841 0.11431995
+ 55000 17.33842 -0.61522682 0.89847366 3.4970659 4.673495 -2.4743036 0.068004878
+ 56000 17.790294 -1.0150845 0.73697112 3.6000297 4.5988343 -2.4822509 0.11434632
+ 57000 18.913486 -1.0985507 1.0231848 2.7483267 4.4421755 -2.574424 0.1763388
+ 58000 17.586896 -0.98284126 0.96965633 3.3330357 4.5325543 -2.1936869 0.083230915
+ 59000 17.77788 -1.1649953 0.83092298 3.8004148 4.3940176 -2.3136642 0.017207608
+ 60000 17.013042 -0.21728023 1.1688832 3.5374476 4.5462244 -2.4425301 0.15028297
+ 61000 17.236242 -1.1342147 1.0301086 3.685948 4.6842331 -2.328108 0.070210812
+ 62000 17.529852 -1.2961547 1.0323133 3.4474598 5.1435839 -2.4553423 0.060842687
+ 63000 18.754704 -1.1816999 0.51806039 3.140172 4.5832701 -2.2713213 0.06327871
+ 64000 17.54594 -1.3592836 0.9694558 4.1363258 4.3547729 -2.3818433 0.12634448
+ 65000 16.962312 -0.54192775 0.90321315 4.0788618 4.2008255 -2.1376711 0.039504515
+ 66000 18.078619 -1.3552947 1.0716861 3.3285374 4.7229362 -2.3331115 0.21978698
+ 67000 17.132732 -1.4376876 0.91486534 4.4461852 4.6894176 -2.3655045 0.068150385
+ 68000 18.69286 -1.2856207 0.3895394 3.0620063 4.9922992 -2.3459189 0.079879643
+ 69000 18.329552 -1.1545957 0.88632275 3.1741058 4.4562418 -2.7094867 0.25329613
+ 70000 16.681168 -0.94434373 1.2450393 4.5737944 4.4902996 -2.4581775 0.15313095
+ 71000 17.375032 -1.0514442 1.0741595 3.4896146 4.8407713 -2.5302576 0.13640847
+ 72000 17.833013 -0.9047134 0.87067876 3.1658924 4.8825932 -2.4398117 0.2343991
+ 73000 17.421411 -1.2190741 0.73706811 4.2895 4.6464636 -2.3872727 0.19696525
+ 74000 17.383158 -0.34208984 0.71333984 3.2718891 4.2718495 -2.2484281 0.10827022
+ 75000 17.20885 -1.2710479 1.125102 3.8414467 5.3222741 -2.375505 0.12910797
+ 76000 16.811578 -0.545162 0.59076961 3.9118604 4.8031296 -2.2777895 0.063015508
+ 77000 16.679231 -0.080955983 0.7253398 3.4203454 5.0987608 -2.379614 0.12961874
+ 78000 18.164524 -1.3115525 0.92526408 3.5764487 4.3814882 -2.3712488 0.073436724
+ 79000 17.738686 -1.0697859 1.2186866 3.0593848 4.6551053 -2.2505871 0.075340661
+ 80000 16.767483 -0.84777477 1.03128 4.1982958 4.6992227 -2.4146425 0.079774219
+ 81000 16.257265 0.62803774 0.84032194 3.3873471 5.0961071 -2.7219776 0.20467848
+ 82000 18.232082 -1.2129302 0.50746051 3.9207128 4.5073437 -2.599371 0.094522372
+ 83000 16.618985 -0.60917055 0.8825847 3.805497 4.9560959 -2.2194726 0.14852687
+ 84000 17.90762 -0.82336075 0.90504161 3.0324198 4.7444271 -2.5036073 0.15860682
+ 85000 16.699883 -0.50297228 0.83405307 3.8598996 4.7971968 -2.2427788 0.10338668
+ 86000 16.353038 -0.0096880616 0.80705167 4.0865115 4.5364338 -2.4548873 0.098456203
+ 87000 17.887331 -0.75281219 1.0030148 4.0117123 4.3443074 -2.9774392 0.16190152
+ 88000 18.583708 -1.4867053 0.86324814 3.3971237 4.3526221 -2.221239 0.14459352
+ 89000 17.684828 -1.283764 1.0021118 3.5426808 4.9057005 -2.3921967 0.05844702
+ 90000 17.2597 -0.84306489 0.99797936 3.8896866 4.4315457 -2.5662899 0.18270206
+ 91000 16.705581 -0.44704047 0.75239556 3.470805 4.976868 -2.1894571 0.12312848
+ 92000 17.548071 -1.2222664 0.92898812 4.0813773 4.3432647 -2.1631158 0.14071343
+ 93000 17.163675 -0.94994776 0.96876981 3.9137692 4.4388666 -2.1260232 0.13187968
+ 94000 18.842071 -1.2822113 0.58767049 3.1393475 4.5820965 -2.7264682 0.10406266
+ 95000 18.112287 -1.1011381 0.63546648 3.4672667 4.486275 -2.2991936 0.041589685
+ 96000 17.102713 -0.6877313 0.8389032 3.6892719 4.5676004 -2.1905327 0.13507011
+ 97000 16.778253 -1.2902153 1.1588744 4.2820083 4.9537657 -2.4798159 0.35696636
+ 98000 18.34638 -1.2908146 1.185356 3.0739807 4.4575453 -2.3959144 0.22407922
+ 99000 17.995148 -1.3939639 0.7727299 3.8774144 4.4345458 -2.1142776 0.13550099
+ 100000 18.444746 -1.2456693 0.86061526 3.468696 4.5264336 -2.4239851 0.074369539
+Loop time of 2.52011 on 4 procs for 100000 steps with 34 atoms
+
+Performance: 6856.851 ns/day, 0.004 hours/ns, 39680.850 timesteps/s
+98.8% CPU use with 4 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0.072506 | 0.28131 | 0.69088 | 46.2 | 11.16
+Bond | 0.050544 | 0.45307 | 0.9416 | 57.6 | 17.98
+Neigh | 0.0060885 | 0.0061619 | 0.0062056 | 0.1 | 0.24
+Comm | 0.44686 | 1.3679 | 2.0111 | 53.5 | 54.28
+Output | 0.0028057 | 0.0029956 | 0.003264 | 0.3 | 0.12
+Modify | 0.028202 | 0.095174 | 0.15782 | 19.8 | 3.78
+Other | | 0.3135 | | | 12.44
+
+Nlocal: 8.5 ave 14 max 2 min
+Histogram: 1 0 1 0 0 0 0 0 0 2
+Nghost: 25.5 ave 32 max 20 min
+Histogram: 2 0 0 0 0 0 0 1 0 1
+Neighs: 98.75 ave 242 max 31 min
+Histogram: 2 0 1 0 0 0 0 0 0 1
+
+Total # of neighbors = 395
+Ave neighs/atom = 11.6176
+Ave special neighs/atom = 9.52941
+Neighbor list builds = 246
+Dangerous builds = 0
+Total wall time: 0:00:02
diff --git a/examples/mscg/log.31Mar17.g++.1 b/examples/mscg/log.31Mar17.g++.1
new file mode 100644
index 000000000..c67bc483d
--- /dev/null
+++ b/examples/mscg/log.31Mar17.g++.1
@@ -0,0 +1,145 @@
+LAMMPS (13 Apr 2017)
+units real
+atom_style full
+pair_style zero 10.0
+
+read_data data.meoh
+ orthogonal box = (-20.6917 -20.6917 -20.6917) to (20.6917 20.6917 20.6917)
+ 1 by 1 by 1 MPI processor grid
+ reading atoms ...
+ 1000 atoms
+ 0 = max # of 1-2 neighbors
+ 0 = max # of 1-3 neighbors
+ 0 = max # of 1-4 neighbors
+ 1 = max # of special neighbors
+pair_coeff * *
+
+thermo 1
+thermo_style custom step
+
+# Test 1a: range finder functionality
+fix 1 all mscg 1 range on
+rerun dump.meoh first 0 last 4500 every 250 dump x y z fx fy fz
+Neighbor list info ...
+ update every 1 steps, delay 10 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 12
+ ghost atom cutoff = 12
+ binsize = 6, bins = 7 7 7
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair zero, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 5.794 | 5.794 | 5.794 Mbytes
+Step
+ 0
+ 250
+ 500
+ 750
+ 1000
+ 1250
+ 1500
+ 1750
+ 2000
+ 2250
+ 2500
+ 2750
+ 3000
+ 3250
+ 3500
+ 3750
+ 4000
+ 4250
+ 4500
+Loop time of 0.581537 on 1 procs for 19 steps with 1000 atoms
+
+Performance: 2.823 ns/day, 8.502 hours/ns, 32.672 timesteps/s
+99.2% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0 | 0 | 0 | 0.0 | 0.00
+Bond | 0 | 0 | 0 | 0.0 | 0.00
+Neigh | 0 | 0 | 0 | 0.0 | 0.00
+Comm | 0 | 0 | 0 | 0.0 | 0.00
+Output | 0 | 0 | 0 | 0.0 | 0.00
+Modify | 0 | 0 | 0 | 0.0 | 0.00
+Other | | 0.5815 | | |100.00
+
+Nlocal: 1000 ave 1000 max 1000 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 2934 ave 2934 max 2934 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 50654 ave 50654 max 50654 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 50654
+Ave neighs/atom = 50.654
+Ave special neighs/atom = 0
+Neighbor list builds = 0
+Dangerous builds = 0
+print "TEST_1a mscg range finder"
+TEST_1a mscg range finder
+unfix 1
+
+# Test 1b: force matching functionality
+fix 1 all mscg 1
+rerun dump.meoh first 0 last 4500 every 250 dump x y z fx fy fz
+Per MPI rank memory allocation (min/avg/max) = 5.794 | 5.794 | 5.794 Mbytes
+Step
+ 0
+ 250
+ 500
+ 750
+ 1000
+ 1250
+ 1500
+ 1750
+ 2000
+ 2250
+ 2500
+ 2750
+ 3000
+ 3250
+ 3500
+ 3750
+ 4000
+ 4250
+ 4500
+Loop time of 0.841917 on 1 procs for 19 steps with 1000 atoms
+
+Performance: 1.950 ns/day, 12.309 hours/ns, 22.568 timesteps/s
+99.8% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0 | 0 | 0 | 0.0 | 0.00
+Bond | 0 | 0 | 0 | 0.0 | 0.00
+Neigh | 0 | 0 | 0 | 0.0 | 0.00
+Comm | 0 | 0 | 0 | 0.0 | 0.00
+Output | 0 | 0 | 0 | 0.0 | 0.00
+Modify | 0 | 0 | 0 | 0.0 | 0.00
+Other | | 0.8419 | | |100.00
+
+Nlocal: 1000 ave 1000 max 1000 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 2934 ave 2934 max 2934 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 50654 ave 50654 max 50654 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 50654
+Ave neighs/atom = 50.654
+Ave special neighs/atom = 0
+Neighbor list builds = 0
+Dangerous builds = 0
+print "TEST_1b mscg force matching"
+TEST_1b mscg force matching
+
+print TEST_DONE
+TEST_DONE
+Total wall time: 0:00:01
diff --git a/lib/Install.py b/lib/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
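As a minimal usage sketch of the generic script above (assuming the library
directory provides a Makefile.g++ and that Makefile.lammps.installed is the
desired EXTRAMAKE target), it is run from within a lib/ sub-directory, e.g.

  python Install.py -m g++ -e installed

This performs the clean, builds the library via the generated Makefile.auto,
and reports whether lib<name>.a and Makefile.lammps were produced.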
diff --git a/lib/README b/lib/README
index 72ebb0a5f..3c8f46dd0 100644
--- a/lib/README
+++ b/lib/README
@@ -1,57 +1,59 @@
This directory contains libraries that can be linked to when building
LAMMPS, if particular packages are included in the LAMMPS build.
Most of these directories contain code for the library; some contain
a Makefile.lammps file that points to where the library is installed
elsewhere on your system.
In either case, the library itself must be installed and/or built
first, so that the appropriate library files exist for LAMMPS to link
against.
Each library directory contains a README with additional info about
how to acquire and/or build the library. This may require you to edit
one of the provided Makefiles to make it suitable for your machine.
The libraries in this directory are the following:
atc atomistic-to-continuum methods, USER-ATC package
from Reese Jones, Jeremy Templeton, Jon Zimmerman (Sandia)
awpmd antisymmetrized wave packet molecular dynamics, AWPMD package
from Ilya Valuev (JIHT RAS)
colvars collective variable module (Metadynamics, ABF and more)
from Giacomo Fiorin and Jerome Henin (ICMS, Temple U)
compress hook to system lib for performing I/O compression, COMPRESS pkg
from Axel Kohlmeyer (Temple U)
gpu general GPU routines, GPU package
from Mike Brown (ORNL)
h5md ch5md library for output of MD data in HDF5 format
from Pierre de Buyl (KU Leuven)
kim hooks to the KIM library, used by KIM package
from Ryan Elliott and Ellad Tadmor (U Minn)
kokkos Kokkos package for GPU and many-core acceleration
from Kokkos development team (Sandia)
linalg set of BLAS and LAPACK routines needed by USER-ATC package
from Axel Kohlmeyer (Temple U)
-poems POEMS rigid-body integration package, POEMS package
- from Rudranarayan Mukherjee (RPI)
meam modified embedded atom method (MEAM) potential, MEAM package
from Greg Wagner (Sandia)
molfile hooks to VMD molfile plugins, used by the USER-MOLFILE package
from Axel Kohlmeyer (Temple U) and the VMD development team
mscg hooks to the MSCG library, used by fix_mscg command
from Jacob Wagner and Greg Voth group (U Chicago)
+netcdf hooks to a NetCDF library installed on your system
+ from Lars Pastewka (Karlsruhe Institute of Technology)
+poems POEMS rigid-body integration package, POEMS package
+ from Rudranarayan Mukherjee (RPI)
python hooks to the system Python library, used by the PYTHON package
from the LAMMPS development team
qmmm quantum mechanics/molecular mechanics coupling interface
from Axel Kohlmeyer (Temple U)
quip interface to QUIP/libAtoms framework, USER-QUIP package
from Albert Bartok-Partay and Gabor Csanyi (U Cambridge)
reax ReaxFF potential, REAX package
from Adri van Duin (Penn State) and Aidan Thompson (Sandia)
smd hooks to Eigen library, used by USER-SMD package
from Georg Ganzenmueller (Ernst Mach Institute, Germany)
voronoi hooks to the Voro++ library, used by compute voronoi/atom command
from Daniel Schwen (LANL)
vtk hooks to the VTK library, used by dump custom/vtk command
from Richard Berger (JKU)
diff --git a/lib/atc/Install.py b/lib/atc/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/atc/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/atc/README b/lib/atc/README
index 106c303dd..d3adfdafe 100644
--- a/lib/atc/README
+++ b/lib/atc/README
@@ -1,59 +1,64 @@
ATC (Atom To Continuum methods)
Reese Jones, Jeremy Templeton, Jonathan Zimmerman (Sandia National Labs)
rjones, jatempl, jzimmer at sandia.gov
September 2009
This is version 1.0 of the ATC library, which provides continuum field
estimation and molecular dynamics-finite element coupling methods.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-ATC package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-atc" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
Note that the ATC library makes MPI calls, so you must build it with
the same MPI library that is used to build LAMMPS, i.e. as specified
by settings in the lammps/src/MAKE/Makefile.machine file you are
using.
When you are done building this library, two files should
exist in this directory:
libatc.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-atc_SYSINC = leave blank for this package
user-atc_SYSLIB = BLAS and LAPACK libraries needed by this package
user-atc_SYSPATH = path(s) to where those libraries are
-You have several choices for these settings:
+You have 3 choices for these settings:
-If the 2 libraries are already installed on your system, the settings
-in Makefile.lammps.installed should work.
+a) If the 2 libraries are already installed on your system, the
+settings in Makefile.lammps.installed should work.
-If they are not, you can install them yourself, and speficy the
-appropriate settings accordingly.
+b) If they are not, you can install them yourself, and specify the
+appropriate settings accordingly in a Makefile.lammps.* file
+and set the EXTRAMAKE setting in Makefile.* to that file.
-If you want to use the minimalist version of these libraries provided
-with LAMMPS in lib/linalg, then the settings in Makefile.lammps.linalg
-should work. Note that in this case you also need to build the
-linear-algebra in lib/linalg; see the lib/linalg/README for more
-details.
+c) Use the minimalist version of these libraries provided with LAMMPS
+in lib/linalg, by using Makefile.lammps.linalg. In this case you also
+need to build the library in lib/linalg; see the lib/linalg/README
+file for more details.
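As a minimal sketch of choice (c) above (assuming Makefile.g++ matches your
compiler and MPI setup), the ATC library could be built with

  python Install.py -m g++ -e linalg

which is roughly equivalent to pointing EXTRAMAKE at Makefile.lammps.linalg
and running "make -f Makefile.g++" by hand; the library in lib/linalg must
still be built first, as described in lib/linalg/README.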
diff --git a/lib/awpmd/Install.py b/lib/awpmd/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/awpmd/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/awpmd/README b/lib/awpmd/README
index 3c0248041..20e142f74 100644
--- a/lib/awpmd/README
+++ b/lib/awpmd/README
@@ -1,62 +1,67 @@
AWPMD (Antisymmetrized Wave Packet Molecular Dynamics) library
Ilya Valuev, Igor Morozov, JIHT RAS
valuev at physik.hu-berlin.de
June 2011
This is version 0.9 of the AWPMD library taken from JIHT GridMD project.
It contains interface to calculate electronic and electron-ion Hamiltonian,
norm matrix and forces for AWPMD method.
AWPMD is an open source program distributed under the terms
of wxWidgets Library License (see license directory for details).
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-AWPMD package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-awpmd" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
Note that this library makes MPI calls, so you must build it with the
same MPI library that is used to build LAMMPS, i.e. as specified by
settings in the lammps/src/MAKE/Makefile.machine file you are using.
When you are done building this library, two files should
exist in this directory:
libawpmd.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-awpmd_SYSINC = leave blank for this package
user-awpmd_SYSLIB = BLAS and LAPACK libraries needed by this package
user-awpmd_SYSPATH = path(s) to where those libraries are
-You have several choices for these settings:
+You have 3 choices for these settings:
-If the 2 libraries are already installed on your system, the settings
-in Makefile.lammps.installed should work.
+a) If the 2 libraries are already installed on your system, the
+settings in Makefile.lammps.installed should work.
-If they are not, you can install them yourself, and speficy the
-appropriate settings accordingly.
+b) If they are not, you can install them yourself, and specify the
+appropriate settings accordingly in a Makefile.lammps.* file
+and set the EXTRAMAKE setting in Makefile.* to that file.
-If you want to use the minimalist version of these libraries provided
-with LAMMPS in lib/linalg, then the settings in Makefile.lammps.linalg
-should work. Note that in this case you also need to build the
-linear-algebra in lib/linalg; see the lib/linalg/README for more
-details.
+c) Use the minimalist version of these libraries provided with LAMMPS
+in lib/linalg, by using Makefile.lammps.linalg. In this case you also
+need to build the library in lib/linalg; see the lib/linalg/README
+file for more details.
diff --git a/lib/colvars/Install.py b/lib/colvars/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/colvars/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/colvars/README b/lib/colvars/README
index d6efc333a..a5e5938b2 100644
--- a/lib/colvars/README
+++ b/lib/colvars/README
@@ -1,68 +1,73 @@
This library is the portable "colvars" module, originally interfaced
with the NAMD MD code, to provide an extensible software framework
that allows enhanced sampling in molecular dynamics simulations.
The module is written to maximize performance, portability,
flexibility of usage for the user, and extensibility for the developer.
The development of the colvars library is now hosted on github at:
http://colvars.github.io/
You can use this site to get access to the latest development sources
and the up-to-date documentation.
A copy of the specific documentation is also in
doc/PDF/colvars-refman-lammps.pdf
Please report bugs and request new features at:
https://github.com/colvars/colvars/issues
The following publications describe the principles of
the implementation of this library:
Using collective variables to drive molecular dynamics simulations,
Giacomo Fiorin, Michael L. Klein & Jérôme Hénin (2013):
Molecular Physics DOI:10.1080/00268976.2013.813594
Exploring Multidimensional Free Energy Landscapes Using
Time-Dependent Biases on Collective Variables,
J. Hénin, G. Fiorin, C. Chipot, and M. L. Klein,
J. Chem. Theory Comput., 6, 35-47 (2010).
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-COLVARS package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-colvars" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
When you are done building this library, two files should
exist in this directory:
libcolvars.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-colvars_SYSINC = leave blank for this package unless debugging
user-colvars_SYSLIB = leave blank for this package
user-colvars_SYSPATH = leave blank for this package
You have several choices for these settings:
Since they do not normally need to be set, the settings in
Makefile.lammps.empty should work.
If you want to set a debug flag recognized by the library, the
settings in Makefile.lammps.debug should work.
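As a sketch (assuming Makefile.g++ is appropriate for your compiler), a
debug-enabled build of this library could be done with

  python Install.py -m g++ -e debug

which selects Makefile.lammps.debug as the EXTRAMAKE file; omit -e to keep
whatever EXTRAMAKE target the chosen Makefile.* already selects.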
diff --git a/lib/gpu/Install.py b/lib/gpu/Install.py
new file mode 100644
index 000000000..d396be5e1
--- /dev/null
+++ b/lib/gpu/Install.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python
+
+# Install.py tool to build the GPU library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -i isuffix -h hdir -a arch -p precision -e esuffix -m -o osuffix
+ specify one or more options, order does not matter
+ copies an existing Makefile.isuffix in lib/gpu to Makefile.auto
+ optionally edits these variables in Makefile.auto:
+ CUDA_HOME, CUDA_ARCH, CUDA_PRECISION, EXTRAMAKE
+ optionally uses Makefile.auto to build the GPU library -> libgpu.a
+ and to copy a Makefile.lammps.esuffix -> Makefile.lammps
+ optionally copies Makefile.auto to a new Makefile.osuffix
+
+ -i = use Makefile.isuffix as starting point, copy to Makefile.auto
+ default isuffix = linux
+ -h = set CUDA_HOME variable in Makefile.auto to hdir
+ hdir = path to NVIDIA Cuda software, e.g. /usr/local/cuda
+ -a = set CUDA_ARCH variable in Makefile.auto to arch
+ use arch = ?? for K40 (Tesla)
+ use arch = 37 for dual K80 (Tesla)
+ use arch = 60 for P100 (Pascal)
+ -p = set CUDA_PRECISION variable in Makefile.auto to precision
+ use precision = double or mixed or single
+ -e = set EXTRAMAKE variable in Makefile.auto to Makefile.lammps.esuffix
+ -m = make the GPU library using Makefile.auto
+ first performs a "make clean"
+ produces libgpu.a if successful
+ also copies EXTRAMAKE file -> Makefile.lammps
+ -e can set which Makefile.lammps.esuffix file is copied
+ -o = copy final Makefile.auto to Makefile.osuffix
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+isuffix = "linux"
+hflag = aflag = pflag = eflag = 0
+makeflag = 0
+outflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-i":
+ if iarg+2 > nargs: error()
+ isuffix = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-h":
+ if iarg+2 > nargs: error()
+ hflag = 1
+ hdir = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-a":
+ if iarg+2 > nargs: error()
+ aflag = 1
+ arch = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-p":
+ if iarg+2 > nargs: error()
+ pflag = 1
+ precision = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ eflag = 1
+ lmpsuffix = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-m":
+ makeflag = 1
+ iarg += 1
+ elif args[iarg] == "-o":
+ if iarg+2 > nargs: error()
+ outflag = 1
+ osuffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+if pflag:
+ if precision == "double": precstr = "-D_DOUBLE_DOUBLE"
+ elif precision == "mixed": precstr = "-D_SINGLE_DOUBLE"
+ elif precision == "single": precstr = "-D_SINGLE_SINGLE"
+ else: error("Invalid precision setting")
+
+# create Makefile.auto
+# reset EXTRAMAKE, CUDA_HOME, CUDA_ARCH, CUDA_PRECISION if requested
+
+if not os.path.exists("Makefile.%s" % isuffix):
+ error("lib/gpu/Makefile.%s does not exist" % isuffix)
+
+lines = open("Makefile.%s" % isuffix,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) != 3:
+ print >>fp,line,
+ continue
+
+ if hflag and words[0] == "CUDA_HOME" and words[1] == '=':
+ line = line.replace(words[2],hdir)
+ if aflag and words[0] == "CUDA_ARCH" and words[1] == '=':
+ line = line.replace(words[2],"-arch=sm_%s" % arch)
+ if pflag and words[0] == "CUDA_PRECISION" and words[1] == '=':
+ line = line.replace(words[2],precstr)
+ if eflag and words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % lmpsuffix)
+
+ print >>fp,line,
+
+fp.close()
+
+# perform make
+# the make operation copies the EXTRAMAKE file to Makefile.lammps
+
+if makeflag:
+ print "Building libgpu.a ..."
+ cmd = "rm -f libgpu.a"
+ commands.getoutput(cmd)
+ cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+ commands.getoutput(cmd)
+ if not os.path.exists("libgpu.a"):
+ error("Build of lib/gpu/libgpu.a was NOT successful")
+ if not os.path.exists("Makefile.lammps"):
+ error("lib/gpu/Makefile.lammps was NOT created")
+
+# copy new Makefile.auto to Makefile.osuffix
+
+if outflag:
+ print "Creating new Makefile.%s" % osuffix
+ cmd = "cp Makefile.auto Makefile.%s" % osuffix
+ commands.getoutput(cmd)
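As an illustrative invocation (assuming CUDA is installed in /usr/local/cuda
and a Pascal P100 card, hence -a 60 per the help text above), the GPU library
could be built in mixed precision with

  python Install.py -h /usr/local/cuda -a 60 -p mixed -m

This copies the default Makefile.linux to Makefile.auto, rewrites CUDA_HOME,
CUDA_ARCH and CUDA_PRECISION there, runs the make step to produce libgpu.a,
and relies on that make to copy the EXTRAMAKE file to Makefile.lammps.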
diff --git a/lib/gpu/Nvidia.makefile b/lib/gpu/Nvidia.makefile
index e02849cfe..660544cfa 100644
--- a/lib/gpu/Nvidia.makefile
+++ b/lib/gpu/Nvidia.makefile
@@ -1,798 +1,798 @@
CUDA = $(NVCC) $(CUDA_INCLUDE) $(CUDA_OPTS) -Icudpp_mini $(CUDA_ARCH) \
$(CUDA_PRECISION)
CUDR = $(CUDR_CPP) $(CUDR_OPTS) $(CUDA_PRECISION) $(CUDA_INCLUDE) \
$(CUDPP_OPT)
CUDA_LINK = $(CUDA_LIB) -lcudart
BIN2C = $(CUDA_HOME)/bin/bin2c
GPU_LIB = $(LIB_DIR)/libgpu.a
# Headers for Geryon
UCL_H = $(wildcard ./geryon/ucl*.h)
NVC_H = $(wildcard ./geryon/nvc*.h) $(UCL_H)
NVD_H = $(wildcard ./geryon/nvd*.h) $(UCL_H) lal_preprocessor.h
# Headers for Pair Stuff
PAIR_H = lal_atom.h lal_answer.h lal_neighbor_shared.h \
lal_neighbor.h lal_precision.h lal_device.h \
lal_balance.h lal_pppm.h
ALL_H = $(NVD_H) $(PAIR_H)
EXECS = $(BIN_DIR)/nvc_get_devices
ifdef CUDPP_OPT
CUDPP = $(OBJ_DIR)/cudpp.o $(OBJ_DIR)/cudpp_plan.o \
$(OBJ_DIR)/cudpp_maximal_launch.o $(OBJ_DIR)/cudpp_plan_manager.o \
$(OBJ_DIR)/radixsort_app.cu_o $(OBJ_DIR)/scan_app.cu_o
endif
OBJS = $(OBJ_DIR)/lal_atom.o $(OBJ_DIR)/lal_ans.o \
$(OBJ_DIR)/lal_neighbor.o $(OBJ_DIR)/lal_neighbor_shared.o \
$(OBJ_DIR)/lal_device.o $(OBJ_DIR)/lal_base_atomic.o \
$(OBJ_DIR)/lal_base_charge.o $(OBJ_DIR)/lal_base_ellipsoid.o \
$(OBJ_DIR)/lal_base_dipole.o $(OBJ_DIR)/lal_base_three.o \
$(OBJ_DIR)/lal_base_dpd.o \
$(OBJ_DIR)/lal_pppm.o $(OBJ_DIR)/lal_pppm_ext.o \
$(OBJ_DIR)/lal_gayberne.o $(OBJ_DIR)/lal_gayberne_ext.o \
$(OBJ_DIR)/lal_re_squared.o $(OBJ_DIR)/lal_re_squared_ext.o \
$(OBJ_DIR)/lal_lj.o $(OBJ_DIR)/lal_lj_ext.o \
$(OBJ_DIR)/lal_lj96.o $(OBJ_DIR)/lal_lj96_ext.o \
$(OBJ_DIR)/lal_lj_expand.o $(OBJ_DIR)/lal_lj_expand_ext.o \
$(OBJ_DIR)/lal_lj_coul.o $(OBJ_DIR)/lal_lj_coul_ext.o \
$(OBJ_DIR)/lal_lj_coul_long.o $(OBJ_DIR)/lal_lj_coul_long_ext.o \
$(OBJ_DIR)/lal_lj_dsf.o $(OBJ_DIR)/lal_lj_dsf_ext.o \
$(OBJ_DIR)/lal_lj_class2_long.o $(OBJ_DIR)/lal_lj_class2_long_ext.o \
$(OBJ_DIR)/lal_coul_long.o $(OBJ_DIR)/lal_coul_long_ext.o \
$(OBJ_DIR)/lal_morse.o $(OBJ_DIR)/lal_morse_ext.o \
$(OBJ_DIR)/lal_charmm_long.o $(OBJ_DIR)/lal_charmm_long_ext.o \
- $(OBJ_DIR)/lal_cg_cmm.o $(OBJ_DIR)/lal_cg_cmm_ext.o \
- $(OBJ_DIR)/lal_cg_cmm_long.o $(OBJ_DIR)/lal_cg_cmm_long_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk.o $(OBJ_DIR)/lal_lj_sdk_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk_long.o $(OBJ_DIR)/lal_lj_sdk_long_ext.o \
$(OBJ_DIR)/lal_eam.o $(OBJ_DIR)/lal_eam_ext.o \
$(OBJ_DIR)/lal_eam_fs_ext.o $(OBJ_DIR)/lal_eam_alloy_ext.o \
$(OBJ_DIR)/lal_buck.o $(OBJ_DIR)/lal_buck_ext.o \
$(OBJ_DIR)/lal_buck_coul.o $(OBJ_DIR)/lal_buck_coul_ext.o \
$(OBJ_DIR)/lal_buck_coul_long.o $(OBJ_DIR)/lal_buck_coul_long_ext.o \
$(OBJ_DIR)/lal_table.o $(OBJ_DIR)/lal_table_ext.o \
$(OBJ_DIR)/lal_yukawa.o $(OBJ_DIR)/lal_yukawa_ext.o \
$(OBJ_DIR)/lal_born.o $(OBJ_DIR)/lal_born_ext.o \
$(OBJ_DIR)/lal_born_coul_wolf.o $(OBJ_DIR)/lal_born_coul_wolf_ext.o \
$(OBJ_DIR)/lal_born_coul_long.o $(OBJ_DIR)/lal_born_coul_long_ext.o \
$(OBJ_DIR)/lal_dipole_lj.o $(OBJ_DIR)/lal_dipole_lj_ext.o \
$(OBJ_DIR)/lal_dipole_lj_sf.o $(OBJ_DIR)/lal_dipole_lj_sf_ext.o \
$(OBJ_DIR)/lal_colloid.o $(OBJ_DIR)/lal_colloid_ext.o \
$(OBJ_DIR)/lal_gauss.o $(OBJ_DIR)/lal_gauss_ext.o \
$(OBJ_DIR)/lal_yukawa_colloid.o $(OBJ_DIR)/lal_yukawa_colloid_ext.o \
$(OBJ_DIR)/lal_lj_coul_debye.o $(OBJ_DIR)/lal_lj_coul_debye_ext.o \
$(OBJ_DIR)/lal_coul_dsf.o $(OBJ_DIR)/lal_coul_dsf_ext.o \
$(OBJ_DIR)/lal_sw.o $(OBJ_DIR)/lal_sw_ext.o \
$(OBJ_DIR)/lal_beck.o $(OBJ_DIR)/lal_beck_ext.o \
$(OBJ_DIR)/lal_mie.o $(OBJ_DIR)/lal_mie_ext.o \
$(OBJ_DIR)/lal_soft.o $(OBJ_DIR)/lal_soft_ext.o \
$(OBJ_DIR)/lal_lj_coul_msm.o $(OBJ_DIR)/lal_lj_coul_msm_ext.o \
$(OBJ_DIR)/lal_lj_gromacs.o $(OBJ_DIR)/lal_lj_gromacs_ext.o \
$(OBJ_DIR)/lal_dpd.o $(OBJ_DIR)/lal_dpd_ext.o \
$(OBJ_DIR)/lal_tersoff.o $(OBJ_DIR)/lal_tersoff_ext.o \
$(OBJ_DIR)/lal_tersoff_zbl.o $(OBJ_DIR)/lal_tersoff_zbl_ext.o \
$(OBJ_DIR)/lal_tersoff_mod.o $(OBJ_DIR)/lal_tersoff_mod_ext.o \
$(OBJ_DIR)/lal_coul.o $(OBJ_DIR)/lal_coul_ext.o \
$(OBJ_DIR)/lal_coul_debye.o $(OBJ_DIR)/lal_coul_debye_ext.o \
$(OBJ_DIR)/lal_zbl.o $(OBJ_DIR)/lal_zbl_ext.o \
$(OBJ_DIR)/lal_lj_cubic.o $(OBJ_DIR)/lal_lj_cubic_ext.o
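# CBNS lists the CUDA kernel binaries (.cubin) and the matching
# bin2c-generated *_cubin.h headers included by the host-side lal_*.cpp files.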
CBNS = $(OBJ_DIR)/device.cubin $(OBJ_DIR)/device_cubin.h \
$(OBJ_DIR)/atom.cubin $(OBJ_DIR)/atom_cubin.h \
$(OBJ_DIR)/neighbor_cpu.cubin $(OBJ_DIR)/neighbor_cpu_cubin.h \
$(OBJ_DIR)/neighbor_gpu.cubin $(OBJ_DIR)/neighbor_gpu_cubin.h \
$(OBJ_DIR)/pppm_f.cubin $(OBJ_DIR)/pppm_f_cubin.h \
$(OBJ_DIR)/pppm_d.cubin $(OBJ_DIR)/pppm_d_cubin.h \
$(OBJ_DIR)/ellipsoid_nbor.cubin $(OBJ_DIR)/ellipsoid_nbor_cubin.h \
$(OBJ_DIR)/gayberne.cubin $(OBJ_DIR)/gayberne_lj.cubin \
$(OBJ_DIR)/gayberne_cubin.h $(OBJ_DIR)/gayberne_lj_cubin.h \
$(OBJ_DIR)/re_squared.cubin $(OBJ_DIR)/re_squared_lj.cubin \
$(OBJ_DIR)/re_squared_cubin.h $(OBJ_DIR)/re_squared_lj_cubin.h \
$(OBJ_DIR)/lj.cubin $(OBJ_DIR)/lj_cubin.h \
$(OBJ_DIR)/lj96.cubin $(OBJ_DIR)/lj96_cubin.h \
$(OBJ_DIR)/lj_expand.cubin $(OBJ_DIR)/lj_expand_cubin.h \
$(OBJ_DIR)/lj_coul.cubin $(OBJ_DIR)/lj_coul_cubin.h \
$(OBJ_DIR)/lj_coul_long.cubin $(OBJ_DIR)/lj_coul_long_cubin.h \
$(OBJ_DIR)/lj_dsf.cubin $(OBJ_DIR)/lj_dsf_cubin.h \
$(OBJ_DIR)/lj_class2_long.cubin $(OBJ_DIR)/lj_class2_long_cubin.h \
$(OBJ_DIR)/coul_long.cubin $(OBJ_DIR)/coul_long_cubin.h \
$(OBJ_DIR)/morse.cubin $(OBJ_DIR)/morse_cubin.h \
$(OBJ_DIR)/charmm_long.cubin $(OBJ_DIR)/charmm_long_cubin.h \
- $(OBJ_DIR)/cg_cmm.cubin $(OBJ_DIR)/cg_cmm_cubin.h \
- $(OBJ_DIR)/cg_cmm_long.cubin $(OBJ_DIR)/cg_cmm_long_cubin.h \
+ $(OBJ_DIR)/lj_sdk.cubin $(OBJ_DIR)/lj_sdk_cubin.h \
+ $(OBJ_DIR)/lj_sdk_long.cubin $(OBJ_DIR)/lj_sdk_long_cubin.h \
$(OBJ_DIR)/eam.cubin $(OBJ_DIR)/eam_cubin.h \
$(OBJ_DIR)/buck.cubin $(OBJ_DIR)/buck_cubin.h \
$(OBJ_DIR)/buck_coul_long.cubin $(OBJ_DIR)/buck_coul_long_cubin.h \
$(OBJ_DIR)/buck_coul.cubin $(OBJ_DIR)/buck_coul_cubin.h \
$(OBJ_DIR)/table.cubin $(OBJ_DIR)/table_cubin.h \
$(OBJ_DIR)/yukawa.cubin $(OBJ_DIR)/yukawa_cubin.h \
$(OBJ_DIR)/born.cubin $(OBJ_DIR)/born_cubin.h \
$(OBJ_DIR)/born_coul_wolf.cubin $(OBJ_DIR)/born_coul_wolf_cubin.h \
$(OBJ_DIR)/born_coul_long.cubin $(OBJ_DIR)/born_coul_long_cubin.h \
$(OBJ_DIR)/dipole_lj.cubin $(OBJ_DIR)/dipole_lj_cubin.h \
$(OBJ_DIR)/dipole_lj_sf.cubin $(OBJ_DIR)/dipole_lj_sf_cubin.h \
$(OBJ_DIR)/colloid.cubin $(OBJ_DIR)/colloid_cubin.h \
$(OBJ_DIR)/gauss.cubin $(OBJ_DIR)/gauss_cubin.h \
$(OBJ_DIR)/yukawa_colloid.cubin $(OBJ_DIR)/yukawa_colloid_cubin.h \
$(OBJ_DIR)/lj_coul_debye.cubin $(OBJ_DIR)/lj_coul_debye_cubin.h \
$(OBJ_DIR)/coul_dsf.cubin $(OBJ_DIR)/coul_dsf_cubin.h \
$(OBJ_DIR)/sw.cubin $(OBJ_DIR)/sw_cubin.h \
$(OBJ_DIR)/beck.cubin $(OBJ_DIR)/beck_cubin.h \
$(OBJ_DIR)/mie.cubin $(OBJ_DIR)/mie_cubin.h \
$(OBJ_DIR)/soft.cubin $(OBJ_DIR)/soft_cubin.h \
$(OBJ_DIR)/lj_coul_msm.cubin $(OBJ_DIR)/lj_coul_msm_cubin.h \
$(OBJ_DIR)/lj_gromacs.cubin $(OBJ_DIR)/lj_gromacs_cubin.h \
$(OBJ_DIR)/dpd.cubin $(OBJ_DIR)/dpd_cubin.h \
$(OBJ_DIR)/tersoff.cubin $(OBJ_DIR)/tersoff_cubin.h \
$(OBJ_DIR)/tersoff_zbl.cubin $(OBJ_DIR)/tersoff_zbl_cubin.h \
$(OBJ_DIR)/tersoff_mod.cubin $(OBJ_DIR)/tersoff_mod_cubin.h \
$(OBJ_DIR)/coul.cubin $(OBJ_DIR)/coul_cubin.h \
$(OBJ_DIR)/coul_debye.cubin $(OBJ_DIR)/coul_debye_cubin.h \
$(OBJ_DIR)/zbl.cubin $(OBJ_DIR)/zbl_cubin.h \
$(OBJ_DIR)/lj_cubic.cubin $(OBJ_DIR)/lj_cubic_cubin.h
all: $(OBJ_DIR) $(GPU_LIB) $(EXECS)
$(OBJ_DIR):
mkdir -p $@
$(OBJ_DIR)/cudpp.o: cudpp_mini/cudpp.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_plan.o: cudpp_mini/cudpp_plan.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_plan.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_maximal_launch.o: cudpp_mini/cudpp_maximal_launch.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_maximal_launch.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_plan_manager.o: cudpp_mini/cudpp_plan_manager.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_plan_manager.cpp -Icudpp_mini
$(OBJ_DIR)/radixsort_app.cu_o: cudpp_mini/radixsort_app.cu
$(CUDA) -o $@ -c cudpp_mini/radixsort_app.cu
$(OBJ_DIR)/scan_app.cu_o: cudpp_mini/scan_app.cu
$(CUDA) -o $@ -c cudpp_mini/scan_app.cu
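# Pattern for each style below: nvcc compiles lal_<style>.cu to a .cubin,
# bin2c wraps it into <style>_cubin.h, and that header is picked up when the
# host-side lal_<style>.cpp is compiled with -I$(OBJ_DIR).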
$(OBJ_DIR)/atom.cubin: lal_atom.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_atom.cu
$(OBJ_DIR)/atom_cubin.h: $(OBJ_DIR)/atom.cubin
$(BIN2C) -c -n atom $(OBJ_DIR)/atom.cubin > $(OBJ_DIR)/atom_cubin.h
$(OBJ_DIR)/lal_atom.o: lal_atom.cpp lal_atom.h $(NVD_H) $(OBJ_DIR)/atom_cubin.h
$(CUDR) -o $@ -c lal_atom.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_ans.o: lal_answer.cpp lal_answer.h $(NVD_H)
$(CUDR) -o $@ -c lal_answer.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/neighbor_cpu.cubin: lal_neighbor_cpu.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_neighbor_cpu.cu
$(OBJ_DIR)/neighbor_cpu_cubin.h: $(OBJ_DIR)/neighbor_cpu.cubin
$(BIN2C) -c -n neighbor_cpu $(OBJ_DIR)/neighbor_cpu.cubin > $(OBJ_DIR)/neighbor_cpu_cubin.h
$(OBJ_DIR)/neighbor_gpu.cubin: lal_neighbor_gpu.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_neighbor_gpu.cu
$(OBJ_DIR)/neighbor_gpu_cubin.h: $(OBJ_DIR)/neighbor_gpu.cubin
$(BIN2C) -c -n neighbor_gpu $(OBJ_DIR)/neighbor_gpu.cubin > $(OBJ_DIR)/neighbor_gpu_cubin.h
$(OBJ_DIR)/lal_neighbor_shared.o: lal_neighbor_shared.cpp lal_neighbor_shared.h $(OBJ_DIR)/neighbor_cpu_cubin.h $(OBJ_DIR)/neighbor_gpu_cubin.h $(NVD_H)
$(CUDR) -o $@ -c lal_neighbor_shared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_neighbor.o: lal_neighbor.cpp lal_neighbor.h lal_neighbor_shared.h $(NVD_H)
$(CUDR) -o $@ -c lal_neighbor.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/device.cubin: lal_device.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_device.cu
$(OBJ_DIR)/device_cubin.h: $(OBJ_DIR)/device.cubin
$(BIN2C) -c -n device $(OBJ_DIR)/device.cubin > $(OBJ_DIR)/device_cubin.h
$(OBJ_DIR)/lal_device.o: lal_device.cpp lal_device.h $(ALL_H) $(OBJ_DIR)/device_cubin.h
$(CUDR) -o $@ -c lal_device.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_atomic.o: $(ALL_H) lal_base_atomic.h lal_base_atomic.cpp
$(CUDR) -o $@ -c lal_base_atomic.cpp
$(OBJ_DIR)/lal_base_charge.o: $(ALL_H) lal_base_charge.h lal_base_charge.cpp
$(CUDR) -o $@ -c lal_base_charge.cpp
$(OBJ_DIR)/lal_base_ellipsoid.o: $(ALL_H) lal_base_ellipsoid.h lal_base_ellipsoid.cpp $(OBJ_DIR)/ellipsoid_nbor_cubin.h
$(CUDR) -o $@ -c lal_base_ellipsoid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_dipole.o: $(ALL_H) lal_base_dipole.h lal_base_dipole.cpp
$(CUDR) -o $@ -c lal_base_dipole.cpp
$(OBJ_DIR)/lal_base_three.o: $(ALL_H) lal_base_three.h lal_base_three.cpp
$(CUDR) -o $@ -c lal_base_three.cpp
$(OBJ_DIR)/lal_base_dpd.o: $(ALL_H) lal_base_dpd.h lal_base_dpd.cpp
$(CUDR) -o $@ -c lal_base_dpd.cpp
$(OBJ_DIR)/pppm_f.cubin: lal_pppm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -Dgrdtyp=float -Dgrdtyp4=float4 -o $@ lal_pppm.cu
$(OBJ_DIR)/pppm_f_cubin.h: $(OBJ_DIR)/pppm_f.cubin
$(BIN2C) -c -n pppm_f $(OBJ_DIR)/pppm_f.cubin > $(OBJ_DIR)/pppm_f_cubin.h
$(OBJ_DIR)/pppm_d.cubin: lal_pppm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -Dgrdtyp=double -Dgrdtyp4=double4 -o $@ lal_pppm.cu
$(OBJ_DIR)/pppm_d_cubin.h: $(OBJ_DIR)/pppm_d.cubin
$(BIN2C) -c -n pppm_d $(OBJ_DIR)/pppm_d.cubin > $(OBJ_DIR)/pppm_d_cubin.h
$(OBJ_DIR)/lal_pppm.o: $(ALL_H) lal_pppm.h lal_pppm.cpp $(OBJ_DIR)/pppm_f_cubin.h $(OBJ_DIR)/pppm_d_cubin.h
$(CUDR) -o $@ -c lal_pppm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_pppm_ext.o: $(ALL_H) lal_pppm.h lal_pppm_ext.cpp
$(CUDR) -o $@ -c lal_pppm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/ellipsoid_nbor.cubin: lal_ellipsoid_nbor.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_ellipsoid_nbor.cu
$(OBJ_DIR)/ellipsoid_nbor_cubin.h: $(OBJ_DIR)/ellipsoid_nbor.cubin
$(BIN2C) -c -n ellipsoid_nbor $(OBJ_DIR)/ellipsoid_nbor.cubin > $(OBJ_DIR)/ellipsoid_nbor_cubin.h
$(OBJ_DIR)/gayberne.cubin: lal_gayberne.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gayberne.cu
$(OBJ_DIR)/gayberne_lj.cubin: lal_gayberne_lj.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gayberne_lj.cu
$(OBJ_DIR)/gayberne_cubin.h: $(OBJ_DIR)/gayberne.cubin
$(BIN2C) -c -n gayberne $(OBJ_DIR)/gayberne.cubin > $(OBJ_DIR)/gayberne_cubin.h
$(OBJ_DIR)/gayberne_lj_cubin.h: $(OBJ_DIR)/gayberne_lj.cubin
$(BIN2C) -c -n gayberne_lj $(OBJ_DIR)/gayberne_lj.cubin > $(OBJ_DIR)/gayberne_lj_cubin.h
$(OBJ_DIR)/lal_gayberne.o: $(ALL_H) lal_gayberne.h lal_gayberne.cpp $(OBJ_DIR)/gayberne_cubin.h $(OBJ_DIR)/gayberne_lj_cubin.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(CUDR) -o $@ -c lal_gayberne.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gayberne_ext.o: $(ALL_H) $(OBJ_DIR)/lal_gayberne.o lal_gayberne_ext.cpp
$(CUDR) -o $@ -c lal_gayberne_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/re_squared.cubin: lal_re_squared.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_re_squared.cu
$(OBJ_DIR)/re_squared_lj.cubin: lal_re_squared_lj.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_re_squared_lj.cu
$(OBJ_DIR)/re_squared_cubin.h: $(OBJ_DIR)/re_squared.cubin
$(BIN2C) -c -n re_squared $(OBJ_DIR)/re_squared.cubin > $(OBJ_DIR)/re_squared_cubin.h
$(OBJ_DIR)/re_squared_lj_cubin.h: $(OBJ_DIR)/re_squared_lj.cubin
$(BIN2C) -c -n re_squared_lj $(OBJ_DIR)/re_squared_lj.cubin > $(OBJ_DIR)/re_squared_lj_cubin.h
$(OBJ_DIR)/lal_re_squared.o: $(ALL_H) lal_re_squared.h lal_re_squared.cpp $(OBJ_DIR)/re_squared_cubin.h $(OBJ_DIR)/re_squared_lj_cubin.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(CUDR) -o $@ -c lal_re_squared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_re_squared_ext.o: $(ALL_H) $(OBJ_DIR)/lal_re_squared.o lal_re_squared_ext.cpp
$(CUDR) -o $@ -c lal_re_squared_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj.cubin: lal_lj.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj.cu
$(OBJ_DIR)/lj_cubin.h: $(OBJ_DIR)/lj.cubin $(OBJ_DIR)/lj.cubin
$(BIN2C) -c -n lj $(OBJ_DIR)/lj.cubin > $(OBJ_DIR)/lj_cubin.h
$(OBJ_DIR)/lal_lj.o: $(ALL_H) lal_lj.h lal_lj.cpp $(OBJ_DIR)/lj_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_ext.o: $(ALL_H) lal_lj.h lal_lj_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul.cubin: lal_lj_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul.cu
$(OBJ_DIR)/lj_coul_cubin.h: $(OBJ_DIR)/lj_coul.cubin $(OBJ_DIR)/lj_coul.cubin
$(BIN2C) -c -n lj_coul $(OBJ_DIR)/lj_coul.cubin > $(OBJ_DIR)/lj_coul_cubin.h
$(OBJ_DIR)/lal_lj_coul.o: $(ALL_H) lal_lj_coul.h lal_lj_coul.cpp $(OBJ_DIR)/lj_coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_ext.o: $(ALL_H) lal_lj_coul.h lal_lj_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_class2_long.cubin: lal_lj_class2_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_class2_long.cu
$(OBJ_DIR)/lj_class2_long_cubin.h: $(OBJ_DIR)/lj_class2_long.cubin $(OBJ_DIR)/lj_class2_long.cubin
$(BIN2C) -c -n lj_class2_long $(OBJ_DIR)/lj_class2_long.cubin > $(OBJ_DIR)/lj_class2_long_cubin.h
$(OBJ_DIR)/lal_lj_class2_long.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long.cpp $(OBJ_DIR)/lj_class2_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_class2_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_class2_long_ext.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_class2_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_long.cubin: lal_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_long.cu
$(OBJ_DIR)/coul_long_cubin.h: $(OBJ_DIR)/coul_long.cubin $(OBJ_DIR)/coul_long.cubin
$(BIN2C) -c -n coul_long $(OBJ_DIR)/coul_long.cubin > $(OBJ_DIR)/coul_long_cubin.h
$(OBJ_DIR)/lal_coul_long.o: $(ALL_H) lal_coul_long.h lal_coul_long.cpp $(OBJ_DIR)/coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_long_ext.o: $(ALL_H) lal_coul_long.h lal_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_long.cubin: lal_lj_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_long.cu
$(OBJ_DIR)/lj_coul_long_cubin.h: $(OBJ_DIR)/lj_coul_long.cubin $(OBJ_DIR)/lj_coul_long.cubin
$(BIN2C) -c -n lj_coul_long $(OBJ_DIR)/lj_coul_long.cubin > $(OBJ_DIR)/lj_coul_long_cubin.h
$(OBJ_DIR)/lal_lj_coul_long.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long.cpp $(OBJ_DIR)/lj_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_long_ext.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_dsf.cubin: lal_lj_dsf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_dsf.cu
$(OBJ_DIR)/lj_dsf_cubin.h: $(OBJ_DIR)/lj_dsf.cubin $(OBJ_DIR)/lj_dsf.cubin
$(BIN2C) -c -n lj_dsf $(OBJ_DIR)/lj_dsf.cubin > $(OBJ_DIR)/lj_dsf_cubin.h
$(OBJ_DIR)/lal_lj_dsf.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf.cpp $(OBJ_DIR)/lj_dsf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_dsf_ext.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/morse.cubin: lal_morse.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_morse.cu
$(OBJ_DIR)/morse_cubin.h: $(OBJ_DIR)/morse.cubin $(OBJ_DIR)/morse.cubin
$(BIN2C) -c -n morse $(OBJ_DIR)/morse.cubin > $(OBJ_DIR)/morse_cubin.h
$(OBJ_DIR)/lal_morse.o: $(ALL_H) lal_morse.h lal_morse.cpp $(OBJ_DIR)/morse_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_morse.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_morse_ext.o: $(ALL_H) lal_morse.h lal_morse_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_morse_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/charmm_long.cubin: lal_charmm_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_charmm_long.cu
$(OBJ_DIR)/charmm_long_cubin.h: $(OBJ_DIR)/charmm_long.cubin $(OBJ_DIR)/charmm_long.cubin
$(BIN2C) -c -n charmm_long $(OBJ_DIR)/charmm_long.cubin > $(OBJ_DIR)/charmm_long_cubin.h
$(OBJ_DIR)/lal_charmm_long.o: $(ALL_H) lal_charmm_long.h lal_charmm_long.cpp $(OBJ_DIR)/charmm_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_charmm_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_charmm_long_ext.o: $(ALL_H) lal_charmm_long.h lal_charmm_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_charmm_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj96.cubin: lal_lj96.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj96.cu
$(OBJ_DIR)/lj96_cubin.h: $(OBJ_DIR)/lj96.cubin $(OBJ_DIR)/lj96.cubin
$(BIN2C) -c -n lj96 $(OBJ_DIR)/lj96.cubin > $(OBJ_DIR)/lj96_cubin.h
$(OBJ_DIR)/lal_lj96.o: $(ALL_H) lal_lj96.h lal_lj96.cpp $(OBJ_DIR)/lj96_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj96.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj96_ext.o: $(ALL_H) lal_lj96.h lal_lj96_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj96_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_expand.cubin: lal_lj_expand.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_expand.cu
$(OBJ_DIR)/lj_expand_cubin.h: $(OBJ_DIR)/lj_expand.cubin $(OBJ_DIR)/lj_expand.cubin
$(BIN2C) -c -n lj_expand $(OBJ_DIR)/lj_expand.cubin > $(OBJ_DIR)/lj_expand_cubin.h
$(OBJ_DIR)/lal_lj_expand.o: $(ALL_H) lal_lj_expand.h lal_lj_expand.cpp $(OBJ_DIR)/lj_expand_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_expand.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_expand_ext.o: $(ALL_H) lal_lj_expand.h lal_lj_expand_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_expand_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm.cubin: lal_cg_cmm.cu lal_precision.h lal_preprocessor.h
- $(CUDA) --cubin -DNV_KERNEL -o $@ lal_cg_cmm.cu
+$(OBJ_DIR)/lj_sdk.cubin: lal_lj_sdk.cu lal_precision.h lal_preprocessor.h
+ $(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_sdk.cu
-$(OBJ_DIR)/cg_cmm_cubin.h: $(OBJ_DIR)/cg_cmm.cubin $(OBJ_DIR)/cg_cmm.cubin
- $(BIN2C) -c -n cg_cmm $(OBJ_DIR)/cg_cmm.cubin > $(OBJ_DIR)/cg_cmm_cubin.h
+$(OBJ_DIR)/lj_sdk_cubin.h: $(OBJ_DIR)/lj_sdk.cubin $(OBJ_DIR)/lj_sdk.cubin
+ $(BIN2C) -c -n lj_sdk $(OBJ_DIR)/lj_sdk.cubin > $(OBJ_DIR)/lj_sdk_cubin.h
-$(OBJ_DIR)/lal_cg_cmm.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm.cpp $(OBJ_DIR)/cg_cmm_cubin.h $(OBJ_DIR)/lal_base_atomic.o
- $(CUDR) -o $@ -c lal_cg_cmm.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk.cpp $(OBJ_DIR)/lj_sdk_cubin.h $(OBJ_DIR)/lal_base_atomic.o
+ $(CUDR) -o $@ -c lal_lj_sdk.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_ext.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm_ext.cpp lal_base_atomic.h
- $(CUDR) -o $@ -c lal_cg_cmm_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_ext.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk_ext.cpp lal_base_atomic.h
+ $(CUDR) -o $@ -c lal_lj_sdk_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_long.cubin: lal_cg_cmm_long.cu lal_precision.h lal_preprocessor.h
- $(CUDA) --cubin -DNV_KERNEL -o $@ lal_cg_cmm_long.cu
+$(OBJ_DIR)/lj_sdk_long.cubin: lal_lj_sdk_long.cu lal_precision.h lal_preprocessor.h
+ $(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_sdk_long.cu
-$(OBJ_DIR)/cg_cmm_long_cubin.h: $(OBJ_DIR)/cg_cmm_long.cubin $(OBJ_DIR)/cg_cmm_long.cubin
- $(BIN2C) -c -n cg_cmm_long $(OBJ_DIR)/cg_cmm_long.cubin > $(OBJ_DIR)/cg_cmm_long_cubin.h
+$(OBJ_DIR)/lj_sdk_long_cubin.h: $(OBJ_DIR)/lj_sdk_long.cubin $(OBJ_DIR)/lj_sdk_long.cubin
+ $(BIN2C) -c -n lj_sdk_long $(OBJ_DIR)/lj_sdk_long.cubin > $(OBJ_DIR)/lj_sdk_long_cubin.h
-$(OBJ_DIR)/lal_cg_cmm_long.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long.cpp $(OBJ_DIR)/cg_cmm_long_cubin.h $(OBJ_DIR)/lal_base_atomic.o
- $(CUDR) -o $@ -c lal_cg_cmm_long.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long.cpp $(OBJ_DIR)/lj_sdk_long_cubin.h $(OBJ_DIR)/lal_base_atomic.o
+ $(CUDR) -o $@ -c lal_lj_sdk_long.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_long_ext.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long_ext.cpp lal_base_charge.h
- $(CUDR) -o $@ -c lal_cg_cmm_long_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long_ext.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long_ext.cpp lal_base_charge.h
+ $(CUDR) -o $@ -c lal_lj_sdk_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/eam.cubin: lal_eam.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_eam.cu
$(OBJ_DIR)/eam_cubin.h: $(OBJ_DIR)/eam.cubin $(OBJ_DIR)/eam.cubin
$(BIN2C) -c -n eam $(OBJ_DIR)/eam.cubin > $(OBJ_DIR)/eam_cubin.h
$(OBJ_DIR)/lal_eam.o: $(ALL_H) lal_eam.h lal_eam.cpp $(OBJ_DIR)/eam_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_eam.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_ext.o: $(ALL_H) lal_eam.h lal_eam_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_fs_ext.o: $(ALL_H) lal_eam.h lal_eam_fs_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_fs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_alloy_ext.o: $(ALL_H) lal_eam.h lal_eam_alloy_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_alloy_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck.cubin: lal_buck.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck.cu
$(OBJ_DIR)/buck_cubin.h: $(OBJ_DIR)/buck.cubin $(OBJ_DIR)/buck.cubin
$(BIN2C) -c -n buck $(OBJ_DIR)/buck.cubin > $(OBJ_DIR)/buck_cubin.h
$(OBJ_DIR)/lal_buck.o: $(ALL_H) lal_buck.h lal_buck.cpp $(OBJ_DIR)/buck_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_buck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_ext.o: $(ALL_H) lal_buck.h lal_buck_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_buck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul.cubin: lal_buck_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck_coul.cu
$(OBJ_DIR)/buck_coul_cubin.h: $(OBJ_DIR)/buck_coul.cubin $(OBJ_DIR)/buck_coul.cubin
$(BIN2C) -c -n buck_coul $(OBJ_DIR)/buck_coul.cubin > $(OBJ_DIR)/buck_coul_cubin.h
$(OBJ_DIR)/lal_buck_coul.o: $(ALL_H) lal_buck_coul.h lal_buck_coul.cpp $(OBJ_DIR)/buck_coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_buck_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_ext.o: $(ALL_H) lal_buck_coul.h lal_buck_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_buck_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_long.cubin: lal_buck_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck_coul_long.cu
$(OBJ_DIR)/buck_coul_long_cubin.h: $(OBJ_DIR)/buck_coul_long.cubin $(OBJ_DIR)/buck_coul_long.cubin
$(BIN2C) -c -n buck_coul_long $(OBJ_DIR)/buck_coul_long.cubin > $(OBJ_DIR)/buck_coul_long_cubin.h
$(OBJ_DIR)/lal_buck_coul_long.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long.cpp $(OBJ_DIR)/buck_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_buck_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_long_ext.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_buck_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/table.cubin: lal_table.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_table.cu
$(OBJ_DIR)/table_cubin.h: $(OBJ_DIR)/table.cubin $(OBJ_DIR)/table.cubin
$(BIN2C) -c -n table $(OBJ_DIR)/table.cubin > $(OBJ_DIR)/table_cubin.h
$(OBJ_DIR)/lal_table.o: $(ALL_H) lal_table.h lal_table.cpp $(OBJ_DIR)/table_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_table.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_table_ext.o: $(ALL_H) lal_table.h lal_table_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_table_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa.cubin: lal_yukawa.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_yukawa.cu
$(OBJ_DIR)/yukawa_cubin.h: $(OBJ_DIR)/yukawa.cubin $(OBJ_DIR)/yukawa.cubin
$(BIN2C) -c -n yukawa $(OBJ_DIR)/yukawa.cubin > $(OBJ_DIR)/yukawa_cubin.h
$(OBJ_DIR)/lal_yukawa.o: $(ALL_H) lal_yukawa.h lal_yukawa.cpp $(OBJ_DIR)/yukawa_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_yukawa.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_ext.o: $(ALL_H) lal_yukawa.h lal_yukawa_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_yukawa_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born.cubin: lal_born.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born.cu
$(OBJ_DIR)/born_cubin.h: $(OBJ_DIR)/born.cubin $(OBJ_DIR)/born.cubin
$(BIN2C) -c -n born $(OBJ_DIR)/born.cubin > $(OBJ_DIR)/born_cubin.h
$(OBJ_DIR)/lal_born.o: $(ALL_H) lal_born.h lal_born.cpp $(OBJ_DIR)/born_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_born.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_ext.o: $(ALL_H) lal_born.h lal_born_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_born_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_wolf.cubin: lal_born_coul_wolf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born_coul_wolf.cu
$(OBJ_DIR)/born_coul_wolf_cubin.h: $(OBJ_DIR)/born_coul_wolf.cubin $(OBJ_DIR)/born_coul_wolf.cubin
$(BIN2C) -c -n born_coul_wolf $(OBJ_DIR)/born_coul_wolf.cubin > $(OBJ_DIR)/born_coul_wolf_cubin.h
$(OBJ_DIR)/lal_born_coul_wolf.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf.cpp $(OBJ_DIR)/born_coul_wolf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_born_coul_wolf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_wolf_ext.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_born_coul_wolf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_long.cubin: lal_born_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born_coul_long.cu
$(OBJ_DIR)/born_coul_long_cubin.h: $(OBJ_DIR)/born_coul_long.cubin $(OBJ_DIR)/born_coul_long.cubin
$(BIN2C) -c -n born_coul_long $(OBJ_DIR)/born_coul_long.cubin > $(OBJ_DIR)/born_coul_long_cubin.h
$(OBJ_DIR)/lal_born_coul_long.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long.cpp $(OBJ_DIR)/born_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_born_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_long_ext.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_born_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj.cubin: lal_dipole_lj.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dipole_lj.cu
$(OBJ_DIR)/dipole_lj_cubin.h: $(OBJ_DIR)/dipole_lj.cubin $(OBJ_DIR)/dipole_lj.cubin
$(BIN2C) -c -n dipole_lj $(OBJ_DIR)/dipole_lj.cubin > $(OBJ_DIR)/dipole_lj_cubin.h
$(OBJ_DIR)/lal_dipole_lj.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj.cpp $(OBJ_DIR)/dipole_lj_cubin.h $(OBJ_DIR)/lal_base_dipole.o
$(CUDR) -o $@ -c lal_dipole_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_ext.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj_ext.cpp lal_base_dipole.h
$(CUDR) -o $@ -c lal_dipole_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_sf.cubin: lal_dipole_lj_sf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dipole_lj_sf.cu
$(OBJ_DIR)/dipole_lj_sf_cubin.h: $(OBJ_DIR)/dipole_lj_sf.cubin $(OBJ_DIR)/dipole_lj_sf.cubin
$(BIN2C) -c -n dipole_lj_sf $(OBJ_DIR)/dipole_lj_sf.cubin > $(OBJ_DIR)/dipole_lj_sf_cubin.h
$(OBJ_DIR)/lal_dipole_lj_sf.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf.cpp $(OBJ_DIR)/dipole_lj_sf_cubin.h $(OBJ_DIR)/lal_base_dipole.o
$(CUDR) -o $@ -c lal_dipole_lj_sf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_sf_ext.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf_ext.cpp lal_base_dipole.h
$(CUDR) -o $@ -c lal_dipole_lj_sf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/colloid.cubin: lal_colloid.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_colloid.cu
$(OBJ_DIR)/colloid_cubin.h: $(OBJ_DIR)/colloid.cubin $(OBJ_DIR)/colloid.cubin
$(BIN2C) -c -n colloid $(OBJ_DIR)/colloid.cubin > $(OBJ_DIR)/colloid_cubin.h
$(OBJ_DIR)/lal_colloid.o: $(ALL_H) lal_colloid.h lal_colloid.cpp $(OBJ_DIR)/colloid_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_colloid_ext.o: $(ALL_H) lal_colloid.h lal_colloid_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/gauss.cubin: lal_gauss.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gauss.cu
$(OBJ_DIR)/gauss_cubin.h: $(OBJ_DIR)/gauss.cubin $(OBJ_DIR)/gauss.cubin
$(BIN2C) -c -n gauss $(OBJ_DIR)/gauss.cubin > $(OBJ_DIR)/gauss_cubin.h
$(OBJ_DIR)/lal_gauss.o: $(ALL_H) lal_gauss.h lal_gauss.cpp $(OBJ_DIR)/gauss_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_gauss.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gauss_ext.o: $(ALL_H) lal_gauss.h lal_gauss_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_gauss_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_colloid.cubin: lal_yukawa_colloid.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_yukawa_colloid.cu
$(OBJ_DIR)/yukawa_colloid_cubin.h: $(OBJ_DIR)/yukawa_colloid.cubin $(OBJ_DIR)/yukawa_colloid.cubin
$(BIN2C) -c -n yukawa_colloid $(OBJ_DIR)/yukawa_colloid.cubin > $(OBJ_DIR)/yukawa_colloid_cubin.h
$(OBJ_DIR)/lal_yukawa_colloid.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid.cpp $(OBJ_DIR)/yukawa_colloid_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_yukawa_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_colloid_ext.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_yukawa_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_debye.cubin: lal_lj_coul_debye.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_debye.cu
$(OBJ_DIR)/lj_coul_debye_cubin.h: $(OBJ_DIR)/lj_coul_debye.cubin $(OBJ_DIR)/lj_coul_debye.cubin
$(BIN2C) -c -n lj_coul_debye $(OBJ_DIR)/lj_coul_debye.cubin > $(OBJ_DIR)/lj_coul_debye_cubin.h
$(OBJ_DIR)/lal_lj_coul_debye.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye.cpp $(OBJ_DIR)/lj_coul_debye_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_debye_ext.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_dsf.cubin: lal_coul_dsf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_dsf.cu
$(OBJ_DIR)/coul_dsf_cubin.h: $(OBJ_DIR)/coul_dsf.cubin $(OBJ_DIR)/coul_dsf.cubin
$(BIN2C) -c -n coul_dsf $(OBJ_DIR)/coul_dsf.cubin > $(OBJ_DIR)/coul_dsf_cubin.h
$(OBJ_DIR)/lal_coul_dsf.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf.cpp $(OBJ_DIR)/coul_dsf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_dsf_ext.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/sw.cubin: lal_sw.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_sw.cu
$(OBJ_DIR)/sw_cubin.h: $(OBJ_DIR)/sw.cubin $(OBJ_DIR)/sw.cubin
$(BIN2C) -c -n sw $(OBJ_DIR)/sw.cubin > $(OBJ_DIR)/sw_cubin.h
$(OBJ_DIR)/lal_sw.o: $(ALL_H) lal_sw.h lal_sw.cpp $(OBJ_DIR)/sw_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_sw.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_sw_ext.o: $(ALL_H) lal_sw.h lal_sw_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_sw_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/beck.cubin: lal_beck.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_beck.cu
$(OBJ_DIR)/beck_cubin.h: $(OBJ_DIR)/beck.cubin $(OBJ_DIR)/beck.cubin
$(BIN2C) -c -n beck $(OBJ_DIR)/beck.cubin > $(OBJ_DIR)/beck_cubin.h
$(OBJ_DIR)/lal_beck.o: $(ALL_H) lal_beck.h lal_beck.cpp $(OBJ_DIR)/beck_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_beck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_beck_ext.o: $(ALL_H) lal_beck.h lal_beck_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_beck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/mie.cubin: lal_mie.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_mie.cu
$(OBJ_DIR)/mie_cubin.h: $(OBJ_DIR)/mie.cubin $(OBJ_DIR)/mie.cubin
$(BIN2C) -c -n mie $(OBJ_DIR)/mie.cubin > $(OBJ_DIR)/mie_cubin.h
$(OBJ_DIR)/lal_mie.o: $(ALL_H) lal_mie.h lal_mie.cpp $(OBJ_DIR)/mie_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_mie.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_mie_ext.o: $(ALL_H) lal_mie.h lal_mie_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_mie_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/soft.cubin: lal_soft.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_soft.cu
$(OBJ_DIR)/soft_cubin.h: $(OBJ_DIR)/soft.cubin $(OBJ_DIR)/soft.cubin
$(BIN2C) -c -n soft $(OBJ_DIR)/soft.cubin > $(OBJ_DIR)/soft_cubin.h
$(OBJ_DIR)/lal_soft.o: $(ALL_H) lal_soft.h lal_soft.cpp $(OBJ_DIR)/soft_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_soft.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_soft_ext.o: $(ALL_H) lal_soft.h lal_soft_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_soft_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_msm.cubin: lal_lj_coul_msm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_msm.cu
$(OBJ_DIR)/lj_coul_msm_cubin.h: $(OBJ_DIR)/lj_coul_msm.cubin $(OBJ_DIR)/lj_coul_msm.cubin
$(BIN2C) -c -n lj_coul_msm $(OBJ_DIR)/lj_coul_msm.cubin > $(OBJ_DIR)/lj_coul_msm_cubin.h
$(OBJ_DIR)/lal_lj_coul_msm.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm.cpp $(OBJ_DIR)/lj_coul_msm_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_msm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_msm_ext.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_msm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_gromacs.cubin: lal_lj_gromacs.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_gromacs.cu
$(OBJ_DIR)/lj_gromacs_cubin.h: $(OBJ_DIR)/lj_gromacs.cubin $(OBJ_DIR)/lj_gromacs.cubin
$(BIN2C) -c -n lj_gromacs $(OBJ_DIR)/lj_gromacs.cubin > $(OBJ_DIR)/lj_gromacs_cubin.h
$(OBJ_DIR)/lal_lj_gromacs.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs.cpp $(OBJ_DIR)/lj_gromacs_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_gromacs.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_gromacs_ext.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_gromacs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dpd.cubin: lal_dpd.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dpd.cu
$(OBJ_DIR)/dpd_cubin.h: $(OBJ_DIR)/dpd.cubin $(OBJ_DIR)/dpd.cubin
$(BIN2C) -c -n dpd $(OBJ_DIR)/dpd.cubin > $(OBJ_DIR)/dpd_cubin.h
$(OBJ_DIR)/lal_dpd.o: $(ALL_H) lal_dpd.h lal_dpd.cpp $(OBJ_DIR)/dpd_cubin.h $(OBJ_DIR)/lal_base_dpd.o
$(CUDR) -o $@ -c lal_dpd.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dpd_ext.o: $(ALL_H) lal_dpd.h lal_dpd_ext.cpp lal_base_dpd.h
$(CUDR) -o $@ -c lal_dpd_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff.cubin: lal_tersoff.cu lal_precision.h lal_tersoff_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff.cu
$(OBJ_DIR)/tersoff_cubin.h: $(OBJ_DIR)/tersoff.cubin $(OBJ_DIR)/tersoff.cubin
$(BIN2C) -c -n tersoff $(OBJ_DIR)/tersoff.cubin > $(OBJ_DIR)/tersoff_cubin.h
$(OBJ_DIR)/lal_tersoff.o: $(ALL_H) lal_tersoff.h lal_tersoff.cpp $(OBJ_DIR)/tersoff_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_ext.o: $(ALL_H) lal_tersoff.h lal_tersoff_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_zbl.cubin: lal_tersoff_zbl.cu lal_precision.h lal_tersoff_zbl_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff_zbl.cu
$(OBJ_DIR)/tersoff_zbl_cubin.h: $(OBJ_DIR)/tersoff_zbl.cubin $(OBJ_DIR)/tersoff_zbl.cubin
$(BIN2C) -c -n tersoff_zbl $(OBJ_DIR)/tersoff_zbl.cubin > $(OBJ_DIR)/tersoff_zbl_cubin.h
$(OBJ_DIR)/lal_tersoff_zbl.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl.cpp $(OBJ_DIR)/tersoff_zbl_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_zbl_ext.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_mod.cubin: lal_tersoff_mod.cu lal_precision.h lal_tersoff_mod_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff_mod.cu
$(OBJ_DIR)/tersoff_mod_cubin.h: $(OBJ_DIR)/tersoff_mod.cubin $(OBJ_DIR)/tersoff_mod.cubin
$(BIN2C) -c -n tersoff_mod $(OBJ_DIR)/tersoff_mod.cubin > $(OBJ_DIR)/tersoff_mod_cubin.h
$(OBJ_DIR)/lal_tersoff_mod.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod.cpp $(OBJ_DIR)/tersoff_mod_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff_mod.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_mod_ext.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_mod_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul.cubin: lal_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul.cu
$(OBJ_DIR)/coul_cubin.h: $(OBJ_DIR)/coul.cubin $(OBJ_DIR)/coul.cubin
$(BIN2C) -c -n coul $(OBJ_DIR)/coul.cubin > $(OBJ_DIR)/coul_cubin.h
$(OBJ_DIR)/lal_coul.o: $(ALL_H) lal_coul.h lal_coul.cpp $(OBJ_DIR)/coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_ext.o: $(ALL_H) lal_coul.h lal_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_debye.cubin: lal_coul_debye.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_debye.cu
$(OBJ_DIR)/coul_debye_cubin.h: $(OBJ_DIR)/coul_debye.cubin $(OBJ_DIR)/coul_debye.cubin
$(BIN2C) -c -n coul_debye $(OBJ_DIR)/coul_debye.cubin > $(OBJ_DIR)/coul_debye_cubin.h
$(OBJ_DIR)/lal_coul_debye.o: $(ALL_H) lal_coul_debye.h lal_coul_debye.cpp $(OBJ_DIR)/coul_debye_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_debye_ext.o: $(ALL_H) lal_coul_debye.h lal_coul_debye_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/zbl.cubin: lal_zbl.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_zbl.cu
$(OBJ_DIR)/zbl_cubin.h: $(OBJ_DIR)/zbl.cubin $(OBJ_DIR)/zbl.cubin
$(BIN2C) -c -n zbl $(OBJ_DIR)/zbl.cubin > $(OBJ_DIR)/zbl_cubin.h
$(OBJ_DIR)/lal_zbl.o: $(ALL_H) lal_zbl.h lal_zbl.cpp $(OBJ_DIR)/zbl_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_zbl_ext.o: $(ALL_H) lal_zbl.h lal_zbl_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cubic.cubin: lal_lj_cubic.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_cubic.cu
$(OBJ_DIR)/lj_cubic_cubin.h: $(OBJ_DIR)/lj_cubic.cubin $(OBJ_DIR)/lj_cubic.cubin
$(BIN2C) -c -n lj_cubic $(OBJ_DIR)/lj_cubic.cubin > $(OBJ_DIR)/lj_cubic_cubin.h
$(OBJ_DIR)/lal_lj_cubic.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic.cpp $(OBJ_DIR)/lj_cubic_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_cubic.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_cubic_ext.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_cubic_ext.cpp -I$(OBJ_DIR)
$(BIN_DIR)/nvc_get_devices: ./geryon/ucl_get_devices.cpp $(NVD_H)
$(CUDR) -o $@ ./geryon/ucl_get_devices.cpp -DUCL_CUDADR $(CUDA_LIB) -lcuda
$(GPU_LIB): $(OBJS) $(CUDPP)
$(AR) -crusv $(GPU_LIB) $(OBJS) $(CUDPP)
@cp $(EXTRAMAKE) Makefile.lammps
clean:
-rm -f $(EXECS) $(GPU_LIB) $(OBJS) $(CUDPP) $(CBNS) *.linkinfo
veryclean: clean
-rm -rf *~ *.linkinfo
cleanlib:
-rm -f $(EXECS) $(GPU_LIB) $(OBJS) $(CBNS) *.linkinfo
diff --git a/lib/gpu/Opencl.makefile b/lib/gpu/Opencl.makefile
index 7ef1dfba0..4a5959531 100644
--- a/lib/gpu/Opencl.makefile
+++ b/lib/gpu/Opencl.makefile
@@ -1,584 +1,584 @@
OCL = $(OCL_CPP) $(OCL_PREC) $(OCL_TUNE) -DUSE_OPENCL
OCL_LIB = $(LIB_DIR)/libgpu.a
# Headers for Geryon
UCL_H = $(wildcard ./geryon/ucl*.h)
OCL_H = $(wildcard ./geryon/ocl*.h) $(UCL_H)
# Headers for Pair Stuff
PAIR_H = lal_atom.h lal_answer.h lal_neighbor_shared.h \
lal_neighbor.h lal_precision.h lal_device.h \
lal_balance.h lal_pppm.h
# Headers for Preprocessor/Auxiliary Functions
PRE1_H = lal_preprocessor.h lal_aux_fun1.h
ALL_H = $(OCL_H) $(PAIR_H)
EXECS = $(BIN_DIR)/ocl_get_devices
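# Object list for the OpenCL build of libgpu.a; as in the CUDA makefile,
# the cg_cmm entries below are replaced by their lj_sdk counterparts.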
OBJS = $(OBJ_DIR)/lal_atom.o $(OBJ_DIR)/lal_answer.o \
$(OBJ_DIR)/lal_neighbor_shared.o $(OBJ_DIR)/lal_neighbor.o \
$(OBJ_DIR)/lal_device.o $(OBJ_DIR)/lal_base_atomic.o \
$(OBJ_DIR)/lal_base_charge.o $(OBJ_DIR)/lal_base_ellipsoid.o \
$(OBJ_DIR)/lal_base_dipole.o $(OBJ_DIR)/lal_base_three.o \
$(OBJ_DIR)/lal_base_dpd.o \
$(OBJ_DIR)/lal_pppm.o $(OBJ_DIR)/lal_pppm_ext.o \
$(OBJ_DIR)/lal_gayberne.o $(OBJ_DIR)/lal_gayberne_ext.o \
$(OBJ_DIR)/lal_re_squared.o $(OBJ_DIR)/lal_re_squared_ext.o \
$(OBJ_DIR)/lal_lj.o $(OBJ_DIR)/lal_lj_ext.o \
$(OBJ_DIR)/lal_lj96.o $(OBJ_DIR)/lal_lj96_ext.o \
$(OBJ_DIR)/lal_lj_expand.o $(OBJ_DIR)/lal_lj_expand_ext.o \
$(OBJ_DIR)/lal_lj_coul.o $(OBJ_DIR)/lal_lj_coul_ext.o \
$(OBJ_DIR)/lal_lj_coul_long.o $(OBJ_DIR)/lal_lj_coul_long_ext.o \
$(OBJ_DIR)/lal_lj_dsf.o $(OBJ_DIR)/lal_lj_dsf_ext.o \
$(OBJ_DIR)/lal_lj_class2_long.o $(OBJ_DIR)/lal_lj_class2_long_ext.o \
$(OBJ_DIR)/lal_coul_long.o $(OBJ_DIR)/lal_coul_long_ext.o \
$(OBJ_DIR)/lal_morse.o $(OBJ_DIR)/lal_morse_ext.o \
$(OBJ_DIR)/lal_charmm_long.o $(OBJ_DIR)/lal_charmm_long_ext.o \
- $(OBJ_DIR)/lal_cg_cmm.o $(OBJ_DIR)/lal_cg_cmm_ext.o \
- $(OBJ_DIR)/lal_cg_cmm_long.o $(OBJ_DIR)/lal_cg_cmm_long_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk.o $(OBJ_DIR)/lal_lj_sdk_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk_long.o $(OBJ_DIR)/lal_lj_sdk_long_ext.o \
$(OBJ_DIR)/lal_eam.o $(OBJ_DIR)/lal_eam_ext.o \
$(OBJ_DIR)/lal_eam_fs_ext.o $(OBJ_DIR)/lal_eam_alloy_ext.o \
$(OBJ_DIR)/lal_buck.o $(OBJ_DIR)/lal_buck_ext.o \
$(OBJ_DIR)/lal_buck_coul.o $(OBJ_DIR)/lal_buck_coul_ext.o \
$(OBJ_DIR)/lal_buck_coul_long.o $(OBJ_DIR)/lal_buck_coul_long_ext.o \
$(OBJ_DIR)/lal_table.o $(OBJ_DIR)/lal_table_ext.o \
$(OBJ_DIR)/lal_yukawa.o $(OBJ_DIR)/lal_yukawa_ext.o \
$(OBJ_DIR)/lal_born.o $(OBJ_DIR)/lal_born_ext.o \
$(OBJ_DIR)/lal_born_coul_wolf.o $(OBJ_DIR)/lal_born_coul_wolf_ext.o \
$(OBJ_DIR)/lal_born_coul_long.o $(OBJ_DIR)/lal_born_coul_long_ext.o \
$(OBJ_DIR)/lal_dipole_lj.o $(OBJ_DIR)/lal_dipole_lj_ext.o \
$(OBJ_DIR)/lal_dipole_lj_sf.o $(OBJ_DIR)/lal_dipole_lj_sf_ext.o \
$(OBJ_DIR)/lal_colloid.o $(OBJ_DIR)/lal_colloid_ext.o \
$(OBJ_DIR)/lal_gauss.o $(OBJ_DIR)/lal_gauss_ext.o \
$(OBJ_DIR)/lal_yukawa_colloid.o $(OBJ_DIR)/lal_yukawa_colloid_ext.o \
$(OBJ_DIR)/lal_lj_coul_debye.o $(OBJ_DIR)/lal_lj_coul_debye_ext.o \
$(OBJ_DIR)/lal_coul_dsf.o $(OBJ_DIR)/lal_coul_dsf_ext.o \
$(OBJ_DIR)/lal_sw.o $(OBJ_DIR)/lal_sw_ext.o \
$(OBJ_DIR)/lal_beck.o $(OBJ_DIR)/lal_beck_ext.o \
$(OBJ_DIR)/lal_mie.o $(OBJ_DIR)/lal_mie_ext.o \
$(OBJ_DIR)/lal_soft.o $(OBJ_DIR)/lal_soft_ext.o \
$(OBJ_DIR)/lal_lj_coul_msm.o $(OBJ_DIR)/lal_lj_coul_msm_ext.o \
$(OBJ_DIR)/lal_lj_gromacs.o $(OBJ_DIR)/lal_lj_gromacs_ext.o \
$(OBJ_DIR)/lal_dpd.o $(OBJ_DIR)/lal_dpd_ext.o \
$(OBJ_DIR)/lal_tersoff.o $(OBJ_DIR)/lal_tersoff_ext.o \
$(OBJ_DIR)/lal_tersoff_zbl.o $(OBJ_DIR)/lal_tersoff_zbl_ext.o \
$(OBJ_DIR)/lal_tersoff_mod.o $(OBJ_DIR)/lal_tersoff_mod_ext.o \
$(OBJ_DIR)/lal_coul.o $(OBJ_DIR)/lal_coul_ext.o \
$(OBJ_DIR)/lal_coul_debye.o $(OBJ_DIR)/lal_coul_debye_ext.o \
$(OBJ_DIR)/lal_zbl.o $(OBJ_DIR)/lal_zbl_ext.o \
$(OBJ_DIR)/lal_lj_cubic.o $(OBJ_DIR)/lal_lj_cubic_ext.o
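# KERS lists the *_cl.h headers generated by geryon/file_to_cstr.sh, which
# embeds each OpenCL kernel source as a C string so it can be built at run time.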
KERS = $(OBJ_DIR)/device_cl.h $(OBJ_DIR)/atom_cl.h \
$(OBJ_DIR)/neighbor_cpu_cl.h $(OBJ_DIR)/pppm_cl.h \
$(OBJ_DIR)/ellipsoid_nbor_cl.h $(OBJ_DIR)/gayberne_cl.h \
$(OBJ_DIR)/gayberne_lj_cl.h $(OBJ_DIR)/re_squared_cl.h \
$(OBJ_DIR)/re_squared_lj_cl.h $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lj96_cl.h \
$(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lj_coul_cl.h \
$(OBJ_DIR)/lj_coul_long_cl.h $(OBJ_DIR)/lj_dsf_cl.h \
$(OBJ_DIR)/lj_class2_long_cl.h \
$(OBJ_DIR)/coul_long_cl.h $(OBJ_DIR)/morse_cl.h \
- $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/cg_cmm_cl.h \
- $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h \
+ $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/lj_sdk_cl.h \
+ $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h \
$(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/buck_cl.h \
$(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/buck_coul_long_cl.h \
$(OBJ_DIR)/table_cl.h $(OBJ_DIR)/yukawa_cl.h \
$(OBJ_DIR)/born_cl.h $(OBJ_DIR)/born_coul_wolf_cl.h \
$(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/dipole_lj_cl.h \
$(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/colloid_cl.h \
$(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/yukawa_colloid_cl.h \
$(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/coul_dsf_cl.h \
$(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/mie_cl.h \
$(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/lj_coul_msm_cl.h \
$(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/dpd_cl.h \
$(OBJ_DIR)/lj_gauss_cl.h $(OBJ_DIR)/dzugutov_cl.h \
$(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/tersoff_zbl_cl.h \
$(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/coul_cl.h \
$(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/zbl_cl.h \
$(OBJ_DIR)/lj_cubic_cl.h
OCL_EXECS = $(BIN_DIR)/ocl_get_devices
all: $(OBJ_DIR) $(OCL_LIB) $(EXECS)
$(OBJ_DIR):
mkdir -p $@
$(OBJ_DIR)/atom_cl.h: lal_atom.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh atom lal_preprocessor.h lal_atom.cu $(OBJ_DIR)/atom_cl.h
$(OBJ_DIR)/lal_atom.o: lal_atom.cpp lal_atom.h $(OCL_H) $(OBJ_DIR)/atom_cl.h
$(OCL) -o $@ -c lal_atom.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_answer.o: lal_answer.cpp lal_answer.h $(OCL_H)
$(OCL) -o $@ -c lal_answer.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/neighbor_cpu_cl.h: lal_neighbor_cpu.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh neighbor_cpu lal_preprocessor.h lal_neighbor_cpu.cu $(OBJ_DIR)/neighbor_cpu_cl.h
$(OBJ_DIR)/neighbor_gpu_cl.h: lal_neighbor_gpu.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh neighbor_gpu lal_preprocessor.h lal_neighbor_gpu.cu $(OBJ_DIR)/neighbor_gpu_cl.h
$(OBJ_DIR)/lal_neighbor_shared.o: lal_neighbor_shared.cpp lal_neighbor_shared.h $(OCL_H) $(OBJ_DIR)/neighbor_cpu_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h
$(OCL) -o $@ -c lal_neighbor_shared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_neighbor.o: lal_neighbor.cpp lal_neighbor.h $(OCL_H) lal_neighbor_shared.h
$(OCL) -o $@ -c lal_neighbor.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/device_cl.h: lal_device.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh device lal_preprocessor.h lal_device.cu $(OBJ_DIR)/device_cl.h
$(OBJ_DIR)/lal_device.o: lal_device.cpp lal_device.h $(ALL_H) $(OBJ_DIR)/device_cl.h
$(OCL) -o $@ -c lal_device.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_atomic.o: $(OCL_H) lal_base_atomic.h lal_base_atomic.cpp
$(OCL) -o $@ -c lal_base_atomic.cpp
$(OBJ_DIR)/lal_base_charge.o: $(OCL_H) lal_base_charge.h lal_base_charge.cpp
$(OCL) -o $@ -c lal_base_charge.cpp
$(OBJ_DIR)/lal_base_ellipsoid.o: $(OCL_H) lal_base_ellipsoid.h lal_base_ellipsoid.cpp $(OBJ_DIR)/ellipsoid_nbor_cl.h
$(OCL) -o $@ -c lal_base_ellipsoid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_dipole.o: $(OCL_H) lal_base_dipole.h lal_base_dipole.cpp
$(OCL) -o $@ -c lal_base_dipole.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_three.o: $(OCL_H) lal_base_three.h lal_base_three.cpp
$(OCL) -o $@ -c lal_base_three.cpp
$(OBJ_DIR)/lal_base_dpd.o: $(OCL_H) lal_base_dpd.h lal_base_dpd.cpp
$(OCL) -o $@ -c lal_base_dpd.cpp
$(OBJ_DIR)/pppm_cl.h: lal_pppm.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh pppm lal_preprocessor.h lal_pppm.cu $(OBJ_DIR)/pppm_cl.h;
$(OBJ_DIR)/lal_pppm.o: $(ALL_H) lal_pppm.h lal_pppm.cpp $(OBJ_DIR)/pppm_cl.h $(OBJ_DIR)/pppm_cl.h
$(OCL) -o $@ -c lal_pppm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_pppm_ext.o: $(ALL_H) lal_pppm.h lal_pppm_ext.cpp
$(OCL) -o $@ -c lal_pppm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/ellipsoid_nbor_cl.h: lal_ellipsoid_nbor.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh ellipsoid_nbor lal_preprocessor.h lal_ellipsoid_nbor.cu $(OBJ_DIR)/ellipsoid_nbor_cl.h
$(OBJ_DIR)/gayberne_cl.h: lal_gayberne.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh gayberne lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_gayberne.cu $(OBJ_DIR)/gayberne_cl.h;
$(OBJ_DIR)/gayberne_lj_cl.h: lal_gayberne_lj.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh gayberne_lj lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_gayberne_lj.cu $(OBJ_DIR)/gayberne_lj_cl.h;
$(OBJ_DIR)/lal_gayberne.o: $(ALL_H) lal_gayberne.h lal_gayberne.cpp $(OBJ_DIR)/gayberne_cl.h $(OBJ_DIR)/gayberne_lj_cl.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(OCL) -o $@ -c lal_gayberne.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gayberne_ext.o: $(ALL_H) $(OBJ_DIR)/lal_gayberne.o lal_gayberne_ext.cpp
$(OCL) -o $@ -c lal_gayberne_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/re_squared_cl.h: lal_re_squared.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh re_squared lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_re_squared.cu $(OBJ_DIR)/re_squared_cl.h;
$(OBJ_DIR)/re_squared_lj_cl.h: lal_re_squared_lj.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh re_squared_lj lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_re_squared_lj.cu $(OBJ_DIR)/re_squared_lj_cl.h;
$(OBJ_DIR)/lal_re_squared.o: $(ALL_H) lal_re_squared.h lal_re_squared.cpp $(OBJ_DIR)/re_squared_cl.h $(OBJ_DIR)/re_squared_lj_cl.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(OCL) -o $@ -c lal_re_squared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_re_squared_ext.o: $(ALL_H) $(OBJ_DIR)/lal_re_squared.o lal_re_squared_ext.cpp
$(OCL) -o $@ -c lal_re_squared_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cl.h: lal_lj.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj $(PRE1_H) lal_lj.cu $(OBJ_DIR)/lj_cl.h;
$(OBJ_DIR)/lal_lj.o: $(ALL_H) lal_lj.h lal_lj.cpp $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_ext.o: $(ALL_H) lal_lj.h lal_lj_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_cl.h: lal_lj_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul $(PRE1_H) lal_lj_coul.cu $(OBJ_DIR)/lj_coul_cl.h;
$(OBJ_DIR)/lal_lj_coul.o: $(ALL_H) lal_lj_coul.h lal_lj_coul.cpp $(OBJ_DIR)/lj_coul_cl.h $(OBJ_DIR)/lj_coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_ext.o: $(ALL_H) lal_lj_coul.h lal_lj_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_long_cl.h: lal_lj_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_long $(PRE1_H) lal_lj_coul_long.cu $(OBJ_DIR)/lj_coul_long_cl.h;
$(OBJ_DIR)/lal_lj_coul_long.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long.cpp $(OBJ_DIR)/lj_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_long_ext.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_dsf_cl.h: lal_lj_dsf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_dsf $(PRE1_H) lal_lj_dsf.cu $(OBJ_DIR)/lj_dsf_cl.h;
$(OBJ_DIR)/lal_lj_dsf.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf.cpp $(OBJ_DIR)/lj_dsf_cl.h $(OBJ_DIR)/lj_dsf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_dsf_ext.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_class2_long_cl.h: lal_lj_class2_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_class2_long $(PRE1_H) lal_lj_class2_long.cu $(OBJ_DIR)/lj_class2_long_cl.h;
$(OBJ_DIR)/lal_lj_class2_long.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long.cpp $(OBJ_DIR)/lj_class2_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_class2_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_class2_long_ext.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_class2_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_long_cl.h: lal_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_long $(PRE1_H) lal_coul_long.cu $(OBJ_DIR)/coul_long_cl.h;
$(OBJ_DIR)/lal_coul_long.o: $(ALL_H) lal_coul_long.h lal_coul_long.cpp $(OBJ_DIR)/coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_long_ext.o: $(ALL_H) lal_coul_long.h lal_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/morse_cl.h: lal_morse.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh morse $(PRE1_H) lal_morse.cu $(OBJ_DIR)/morse_cl.h;
$(OBJ_DIR)/lal_morse.o: $(ALL_H) lal_morse.h lal_morse.cpp $(OBJ_DIR)/morse_cl.h $(OBJ_DIR)/morse_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_morse.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_morse_ext.o: $(ALL_H) lal_morse.h lal_morse_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_morse_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/charmm_long_cl.h: lal_charmm_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh charmm_long $(PRE1_H) lal_charmm_long.cu $(OBJ_DIR)/charmm_long_cl.h;
$(OBJ_DIR)/lal_charmm_long.o: $(ALL_H) lal_charmm_long.h lal_charmm_long.cpp $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_charmm_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_charmm_long_ext.o: $(ALL_H) lal_charmm_long.h lal_charmm_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_charmm_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj96_cl.h: lal_lj96.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj96 $(PRE1_H) lal_lj96.cu $(OBJ_DIR)/lj96_cl.h;
$(OBJ_DIR)/lal_lj96.o: $(ALL_H) lal_lj96.h lal_lj96.cpp $(OBJ_DIR)/lj96_cl.h $(OBJ_DIR)/lj96_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj96.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj96_ext.o: $(ALL_H) lal_lj96.h lal_lj96_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj96_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_expand_cl.h: lal_lj_expand.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_expand $(PRE1_H) lal_lj_expand.cu $(OBJ_DIR)/lj_expand_cl.h;
$(OBJ_DIR)/lal_lj_expand.o: $(ALL_H) lal_lj_expand.h lal_lj_expand.cpp $(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_expand.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_expand_ext.o: $(ALL_H) lal_lj_expand.h lal_lj_expand_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_expand_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_cl.h: lal_cg_cmm.cu $(PRE1_H)
- $(BSH) ./geryon/file_to_cstr.sh cg_cmm $(PRE1_H) lal_cg_cmm.cu $(OBJ_DIR)/cg_cmm_cl.h;
+$(OBJ_DIR)/lj_sdk_cl.h: lal_lj_sdk.cu $(PRE1_H)
+ $(BSH) ./geryon/file_to_cstr.sh lj_sdk $(PRE1_H) lal_lj_sdk.cu $(OBJ_DIR)/lj_sdk_cl.h;
-$(OBJ_DIR)/lal_cg_cmm.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm.cpp $(OBJ_DIR)/cg_cmm_cl.h $(OBJ_DIR)/cg_cmm_cl.h $(OBJ_DIR)/lal_base_atomic.o
- $(OCL) -o $@ -c lal_cg_cmm.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk.cpp $(OBJ_DIR)/lj_sdk_cl.h $(OBJ_DIR)/lj_sdk_cl.h $(OBJ_DIR)/lal_base_atomic.o
+ $(OCL) -o $@ -c lal_lj_sdk.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_ext.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm_ext.cpp lal_base_atomic.h
- $(OCL) -o $@ -c lal_cg_cmm_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_ext.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk_ext.cpp lal_base_atomic.h
+ $(OCL) -o $@ -c lal_lj_sdk_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_long_cl.h: lal_cg_cmm_long.cu $(PRE1_H)
- $(BSH) ./geryon/file_to_cstr.sh cg_cmm_long $(PRE1_H) lal_cg_cmm_long.cu $(OBJ_DIR)/cg_cmm_long_cl.h;
+$(OBJ_DIR)/lj_sdk_long_cl.h: lal_lj_sdk_long.cu $(PRE1_H)
+ $(BSH) ./geryon/file_to_cstr.sh lj_sdk_long $(PRE1_H) lal_lj_sdk_long.cu $(OBJ_DIR)/lj_sdk_long_cl.h;
-$(OBJ_DIR)/lal_cg_cmm_long.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long.cpp $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/lal_base_atomic.o
- $(OCL) -o $@ -c lal_cg_cmm_long.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long.cpp $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/lal_base_atomic.o
+ $(OCL) -o $@ -c lal_lj_sdk_long.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_long_ext.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long_ext.cpp lal_base_charge.h
- $(OCL) -o $@ -c lal_cg_cmm_long_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long_ext.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long_ext.cpp lal_base_charge.h
+ $(OCL) -o $@ -c lal_lj_sdk_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/eam_cl.h: lal_eam.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh eam $(PRE1_H) lal_eam.cu $(OBJ_DIR)/eam_cl.h;
$(OBJ_DIR)/lal_eam.o: $(ALL_H) lal_eam.h lal_eam.cpp $(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_eam.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_ext.o: $(ALL_H) lal_eam.h lal_eam_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_fs_ext.o: $(ALL_H) lal_eam.h lal_eam_fs_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_fs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_alloy_ext.o: $(ALL_H) lal_eam.h lal_eam_alloy_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_alloy_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_cl.h: lal_buck.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck $(PRE1_H) lal_buck.cu $(OBJ_DIR)/buck_cl.h;
$(OBJ_DIR)/lal_buck.o: $(ALL_H) lal_buck.h lal_buck.cpp $(OBJ_DIR)/buck_cl.h $(OBJ_DIR)/buck_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_buck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_ext.o: $(ALL_H) lal_buck.h lal_buck_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_buck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_cl.h: lal_buck_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck_coul $(PRE1_H) lal_buck_coul.cu $(OBJ_DIR)/buck_coul_cl.h;
$(OBJ_DIR)/lal_buck_coul.o: $(ALL_H) lal_buck_coul.h lal_buck_coul.cpp $(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_buck_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_ext.o: $(ALL_H) lal_buck_coul.h lal_buck_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_buck_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_long_cl.h: lal_buck_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck_coul_long $(PRE1_H) lal_buck_coul_long.cu $(OBJ_DIR)/buck_coul_long_cl.h;
$(OBJ_DIR)/lal_buck_coul_long.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long.cpp $(OBJ_DIR)/buck_coul_long_cl.h $(OBJ_DIR)/buck_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_buck_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_long_ext.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_buck_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/table_cl.h: lal_table.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh table $(PRE1_H) lal_table.cu $(OBJ_DIR)/table_cl.h;
$(OBJ_DIR)/lal_table.o: $(ALL_H) lal_table.h lal_table.cpp $(OBJ_DIR)/table_cl.h $(OBJ_DIR)/table_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_table.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_table_ext.o: $(ALL_H) lal_table.h lal_table_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_table_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_cl.h: lal_yukawa.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh yukawa $(PRE1_H) lal_yukawa.cu $(OBJ_DIR)/yukawa_cl.h;
$(OBJ_DIR)/lal_yukawa.o: $(ALL_H) lal_yukawa.h lal_yukawa.cpp $(OBJ_DIR)/yukawa_cl.h $(OBJ_DIR)/yukawa_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_yukawa.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_ext.o: $(ALL_H) lal_yukawa.h lal_yukawa_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_yukawa_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_cl.h: lal_born.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born $(PRE1_H) lal_born.cu $(OBJ_DIR)/born_cl.h;
$(OBJ_DIR)/lal_born.o: $(ALL_H) lal_born.h lal_born.cpp $(OBJ_DIR)/born_cl.h $(OBJ_DIR)/born_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_born.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_ext.o: $(ALL_H) lal_born.h lal_born_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_born_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_wolf_cl.h: lal_born_coul_wolf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born_coul_wolf $(PRE1_H) lal_born_coul_wolf.cu $(OBJ_DIR)/born_coul_wolf_cl.h;
$(OBJ_DIR)/lal_born_coul_wolf.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf.cpp $(OBJ_DIR)/born_coul_wolf_cl.h $(OBJ_DIR)/born_coul_wolf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_born_coul_wolf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_wolf_ext.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_born_coul_wolf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_long_cl.h: lal_born_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born_coul_long $(PRE1_H) lal_born_coul_long.cu $(OBJ_DIR)/born_coul_long_cl.h;
$(OBJ_DIR)/lal_born_coul_long.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long.cpp $(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_born_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_long_ext.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_born_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_cl.h: lal_dipole_lj.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dipole_lj $(PRE1_H) lal_dipole_lj.cu $(OBJ_DIR)/dipole_lj_cl.h;
$(OBJ_DIR)/lal_dipole_lj.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj.cpp $(OBJ_DIR)/dipole_lj_cl.h $(OBJ_DIR)/dipole_lj_cl.h $(OBJ_DIR)/lal_base_dipole.o
$(OCL) -o $@ -c lal_dipole_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_ext.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj_ext.cpp lal_base_dipole.h
$(OCL) -o $@ -c lal_dipole_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_sf_cl.h: lal_dipole_lj_sf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dipole_lj_sf $(PRE1_H) lal_dipole_lj_sf.cu $(OBJ_DIR)/dipole_lj_sf_cl.h;
$(OBJ_DIR)/lal_dipole_lj_sf.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf.cpp $(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/lal_base_dipole.o
$(OCL) -o $@ -c lal_dipole_lj_sf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_sf_ext.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf_ext.cpp lal_base_dipole.h
$(OCL) -o $@ -c lal_dipole_lj_sf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/colloid_cl.h: lal_colloid.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh colloid $(PRE1_H) lal_colloid.cu $(OBJ_DIR)/colloid_cl.h;
$(OBJ_DIR)/lal_colloid.o: $(ALL_H) lal_colloid.h lal_colloid.cpp $(OBJ_DIR)/colloid_cl.h $(OBJ_DIR)/colloid_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_colloid_ext.o: $(ALL_H) lal_colloid.h lal_colloid_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/gauss_cl.h: lal_gauss.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh gauss $(PRE1_H) lal_gauss.cu $(OBJ_DIR)/gauss_cl.h;
$(OBJ_DIR)/lal_gauss.o: $(ALL_H) lal_gauss.h lal_gauss.cpp $(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_gauss.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gauss_ext.o: $(ALL_H) lal_gauss.h lal_gauss_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_gauss_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_colloid_cl.h: lal_yukawa_colloid.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh yukawa_colloid $(PRE1_H) lal_yukawa_colloid.cu $(OBJ_DIR)/yukawa_colloid_cl.h;
$(OBJ_DIR)/lal_yukawa_colloid.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid.cpp $(OBJ_DIR)/yukawa_colloid_cl.h $(OBJ_DIR)/yukawa_colloid_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_yukawa_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_colloid_ext.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_yukawa_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_debye_cl.h: lal_lj_coul_debye.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_debye $(PRE1_H) lal_lj_coul_debye.cu $(OBJ_DIR)/lj_coul_debye_cl.h;
$(OBJ_DIR)/lal_lj_coul_debye.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye.cpp $(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_debye_ext.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_dsf_cl.h: lal_coul_dsf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_dsf $(PRE1_H) lal_coul_dsf.cu $(OBJ_DIR)/coul_dsf_cl.h;
$(OBJ_DIR)/lal_coul_dsf.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf.cpp $(OBJ_DIR)/coul_dsf_cl.h $(OBJ_DIR)/coul_dsf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_dsf_ext.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/sw_cl.h: lal_sw.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh sw $(PRE1_H) lal_sw.cu $(OBJ_DIR)/sw_cl.h;
$(OBJ_DIR)/lal_sw.o: $(ALL_H) lal_sw.h lal_sw.cpp $(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_sw.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_sw_ext.o: $(ALL_H) lal_sw.h lal_sw_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_sw_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/beck_cl.h: lal_beck.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh beck $(PRE1_H) lal_beck.cu $(OBJ_DIR)/beck_cl.h;
$(OBJ_DIR)/lal_beck.o: $(ALL_H) lal_beck.h lal_beck.cpp $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_beck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_beck_ext.o: $(ALL_H) lal_beck.h lal_beck_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_beck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/mie_cl.h: lal_mie.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh mie $(PRE1_H) lal_mie.cu $(OBJ_DIR)/mie_cl.h;
$(OBJ_DIR)/lal_mie.o: $(ALL_H) lal_mie.h lal_mie.cpp $(OBJ_DIR)/mie_cl.h $(OBJ_DIR)/mie_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_mie.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_mie_ext.o: $(ALL_H) lal_mie.h lal_mie_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_mie_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/soft_cl.h: lal_soft.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh soft $(PRE1_H) lal_soft.cu $(OBJ_DIR)/soft_cl.h;
$(OBJ_DIR)/lal_soft.o: $(ALL_H) lal_soft.h lal_soft.cpp $(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_soft.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_soft_ext.o: $(ALL_H) lal_soft.h lal_soft_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_soft_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_msm_cl.h: lal_lj_coul_msm.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_msm $(PRE1_H) lal_lj_coul_msm.cu $(OBJ_DIR)/lj_coul_msm_cl.h;
$(OBJ_DIR)/lal_lj_coul_msm.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm.cpp $(OBJ_DIR)/lj_coul_msm_cl.h $(OBJ_DIR)/lj_coul_msm_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_msm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_msm_ext.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_msm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_gromacs_cl.h: lal_lj_gromacs.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_gromacs $(PRE1_H) lal_lj_gromacs.cu $(OBJ_DIR)/lj_gromacs_cl.h;
$(OBJ_DIR)/lal_lj_gromacs.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs.cpp $(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_gromacs.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_gromacs_ext.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_gromacs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dpd_cl.h: lal_dpd.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dpd $(PRE1_H) lal_dpd.cu $(OBJ_DIR)/dpd_cl.h;
$(OBJ_DIR)/lal_dpd.o: $(ALL_H) lal_dpd.h lal_dpd.cpp $(OBJ_DIR)/dpd_cl.h $(OBJ_DIR)/dpd_cl.h $(OBJ_DIR)/lal_base_dpd.o
$(OCL) -o $@ -c lal_dpd.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dpd_ext.o: $(ALL_H) lal_dpd.h lal_dpd_ext.cpp lal_base_dpd.h
$(OCL) -o $@ -c lal_dpd_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_cl.h: lal_tersoff.cu lal_tersoff_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff $(PRE1_H) lal_tersoff_extra.h lal_tersoff.cu $(OBJ_DIR)/tersoff_cl.h;
$(OBJ_DIR)/lal_tersoff.o: $(ALL_H) lal_tersoff.h lal_tersoff.cpp $(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_ext.o: $(ALL_H) lal_tersoff.h lal_tersoff_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_zbl_cl.h: lal_tersoff_zbl.cu lal_tersoff_zbl_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff_zbl $(PRE1_H) lal_tersoff_zbl_extra.h lal_tersoff_zbl.cu $(OBJ_DIR)/tersoff_zbl_cl.h;
$(OBJ_DIR)/lal_tersoff_zbl.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl.cpp $(OBJ_DIR)/tersoff_zbl_cl.h $(OBJ_DIR)/tersoff_zbl_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_zbl_ext.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_mod_cl.h: lal_tersoff_mod.cu lal_tersoff_mod_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff_mod $(PRE1_H) lal_tersoff_mod_extra.h lal_tersoff_mod.cu $(OBJ_DIR)/tersoff_mod_cl.h;
$(OBJ_DIR)/lal_tersoff_mod.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod.cpp $(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff_mod.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_mod_ext.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_mod_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_cl.h: lal_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul $(PRE1_H) lal_coul.cu $(OBJ_DIR)/coul_cl.h;
$(OBJ_DIR)/lal_coul.o: $(ALL_H) lal_coul.h lal_coul.cpp $(OBJ_DIR)/coul_cl.h $(OBJ_DIR)/coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_ext.o: $(ALL_H) lal_coul.h lal_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_debye_cl.h: lal_coul_debye.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_debye $(PRE1_H) lal_coul_debye.cu $(OBJ_DIR)/coul_debye_cl.h;
$(OBJ_DIR)/lal_coul_debye.o: $(ALL_H) lal_coul_debye.h lal_coul_debye.cpp $(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_debye_ext.o: $(ALL_H) lal_coul_debye.h lal_coul_debye_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/zbl_cl.h: lal_zbl.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh zbl $(PRE1_H) lal_zbl.cu $(OBJ_DIR)/zbl_cl.h;
$(OBJ_DIR)/lal_zbl.o: $(ALL_H) lal_zbl.h lal_zbl.cpp $(OBJ_DIR)/zbl_cl.h $(OBJ_DIR)/zbl_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_zbl_ext.o: $(ALL_H) lal_zbl.h lal_zbl_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cubic_cl.h: lal_lj_cubic.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_cubic $(PRE1_H) lal_lj_cubic.cu $(OBJ_DIR)/lj_cubic_cl.h;
$(OBJ_DIR)/lal_lj_cubic.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic.cpp $(OBJ_DIR)/lj_cubic_cl.h $(OBJ_DIR)/lj_cubic_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_cubic.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_cubic_ext.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_cubic_ext.cpp -I$(OBJ_DIR)
$(BIN_DIR)/ocl_get_devices: ./geryon/ucl_get_devices.cpp
$(OCL) -o $@ ./geryon/ucl_get_devices.cpp -DUCL_OPENCL $(OCL_LINK)
$(OCL_LIB): $(OBJS) $(PTXS)
$(AR) -crusv $(OCL_LIB) $(OBJS)
@cp $(EXTRAMAKE) Makefile.lammps
opencl: $(OCL_EXECS)
clean:
-rm -rf $(EXECS) $(OCL_EXECS) $(OCL_LIB) $(OBJS) $(KERS) *.linkinfo
veryclean: clean
-rm -rf *~ *.linkinfo
diff --git a/lib/gpu/README b/lib/gpu/README
index 45c8ce49b..b26897e88 100644
--- a/lib/gpu/README
+++ b/lib/gpu/README
@@ -1,217 +1,222 @@
--------------------------------
LAMMPS ACCELERATOR LIBRARY
--------------------------------
W. Michael Brown (ORNL)
Trung Dac Nguyen (ORNL)
Peng Wang (NVIDIA)
Axel Kohlmeyer (Temple)
Steve Plimpton (SNL)
Inderaj Bains (NVIDIA)
-------------------------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the GPU package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-gpu" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.linux
When you are done building this library, two files should
exist in this directory:
libgpu.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You should examine the final Makefile.lammps to ensure it is
correct for your system, otherwise the LAMMPS build can fail.
IMPORTANT: If you re-build the library, e.g. for a different precision
(see below), you should do a "make clean" first, e.g. make -f
Makefile.linux clean, to ensure all previously derived files are removed
before the new build is done.
Makefile.lammps has settings for 3 variables:
user-gpu_SYSINC = leave blank for this package
user-gpu_SYSLIB = CUDA libraries needed by this package
user-gpu_SYSPATH = path(s) to where those libraries are
Because you have the CUDA compilers on your system, you should have
the needed libraries. If the CUDA development tools were installed
in the standard manner, the settings in the Makefile.lammps.standard
file should work.
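For illustration only, the settings for a standard CUDA installation are
often along the lines of the sketch below (the library flags and the
install path are assumptions and must be adjusted to your system):
  user-gpu_SYSINC =
  user-gpu_SYSLIB = -lcudart -lcuda
  user-gpu_SYSPATH = -L/usr/local/cuda/lib64   # assumed install path; adjust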
-------------------------------------------------------------------
GENERAL NOTES
--------------------------------
This library, libgpu.a, provides routines for GPU acceleration
of certain LAMMPS styles and neighbor list builds. Compilation of this
library requires installing the CUDA GPU driver and CUDA toolkit for
your operating system. Installation of the CUDA SDK is not necessary.
In addition to the LAMMPS library, the binary nvc_get_devices will also
be built. This can be used to query the names and properties of the GPU
devices on your system. A Makefile for OpenCL compilation is provided,
but OpenCL use is not currently supported by the developers.
Details of the implementation are provided in:
----
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.
and
Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle
Particle-Mesh. Computer Physics Communications. 2012. 183: p. 449-459.
and
Brown, W.M., Masako, Y. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics Communications.
2013. 184: p. 2785–2793.
----
NOTE: Installation of the CUDA SDK is not required.
Current styles supporting GPU acceleration:
1 beck
2 born/coul/long
3 born/coul/wolf
4 born
5 buck/coul/cut
6 buck/coul/long
7 buck
8 colloid
9 coul/dsf
10 coul/long
11 eam/alloy
12 eam/fs
13 eam
14 gauss
15 gayberne
16 lj96/cut
17 lj/charmm/coul/long
18 lj/class2/coul/long
19 lj/class2
20 lj/cut/coul/cut
21 lj/cut/coul/debye
22 lj/cut/coul/dsf
23 lj/cut/coul/long
24 lj/cut/coul/msm
25 lj/cut/dipole/cut
26 lj/cut
27 lj/expand
28 lj/gromacs
29 lj/sdk/coul/long
30 lj/sdk
31 lj/sf/dipole/sf
32 mie/cut
33 morse
34 resquared
35 soft
36 sw
37 table
38 yukawa/colloid
39 yukawa
40 pppm
MULTIPLE LAMMPS PROCESSES
--------------------------------
Multiple LAMMPS MPI processes can share GPUs on the system, but multiple
GPUs cannot be utilized by a single MPI process. In many cases, the
best performance will be obtained by running as many MPI processes as
there are CPU cores available, with the condition that the number of MPI
processes is an integer multiple of the number of GPUs being used. See the
LAMMPS user manual for details on running with GPU acceleration.
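For illustration (a sketch only; the executable name and input script are
placeholders), a node with 2 GPUs and 8 CPU cores could be driven by 8
MPI processes, e.g.
  mpirun -np 8 lmp_linux -sf gpu -pk gpu 2 -in in.script   # names are placeholders
so that the number of processes is an integer multiple of the number of
GPUs. The -sf and -pk command-line switches are documented in the LAMMPS
user manual.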
BUILDING AND PRECISION MODES
--------------------------------
To build, edit the CUDA_ARCH, CUDA_PRECISION, CUDA_HOME variables in one of
the Makefiles. CUDA_ARCH should be set based on the compute capability of
your GPU. This can be verified by running the nvc_get_devices executable after
the build is complete. Additionally, the GPU package must be installed and
compiled for LAMMPS. This may require editing the gpu_SYSPATH variable in the
LAMMPS makefile.
Please note that the GPU library accesses the CUDA driver library directly,
so it needs to be linked not only to the CUDA runtime library (libcudart.so)
that ships with the CUDA toolkit, but also to the CUDA driver library
(libcuda.so) that ships with the Nvidia driver. If you are compiling LAMMPS
on the head node of a GPU cluster, this library may not be installed,
so you may need to copy it over from one of the compute nodes (preferably
into this directory).
The gpu library supports 3 precision modes as determined by
the CUDA_PRECISION variable:
- CUDA_PREC = -D_SINGLE_SINGLE # Single precision for all calculations
- CUDA_PREC = -D_DOUBLE_DOUBLE # Double precision for all calculations
- CUDA_PREC = -D_SINGLE_DOUBLE # Accumulation of forces, etc. in double
+ CUDA_PRECISION = -D_SINGLE_SINGLE # Single precision for all calculations
+ CUDA_PRECISION = -D_DOUBLE_DOUBLE # Double precision for all calculations
+ CUDA_PRECISION = -D_SINGLE_DOUBLE # Accumulation of forces, etc. in double
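As a sketch of such an edit (the install path, compute capability, and
precision chosen below are assumptions that must match your CUDA
installation, hardware, and needs), the relevant Makefile lines could
look like:
  CUDA_HOME = /usr/local/cuda       # assumed install path; adjust
  CUDA_ARCH = -arch=sm_21           # assumed compute capability; adjust
  CUDA_PRECISION = -D_SINGLE_DOUBLE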
NOTE: PPPM acceleration can only be run on GPUs with compute capability>=1.1.
You will get the error "GPU library not compiled for this accelerator."
when attempting to run PPPM on a GPU with compute capability 1.0.
NOTE: Double precision is only supported on certain GPUs (with
      compute capability>=1.3). If you compile the GPU library for
      a GPU with compute capability 1.1 or 1.2, then only single
      precision FFTs are supported, i.e. LAMMPS has to be compiled
      with -DFFT_SINGLE. For details on configuring FFT support in
      LAMMPS, see http://lammps.sandia.gov/doc/Section_start.html#2_2_4
NOTE: For graphics cards with compute capability>=1.3 (e.g. Tesla C1060),
make sure that -arch=sm_13 is set on the CUDA_ARCH line.
NOTE: For newer graphics cards (a.k.a. "Fermi", e.g. Tesla C2050), make
      sure that either -arch=sm_20 or -arch=sm_21 is set on the
      CUDA_ARCH line, depending on hardware and CUDA toolkit version.
NOTE: The gayberne/gpu pair style will only be installed if the ASPHERE
package has been installed.
NOTE: The cg/cmm/gpu and cg/cmm/coul/long/gpu pair styles will only be
installed if the USER-CG-CMM package has been installed.
NOTE: The lj/cut/coul/long/gpu, cg/cmm/coul/long/gpu, coul/long/gpu,
lj/charmm/coul/long/gpu and pppm/gpu styles will only be installed
if the KSPACE package has been installed.
NOTE: The system-specific size setting LAMMPS_SMALLBIG (default),
      LAMMPS_BIGBIG, or LAMMPS_SMALLSMALL, if specified when building
      LAMMPS (i.e. in src/MAKE/Makefile.foo), should be consistent with
      the setting specified when building libgpu.a (i.e. via LMP_INC in
      lib/gpu/Makefile.bar).
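For example (a sketch assuming the default size setting), a consistent
choice is to set
  LMP_INC = -DLAMMPS_SMALLBIG
in both the LAMMPS machine makefile (src/MAKE/Makefile.foo) and the GPU
library makefile (lib/gpu/Makefile.bar).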
EXAMPLE BUILD PROCESS
--------------------------------
cd ~/lammps/lib/gpu
emacs Makefile.linux
make -f Makefile.linux
./nvc_get_devices
cd ../../src
emacs ./MAKE/Makefile.linux
make yes-asphere
make yes-kspace
make yes-gpu
make linux
diff --git a/lib/gpu/lal_cg_cmm.cpp b/lib/gpu/lal_lj_sdk.cpp
similarity index 85%
rename from lib/gpu/lal_cg_cmm.cpp
rename to lib/gpu/lal_lj_sdk.cpp
index d361e32b0..618555e38 100644
--- a/lib/gpu/lal_cg_cmm.cpp
+++ b/lib/gpu/lal_lj_sdk.cpp
@@ -1,154 +1,154 @@
/***************************************************************************
- cg_cmm.cpp
+ lj_sdk.cpp
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/cut pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#if defined(USE_OPENCL)
-#include "cg_cmm_cl.h"
+#include "lj_sdk_cl.h"
#elif defined(USE_CUDART)
-const char *cg_cmm=0;
+const char *lj_sdk=0;
#else
-#include "cg_cmm_cubin.h"
+#include "lj_sdk_cubin.h"
#endif
-#include "lal_cg_cmm.h"
+#include "lal_lj_sdk.h"
#include <cassert>
using namespace LAMMPS_AL;
#define CGCMMT CGCMM<numtyp, acctyp>
extern Device<PRECISION,ACC_PRECISION> device;
template <class numtyp, class acctyp>
CGCMMT::CGCMM() : BaseAtomic<numtyp,acctyp>(), _allocated(false) {
}
template <class numtyp, class acctyp>
CGCMMT::~CGCMM() {
clear();
}
template <class numtyp, class acctyp>
int CGCMMT::bytes_per_atom(const int max_nbors) const {
return this->bytes_per_atom_atomic(max_nbors);
}
template <class numtyp, class acctyp>
int CGCMMT::init(const int ntypes, double **host_cutsq,
int **host_cg_type, double **host_lj1,
double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset,
double *host_special_lj, const int nlocal,
const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *_screen) {
int success;
success=this->init_atomic(nlocal,nall,max_nbors,maxspecial,cell_size,gpu_split,
- _screen,cg_cmm,"k_cg_cmm");
+ _screen,lj_sdk,"k_lj_sdk");
if (success!=0)
return success;
// If atom type constants fit in shared memory use fast kernel
- int cmm_types=ntypes;
+ int sdk_types=ntypes;
shared_types=false;
int max_shared_types=this->device->max_shared_types();
- if (cmm_types<=max_shared_types && this->_block_size>=max_shared_types) {
- cmm_types=max_shared_types;
+ if (sdk_types<=max_shared_types && this->_block_size>=max_shared_types) {
+ sdk_types=max_shared_types;
shared_types=true;
}
- _cmm_types=cmm_types;
+ _sdk_types=sdk_types;
// Allocate a host write buffer for data initialization
- UCL_H_Vec<numtyp> host_write(cmm_types*cmm_types*32,*(this->ucl_device),
+ UCL_H_Vec<numtyp> host_write(sdk_types*sdk_types*32,*(this->ucl_device),
UCL_WRITE_ONLY);
- for (int i=0; i<cmm_types*cmm_types; i++)
+ for (int i=0; i<sdk_types*sdk_types; i++)
host_write[i]=0.0;
- lj1.alloc(cmm_types*cmm_types,*(this->ucl_device),UCL_READ_ONLY);
- this->atom->type_pack4(ntypes,cmm_types,lj1,host_write,host_cutsq,
+ lj1.alloc(sdk_types*sdk_types,*(this->ucl_device),UCL_READ_ONLY);
+ this->atom->type_pack4(ntypes,sdk_types,lj1,host_write,host_cutsq,
host_cg_type,host_lj1,host_lj2);
- lj3.alloc(cmm_types*cmm_types,*(this->ucl_device),UCL_READ_ONLY);
- this->atom->type_pack4(ntypes,cmm_types,lj3,host_write,host_lj3,host_lj4,
+ lj3.alloc(sdk_types*sdk_types,*(this->ucl_device),UCL_READ_ONLY);
+ this->atom->type_pack4(ntypes,sdk_types,lj3,host_write,host_lj3,host_lj4,
host_offset);
UCL_H_Vec<double> dview;
sp_lj.alloc(4,*(this->ucl_device),UCL_READ_ONLY);
dview.view(host_special_lj,4,*(this->ucl_device));
ucl_copy(sp_lj,dview,false);
_allocated=true;
this->_max_bytes=lj1.row_bytes()+lj3.row_bytes()+sp_lj.row_bytes();
return 0;
}
template <class numtyp, class acctyp>
void CGCMMT::clear() {
if (!_allocated)
return;
_allocated=false;
lj1.clear();
lj3.clear();
sp_lj.clear();
this->clear_atomic();
}
template <class numtyp, class acctyp>
double CGCMMT::host_memory_usage() const {
return this->host_memory_usage_atomic()+sizeof(CGCMM<numtyp,acctyp>);
}
// ---------------------------------------------------------------------------
// Calculate energies, forces, and torques
// ---------------------------------------------------------------------------
template <class numtyp, class acctyp>
void CGCMMT::loop(const bool _eflag, const bool _vflag) {
// Compute the block size and grid size to keep all cores busy
const int BX=this->block_size();
int eflag, vflag;
if (_eflag)
eflag=1;
else
eflag=0;
if (_vflag)
vflag=1;
else
vflag=0;
int GX=static_cast<int>(ceil(static_cast<double>(this->ans->inum())/
(BX/this->_threads_per_atom)));
int ainum=this->ans->inum();
int nbor_pitch=this->nbor->nbor_pitch();
this->time_pair.start();
if (shared_types) {
this->k_pair_fast.set_size(GX,BX);
this->k_pair_fast.run(&this->atom->x, &lj1, &lj3, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag,
&vflag, &ainum, &nbor_pitch,
&this->_threads_per_atom);
} else {
this->k_pair.set_size(GX,BX);
this->k_pair.run(&this->atom->x, &lj1, &lj3,
- &_cmm_types, &sp_lj, &this->nbor->dev_nbor,
+ &_sdk_types, &sp_lj, &this->nbor->dev_nbor,
&this->_nbor_data->begin(), &this->ans->force,
&this->ans->engv, &eflag, &vflag, &ainum,
&nbor_pitch, &this->_threads_per_atom);
}
this->time_pair.stop();
}
template class CGCMM<PRECISION,ACC_PRECISION>;
diff --git a/lib/gpu/lal_cg_cmm.cu b/lib/gpu/lal_lj_sdk.cu
similarity index 97%
rename from lib/gpu/lal_cg_cmm.cu
rename to lib/gpu/lal_lj_sdk.cu
index 70d2ab609..01b2cdd18 100644
--- a/lib/gpu/lal_cg_cmm.cu
+++ b/lib/gpu/lal_lj_sdk.cu
@@ -1,216 +1,216 @@
// **************************************************************************
-// cg_cmm.cu
+// lj_sdk.cu
// -------------------
// W. Michael Brown (ORNL)
//
// Device code for acceleration of the lj/sdk pair style
//
// __________________________________________________________________________
// This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
// __________________________________________________________________________
//
// begin :
// email : brownw@ornl.gov
// ***************************************************************************/
#ifdef NV_KERNEL
#include "lal_aux_fun1.h"
#ifndef _DOUBLE_DOUBLE
texture<float4> pos_tex;
#else
texture<int4,1> pos_tex;
#endif
#else
#define pos_tex x_
#endif
-__kernel void k_cg_cmm(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1,
const __global numtyp4 *restrict lj3,
const int lj_types,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp sp_lj[4];
sp_lj[0]=sp_lj_in[0];
sp_lj[1]=sp_lj_in[1];
sp_lj[2]=sp_lj_in[2];
sp_lj[3]=sp_lj_in[3];
acctyp energy=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
int itype=ix.w;
numtyp factor_lj;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
factor_lj = sp_lj[sbmask(j)];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int jtype=jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp r2inv = delx*delx+dely*dely+delz*delz;
int mtype=itype*lj_types+jtype;
if (r2inv<lj1[mtype].x) {
r2inv=ucl_recip(r2inv);
numtyp inv1,inv2;
if (lj1[mtype].y == 2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj1[mtype].y == 1) {
inv2=r2inv*ucl_sqrt(r2inv);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
numtyp force = factor_lj*r2inv*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0)
energy += factor_lj*inv1*(lj3[mtype].x*inv2-lj3[mtype].y)-
lj3[mtype].z;
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers(f,energy,virial,ii,inum,tid,t_per_atom,offset,eflag,vflag,
ans,engv);
} // if ii
}
-__kernel void k_cg_cmm_fast(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_fast(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1_in,
const __global numtyp4 *restrict lj3_in,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp4 lj1[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp4 lj3[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp sp_lj[4];
if (tid<4)
sp_lj[tid]=sp_lj_in[tid];
if (tid<MAX_SHARED_TYPES*MAX_SHARED_TYPES) {
lj1[tid]=lj1_in[tid];
if (eflag>0)
lj3[tid]=lj3_in[tid];
}
acctyp energy=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
__syncthreads();
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
int iw=ix.w;
int itype=fast_mul((int)MAX_SHARED_TYPES,iw);
numtyp factor_lj;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
factor_lj = sp_lj[sbmask(j)];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int mtype=itype+jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp r2inv = delx*delx+dely*dely+delz*delz;
if (r2inv<lj1[mtype].x) {
r2inv=ucl_recip(r2inv);
numtyp inv1,inv2;
if (lj1[mtype].y == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj1[mtype].y == (numtyp)1) {
inv2=r2inv*ucl_sqrt(r2inv);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
numtyp force = factor_lj*r2inv*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0)
energy += factor_lj*inv1*(lj3[mtype].x*inv2-lj3[mtype].y)-
lj3[mtype].z;
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers(f,energy,virial,ii,inum,tid,t_per_atom,offset,eflag,vflag,
ans,engv);
} // if ii
}
diff --git a/lib/gpu/lal_cg_cmm.h b/lib/gpu/lal_lj_sdk.h
similarity index 97%
rename from lib/gpu/lal_cg_cmm.h
rename to lib/gpu/lal_lj_sdk.h
index b7895b589..ac2b9aafe 100644
--- a/lib/gpu/lal_cg_cmm.h
+++ b/lib/gpu/lal_lj_sdk.h
@@ -1,79 +1,79 @@
/***************************************************************************
- cg_cmm.h
+ lj_sdk.h
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#ifndef LAL_CG_CMM_H
#define LAL_CG_CMM_H
#include "lal_base_atomic.h"
namespace LAMMPS_AL {
template <class numtyp, class acctyp>
class CGCMM : public BaseAtomic<numtyp, acctyp> {
public:
CGCMM();
~CGCMM();
/// Clear any previous data and set up for a new LAMMPS run
/** \param max_nbors initial number of rows in the neighbor matrix
* \param cell_size cutoff + skin
* \param gpu_split fraction of particles handled by device
*
* Returns:
* - 0 if successful
* - -1 if fix gpu not found
* - -3 if there is an out of memory error
* - -4 if the GPU library was not compiled for GPU
* - -5 Double precision is not supported on card **/
int init(const int ntypes, double **host_cutsq, int **host_cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset, double *host_special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *screen);
/// Clear all host and device data
/** \note This is called at the beginning of the init() routine **/
void clear();
/// Returns memory usage on device per atom
int bytes_per_atom(const int max_nbors) const;
/// Total host memory used by library for pair style
double host_memory_usage() const;
// --------------------------- TYPE DATA --------------------------
/// lj1.x = cutsq, lj1.y=cg_type, lj1.z = lj1, lj1.w = lj2
UCL_D_Vec<numtyp4> lj1;
/// lj3.x = lj3, lj3.y = lj4, lj3.z = offset
UCL_D_Vec<numtyp4> lj3;
/// Special LJ values
UCL_D_Vec<numtyp> sp_lj;
/// If atom type constants fit in shared memory, use fast kernels
bool shared_types;
/// Number of atom types
- int _cmm_types;
+ int _sdk_types;
private:
bool _allocated;
void loop(const bool _eflag, const bool _vflag);
};
}
#endif
diff --git a/lib/gpu/lal_cg_cmm_ext.cpp b/lib/gpu/lal_lj_sdk_ext.cpp
similarity index 93%
rename from lib/gpu/lal_cg_cmm_ext.cpp
rename to lib/gpu/lal_lj_sdk_ext.cpp
index b6fc110b1..386106161 100644
--- a/lib/gpu/lal_cg_cmm_ext.cpp
+++ b/lib/gpu/lal_lj_sdk_ext.cpp
@@ -1,121 +1,121 @@
/***************************************************************************
- cg_cmm.h
+ lj_sdk.h
-------------------
W. Michael Brown (ORNL)
Functions for LAMMPS access to lj/sdk pair acceleration routines
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#include <iostream>
#include <cassert>
#include <math.h>
-#include "lal_cg_cmm.h"
+#include "lal_lj_sdk.h"
using namespace std;
using namespace LAMMPS_AL;
static CGCMM<PRECISION,ACC_PRECISION> CMMMF;
// ---------------------------------------------------------------------------
// Allocate memory on host and device and copy constants to device
// ---------------------------------------------------------------------------
-int cmm_gpu_init(const int ntypes, double **cutsq, int **cg_types,
+int sdk_gpu_init(const int ntypes, double **cutsq, int **cg_types,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int inum, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen) {
CMMMF.clear();
gpu_mode=CMMMF.device->gpu_mode();
double gpu_split=CMMMF.device->particle_split();
int first_gpu=CMMMF.device->first_device();
int last_gpu=CMMMF.device->last_device();
int world_me=CMMMF.device->world_me();
int gpu_rank=CMMMF.device->gpu_rank();
int procs_per_gpu=CMMMF.device->procs_per_gpu();
CMMMF.device->init_message(screen,"lj/sdk",first_gpu,last_gpu);
bool message=false;
if (CMMMF.device->replica_me()==0 && screen)
message=true;
if (message) {
fprintf(screen,"Initializing Device and compiling on process 0...");
fflush(screen);
}
int init_ok=0;
if (world_me==0)
init_ok=CMMMF.init(ntypes,cutsq,cg_types,host_lj1,host_lj2,host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen);
CMMMF.device->world_barrier();
if (message)
fprintf(screen,"Done.\n");
for (int i=0; i<procs_per_gpu; i++) {
if (message) {
if (last_gpu-first_gpu==0)
fprintf(screen,"Initializing Device %d on core %d...",first_gpu,i);
else
fprintf(screen,"Initializing Devices %d-%d on core %d...",first_gpu,
last_gpu,i);
fflush(screen);
}
if (gpu_rank==i && world_me!=0)
init_ok=CMMMF.init(ntypes,cutsq,cg_types,host_lj1,host_lj2,host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen);
CMMMF.device->gpu_barrier();
if (message)
fprintf(screen,"Done.\n");
}
if (message)
fprintf(screen,"\n");
if (init_ok==0)
CMMMF.estimate_gpu_overhead();
return init_ok;
}
-void cmm_gpu_clear() {
+void sdk_gpu_clear() {
CMMMF.clear();
}
-int** cmm_gpu_compute_n(const int ago, const int inum_full,
+int** sdk_gpu_compute_n(const int ago, const int inum_full,
const int nall, double **host_x, int *host_type,
double *sublo, double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success) {
return CMMMF.compute(ago, inum_full, nall, host_x, host_type, sublo,
subhi, tag, nspecial, special, eflag, vflag, eatom,
vatom, host_start, ilist, jnum, cpu_time, success);
}
-void cmm_gpu_compute(const int ago, const int inum_full, const int nall,
+void sdk_gpu_compute(const int ago, const int inum_full, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success) {
CMMMF.compute(ago,inum_full,nall,host_x,host_type,ilist,numj,
firstneigh,eflag,vflag,eatom,vatom,host_start,cpu_time,success);
}
-double cmm_gpu_bytes() {
+double sdk_gpu_bytes() {
return CMMMF.host_memory_usage();
}
diff --git a/lib/gpu/lal_cg_cmm_long.cpp b/lib/gpu/lal_lj_sdk_long.cpp
similarity index 96%
rename from lib/gpu/lal_cg_cmm_long.cpp
rename to lib/gpu/lal_lj_sdk_long.cpp
index 14b5b7622..46caf6bd3 100644
--- a/lib/gpu/lal_cg_cmm_long.cpp
+++ b/lib/gpu/lal_lj_sdk_long.cpp
@@ -1,166 +1,166 @@
/***************************************************************************
- cg_cmm_long.cpp
+ lj_sdk_long.cpp
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/coul/long pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#if defined(USE_OPENCL)
-#include "cg_cmm_long_cl.h"
+#include "lj_sdk_long_cl.h"
#elif defined(USE_CUDART)
-const char *cg_cmm_long=0;
+const char *lj_sdk_long=0;
#else
-#include "cg_cmm_long_cubin.h"
+#include "lj_sdk_long_cubin.h"
#endif
-#include "lal_cg_cmm_long.h"
+#include "lal_lj_sdk_long.h"
#include <cassert>
using namespace LAMMPS_AL;
#define CGCMMLongT CGCMMLong<numtyp, acctyp>
extern Device<PRECISION,ACC_PRECISION> device;
template <class numtyp, class acctyp>
CGCMMLongT::CGCMMLong() : BaseCharge<numtyp,acctyp>(),
_allocated(false) {
}
template <class numtyp, class acctyp>
CGCMMLongT::~CGCMMLong() {
clear();
}
template <class numtyp, class acctyp>
int CGCMMLongT::bytes_per_atom(const int max_nbors) const {
return this->bytes_per_atom_atomic(max_nbors);
}
template <class numtyp, class acctyp>
int CGCMMLongT::init(const int ntypes, double **host_cutsq,
int **host_cg_type, double **host_lj1,
double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset,
double *host_special_lj, const int nlocal,
const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *_screen,
double **host_cut_ljsq,
const double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald) {
int success;
success=this->init_atomic(nlocal,nall,max_nbors,maxspecial,cell_size,gpu_split,
- _screen,cg_cmm_long,"k_cg_cmm_long");
+ _screen,lj_sdk_long,"k_lj_sdk_long");
if (success!=0)
return success;
// If atom type constants fit in shared memory use fast kernel
int lj_types=ntypes;
shared_types=false;
int max_shared_types=this->device->max_shared_types();
if (lj_types<=max_shared_types && this->_block_size>=max_shared_types) {
lj_types=max_shared_types;
shared_types=true;
}
_lj_types=lj_types;
// Allocate a host write buffer for data initialization
UCL_H_Vec<numtyp> host_write(lj_types*lj_types*32,*(this->ucl_device),
UCL_WRITE_ONLY);
for (int i=0; i<lj_types*lj_types; i++)
host_write[i]=0.0;
lj1.alloc(lj_types*lj_types,*(this->ucl_device),UCL_READ_ONLY);
this->atom->type_pack4(ntypes,lj_types,lj1,host_write,host_cutsq,
host_cut_ljsq,host_lj1,host_lj2);
lj3.alloc(lj_types*lj_types,*(this->ucl_device),UCL_READ_ONLY);
this->atom->type_pack4(ntypes,lj_types,lj3,host_write,host_cg_type,host_lj3,
host_lj4,host_offset);
sp_lj.alloc(8,*(this->ucl_device),UCL_READ_ONLY);
for (int i=0; i<4; i++) {
host_write[i]=host_special_lj[i];
host_write[i+4]=host_special_coul[i];
}
ucl_copy(sp_lj,host_write,8,false);
_cut_coulsq=host_cut_coulsq;
_qqrd2e=qqrd2e;
_g_ewald=g_ewald;
_allocated=true;
this->_max_bytes=lj1.row_bytes()+lj3.row_bytes()+sp_lj.row_bytes();
return 0;
}
template <class numtyp, class acctyp>
void CGCMMLongT::clear() {
if (!_allocated)
return;
_allocated=false;
lj1.clear();
lj3.clear();
sp_lj.clear();
this->clear_atomic();
}
template <class numtyp, class acctyp>
double CGCMMLongT::host_memory_usage() const {
return this->host_memory_usage_atomic()+sizeof(CGCMMLong<numtyp,acctyp>);
}
// ---------------------------------------------------------------------------
// Calculate energies, forces, and torques
// ---------------------------------------------------------------------------
template <class numtyp, class acctyp>
void CGCMMLongT::loop(const bool _eflag, const bool _vflag) {
// Compute the block size and grid size to keep all cores busy
const int BX=this->block_size();
int eflag, vflag;
if (_eflag)
eflag=1;
else
eflag=0;
if (_vflag)
vflag=1;
else
vflag=0;
int GX=static_cast<int>(ceil(static_cast<double>(this->ans->inum())/
(BX/this->_threads_per_atom)));
int ainum=this->ans->inum();
int nbor_pitch=this->nbor->nbor_pitch();
this->time_pair.start();
if (shared_types) {
this->k_pair_fast.set_size(GX,BX);
this->k_pair_fast.run(&this->atom->x, &lj1, &lj3, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag,
&vflag, &ainum, &nbor_pitch, &this->atom->q,
&_cut_coulsq, &_qqrd2e, &_g_ewald,
&this->_threads_per_atom);
} else {
this->k_pair.set_size(GX,BX);
this->k_pair.run(&this->atom->x, &lj1, &lj3, &_lj_types, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag, &vflag,
&ainum, &nbor_pitch, &this->atom->q, &_cut_coulsq,
&_qqrd2e, &_g_ewald, &this->_threads_per_atom);
}
this->time_pair.stop();
}
template class CGCMMLong<PRECISION,ACC_PRECISION>;
diff --git a/lib/gpu/lal_cg_cmm_long.cu b/lib/gpu/lal_lj_sdk_long.cu
similarity index 98%
rename from lib/gpu/lal_cg_cmm_long.cu
rename to lib/gpu/lal_lj_sdk_long.cu
index f6942d180..5ff64b225 100644
--- a/lib/gpu/lal_cg_cmm_long.cu
+++ b/lib/gpu/lal_lj_sdk_long.cu
@@ -1,282 +1,282 @@
// **************************************************************************
-// cg_cmm_long.cu
+// lj_sdk_long.cu
// -------------------
// W. Michael Brown (ORNL)
//
// Device code for acceleration of the lj/sdk/coul/long pair style
//
// __________________________________________________________________________
// This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
// __________________________________________________________________________
//
// begin :
// email : brownw@ornl.gov
// ***************************************************************************/
#ifdef NV_KERNEL
#include "lal_aux_fun1.h"
#ifndef _DOUBLE_DOUBLE
texture<float4> pos_tex;
texture<float> q_tex;
#else
texture<int4,1> pos_tex;
texture<int2> q_tex;
#endif
#else
#define pos_tex x_
#define q_tex q_
#endif
-__kernel void k_cg_cmm_long(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_long(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1,
const __global numtyp4 *restrict lj3,
const int lj_types,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch,
const __global numtyp *restrict q_ ,
const numtyp cut_coulsq, const numtyp qqrd2e,
const numtyp g_ewald, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp sp_lj[8];
sp_lj[0]=sp_lj_in[0];
sp_lj[1]=sp_lj_in[1];
sp_lj[2]=sp_lj_in[2];
sp_lj[3]=sp_lj_in[3];
sp_lj[4]=sp_lj_in[4];
sp_lj[5]=sp_lj_in[5];
sp_lj[6]=sp_lj_in[6];
sp_lj[7]=sp_lj_in[7];
acctyp energy=(acctyp)0;
acctyp e_coul=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
numtyp qtmp; fetch(qtmp,i,q_tex);
int itype=ix.w;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
numtyp factor_lj, factor_coul;
factor_lj = sp_lj[sbmask(j)];
factor_coul = (numtyp)1.0-sp_lj[sbmask(j)+4];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int jtype=jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp rsq = delx*delx+dely*dely+delz*delz;
int mtype=itype*lj_types+jtype;
if (rsq<lj1[mtype].x) {
numtyp forcecoul, force_lj, force, inv1, inv2, prefactor, _erfc;
numtyp r2inv=ucl_recip(rsq);
if (rsq < lj1[mtype].y) {
if (lj3[mtype].x == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj3[mtype].x == (numtyp)1) {
inv2=r2inv*ucl_rsqrt(rsq);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
force_lj = factor_lj*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
} else
force_lj = (numtyp)0.0;
if (rsq < cut_coulsq) {
numtyp r = ucl_rsqrt(r2inv);
numtyp grij = g_ewald * r;
numtyp expm2 = ucl_exp(-grij*grij);
numtyp t = ucl_recip((numtyp)1.0 + EWALD_P*grij);
_erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
fetch(prefactor,j,q_tex);
prefactor *= qqrd2e * qtmp/r;
forcecoul = prefactor * (_erfc + EWALD_F*grij*expm2-factor_coul);
} else
forcecoul = (numtyp)0.0;
force = (force_lj + forcecoul) * r2inv;
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0) {
if (rsq < cut_coulsq)
e_coul += prefactor*(_erfc-factor_coul);
if (rsq < lj1[mtype].y) {
energy += factor_lj*inv1*(lj3[mtype].y*inv2-lj3[mtype].z)-
lj3[mtype].w;
}
}
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers_q(f,energy,e_coul,virial,ii,inum,tid,t_per_atom,offset,eflag,
vflag,ans,engv);
} // if ii
}
-__kernel void k_cg_cmm_long_fast(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_long_fast(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1_in,
const __global numtyp4 *restrict lj3_in,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag,
const int inum, const int nbor_pitch,
const __global numtyp *restrict q_,
const numtyp cut_coulsq, const numtyp qqrd2e,
const numtyp g_ewald, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp4 lj1[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp4 lj3[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp sp_lj[8];
if (tid<8)
sp_lj[tid]=sp_lj_in[tid];
if (tid<MAX_SHARED_TYPES*MAX_SHARED_TYPES) {
lj1[tid]=lj1_in[tid];
lj3[tid]=lj3_in[tid];
}
acctyp energy=(acctyp)0;
acctyp e_coul=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
__syncthreads();
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
numtyp qtmp; fetch(qtmp,i,q_tex);
int iw=ix.w;
int itype=fast_mul((int)MAX_SHARED_TYPES,iw);
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
numtyp factor_lj, factor_coul;
factor_lj = sp_lj[sbmask(j)];
factor_coul = (numtyp)1.0-sp_lj[sbmask(j)+4];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int mtype=itype+jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp rsq = delx*delx+dely*dely+delz*delz;
if (rsq<lj1[mtype].x) {
numtyp forcecoul, force_lj, force, inv1, inv2, prefactor, _erfc;
numtyp r2inv=ucl_recip(rsq);
if (rsq < lj1[mtype].y) {
if (lj3[mtype].x == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj3[mtype].x == (numtyp)1) {
inv2=r2inv*ucl_rsqrt(rsq);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
force_lj = factor_lj*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
} else
force_lj = (numtyp)0.0;
if (rsq < cut_coulsq) {
numtyp r = ucl_rsqrt(r2inv);
numtyp grij = g_ewald * r;
numtyp expm2 = ucl_exp(-grij*grij);
numtyp t = ucl_recip((numtyp)1.0 + EWALD_P*grij);
_erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
fetch(prefactor,j,q_tex);
prefactor *= qqrd2e * qtmp/r;
forcecoul = prefactor * (_erfc + EWALD_F*grij*expm2-factor_coul);
} else
forcecoul = (numtyp)0.0;
force = (force_lj + forcecoul) * r2inv;
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0) {
if (rsq < cut_coulsq)
e_coul += prefactor*(_erfc-factor_coul);
if (rsq < lj1[mtype].y) {
energy += factor_lj*inv1*(lj3[mtype].y*inv2-lj3[mtype].z)-
lj3[mtype].w;
}
}
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers_q(f,energy,e_coul,virial,ii,inum,tid,t_per_atom,offset,eflag,
vflag,ans,engv);
} // if ii
}
diff --git a/lib/gpu/lal_cg_cmm_long.h b/lib/gpu/lal_lj_sdk_long.h
similarity index 98%
rename from lib/gpu/lal_cg_cmm_long.h
rename to lib/gpu/lal_lj_sdk_long.h
index aa0cbfbaf..f56687cd7 100644
--- a/lib/gpu/lal_cg_cmm_long.h
+++ b/lib/gpu/lal_lj_sdk_long.h
@@ -1,83 +1,83 @@
/***************************************************************************
- cg_cmm_long.h
+ lj_sdk_long.h
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/coul/long pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#ifndef LAL_CG_CMM_LONG_H
#define LAL_CG_CMM_LONG_H
#include "lal_base_charge.h"
namespace LAMMPS_AL {
template <class numtyp, class acctyp>
class CGCMMLong : public BaseCharge<numtyp, acctyp> {
public:
CGCMMLong();
~CGCMMLong();
/// Clear any previous data and set up for a new LAMMPS run
/** \param max_nbors initial number of rows in the neighbor matrix
* \param cell_size cutoff + skin
* \param gpu_split fraction of particles handled by device
*
* Returns:
* - 0 if successful
* - -1 if fix gpu not found
* - -3 if there is an out of memory error
* - -4 if the GPU library was not compiled for GPU
* - -5 Double precision is not supported on card **/
int init(const int ntypes, double **host_cutsq, int ** cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset, double *host_special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *screen, double **host_cut_ljsq,
const double host_cut_coulsq, double *host_special_coul,
const double qqrd2e, const double g_ewald);
/// Clear all host and device data
/** \note This is called at the beginning of the init() routine **/
void clear();
/// Returns memory usage on device per atom
int bytes_per_atom(const int max_nbors) const;
/// Total host memory used by library for pair style
double host_memory_usage() const;
// --------------------------- TYPE DATA --------------------------
/// lj1.x = cutsq, lj1.y = cutsq_vdw, lj1.z = lj1, lj1.w = lj2,
UCL_D_Vec<numtyp4> lj1;
/// lj3.x = cg_type, lj3.y = lj3, lj3.z = lj4, lj3.w = offset
UCL_D_Vec<numtyp4> lj3;
/// Special LJ values [0-3] and Special Coul values [4-7]
UCL_D_Vec<numtyp> sp_lj;
/// If atom type constants fit in shared memory, use fast kernels
bool shared_types;
/// Number of atom types
int _lj_types;
numtyp _cut_coulsq, _qqrd2e, _g_ewald;
private:
bool _allocated;
void loop(const bool _eflag, const bool _vflag);
};
}
#endif
diff --git a/lib/gpu/lal_cg_cmm_long_ext.cpp b/lib/gpu/lal_lj_sdk_long_ext.cpp
similarity index 93%
rename from lib/gpu/lal_cg_cmm_long_ext.cpp
rename to lib/gpu/lal_lj_sdk_long_ext.cpp
index ee0a0269e..08390d3ee 100644
--- a/lib/gpu/lal_cg_cmm_long_ext.cpp
+++ b/lib/gpu/lal_lj_sdk_long_ext.cpp
@@ -1,129 +1,129 @@
/***************************************************************************
- cg_cmm_long.h
+ lj_sdk_long.h
-------------------
W. Michael Brown (ORNL)
Functions for LAMMPS access to lj/sdk/coul/long acceleration functions
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#include <iostream>
#include <cassert>
#include <math.h>
-#include "lal_cg_cmm_long.h"
+#include "lal_lj_sdk_long.h"
using namespace std;
using namespace LAMMPS_AL;
static CGCMMLong<PRECISION,ACC_PRECISION> CMMLMF;
// ---------------------------------------------------------------------------
// Allocate memory on host and device and copy constants to device
// ---------------------------------------------------------------------------
-int cmml_gpu_init(const int ntypes, double **cutsq, int **cg_type,
+int sdkl_gpu_init(const int ntypes, double **cutsq, int **cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int inum, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen, double **host_cut_ljsq, double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald) {
CMMLMF.clear();
gpu_mode=CMMLMF.device->gpu_mode();
double gpu_split=CMMLMF.device->particle_split();
int first_gpu=CMMLMF.device->first_device();
int last_gpu=CMMLMF.device->last_device();
int world_me=CMMLMF.device->world_me();
int gpu_rank=CMMLMF.device->gpu_rank();
int procs_per_gpu=CMMLMF.device->procs_per_gpu();
CMMLMF.device->init_message(screen,"lj/sdk/coul/long",first_gpu,last_gpu);
bool message=false;
if (CMMLMF.device->replica_me()==0 && screen)
message=true;
if (message) {
fprintf(screen,"Initializing Device and compiling on process 0...");
fflush(screen);
}
int init_ok=0;
if (world_me==0)
init_ok=CMMLMF.init(ntypes, cutsq, cg_type, host_lj1, host_lj2, host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen, host_cut_ljsq,
host_cut_coulsq, host_special_coul, qqrd2e,g_ewald);
CMMLMF.device->world_barrier();
if (message)
fprintf(screen,"Done.\n");
for (int i=0; i<procs_per_gpu; i++) {
if (message) {
if (last_gpu-first_gpu==0)
fprintf(screen,"Initializing Device %d on core %d...",first_gpu,i);
else
fprintf(screen,"Initializing Devices %d-%d on core %d...",first_gpu,
last_gpu,i);
fflush(screen);
}
if (gpu_rank==i && world_me!=0)
init_ok=CMMLMF.init(ntypes, cutsq, cg_type, host_lj1, host_lj2, host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen,
host_cut_ljsq, host_cut_coulsq, host_special_coul,
qqrd2e, g_ewald);
CMMLMF.device->gpu_barrier();
if (message)
fprintf(screen,"Done.\n");
}
if (message)
fprintf(screen,"\n");
if (init_ok==0)
CMMLMF.estimate_gpu_overhead();
return init_ok;
}
-void cmml_gpu_clear() {
+void sdkl_gpu_clear() {
CMMLMF.clear();
}
-int** cmml_gpu_compute_n(const int ago, const int inum_full,
+int** sdkl_gpu_compute_n(const int ago, const int inum_full,
const int nall, double **host_x, int *host_type,
double *sublo, double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success, double *host_q, double *boxlo,
double *prd) {
return CMMLMF.compute(ago, inum_full, nall, host_x, host_type, sublo,
subhi, tag, nspecial, special, eflag, vflag, eatom,
vatom, host_start, ilist, jnum, cpu_time, success,
host_q,boxlo,prd);
}
-void cmml_gpu_compute(const int ago, const int inum_full, const int nall,
+void sdkl_gpu_compute(const int ago, const int inum_full, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success, double *host_q,
const int nlocal, double *boxlo, double *prd) {
CMMLMF.compute(ago,inum_full,nall,host_x,host_type,ilist,numj,
firstneigh,eflag,vflag,eatom,vatom,host_start,cpu_time,success,
host_q,nlocal,boxlo,prd);
}
-double cmml_gpu_bytes() {
+double sdkl_gpu_bytes() {
return CMMLMF.host_memory_usage();
}
diff --git a/lib/h5md/Install.py b/lib/h5md/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/h5md/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/h5md/Makefile b/lib/h5md/Makefile.h5cc
similarity index 95%
rename from lib/h5md/Makefile
rename to lib/h5md/Makefile.h5cc
index 085d21ff6..bd3e8a978 100644
--- a/lib/h5md/Makefile
+++ b/lib/h5md/Makefile.h5cc
@@ -1,33 +1,33 @@
EXTRAMAKE=Makefile.lammps.empty
CC=h5cc
# -DH5_NO_DEPRECATED_SYMBOLS is required here to ensure we are using
# the v1.8 API when HDF5 is configured to default to using the v1.6 API.
CFLAGS=-D_DEFAULT_SOURCE -O2 -DH5_NO_DEPRECATED_SYMBOLS -Wall -fPIC
HDF5_PATH=/usr
INC=-I include
AR=ar
ARFLAGS=rc
LIB=libch5md.a
all: lib Makefile.lammps
build:
mkdir -p build
build/ch5md.o: src/ch5md.c | build
$(CC) $(INC) $(CFLAGS) -c $< -o $@
Makefile.lammps:
- cp Makefile.lammps.empty $@
+ cp $(EXTRAMAKE) $@
.PHONY: all lib clean
$(LIB): build/ch5md.o
$(AR) $(ARFLAGS) $(LIB) build/ch5md.o
lib: $(LIB)
clean:
rm -f build/*.o $(LIB)
diff --git a/lib/h5md/README b/lib/h5md/README
index 62a4979cb..fb7d82bfc 100644
--- a/lib/h5md/README
+++ b/lib/h5md/README
@@ -1,27 +1,38 @@
This directory contains the ch5md library, which is bundled with
LAMMPS under its own BSD license; see below. This library is used
when the USER-H5MD package is included in a LAMMPS build and the dump
h5md command is invoked in a LAMMPS input script.
+You can type "make lib-h5md" from the src directory to see help on how
+to build this library via make commands. You can do the same thing by
+typing "python Install.py" from within this directory, or you can build
+it manually by following the instructions below.
+
---------------------
ch5md : Read and write H5MD files in C
======================================
Copyright (C) 2013-2014 Pierre de Buyl
ch5md is a set of C routines to manipulate H5MD files. H5MD is a file format
specification based on [HDF5](http://www.hdfgroup.org/HDF5/) for storing
molecular data, whose development is found at <http://nongnu.org/h5md/>.
ch5md is developed by Pierre de Buyl and is released under the 3-clause BSD
license that can be found in the file LICENSE.
-To use the h5md dump style in lammps, execute make in this directory then 'make
-yes-user-h5md' in the src directory of lammps. Rebuild lammps.
+To use the h5md dump style in LAMMPS, execute
+make -f Makefile.h5cc
+in this directory, then
+make yes-user-h5md
+in the src directory of LAMMPS to rebuild LAMMPS.
+
+Note that you must have the h5cc compiler installed to use
+Makefile.h5cc. It should be part of your HDF5 installation.
If HDF5 is not in a standard system location, edit Makefile.lammps accordingly.
On Debian and Ubuntu systems from 2015 onward, where concurrent serial and MPI
HDF5 installations are possible, use the full platform-dependent path, i.e.
`HDF5_PATH=/usr/lib/x86_64-linux-gnu/hdf5/serial`
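Since the README above requires the h5cc wrapper, a minimal pre-flight check could look like the Python 3 sketch below. It assumes only that h5cc, when installed, is on PATH; it is an illustration, not part of the build system.

import shutil
import sys

# Abort early if the HDF5 compiler wrapper is missing; otherwise report
# where it was found.
h5cc = shutil.which("h5cc")
if h5cc is None:
    sys.exit("h5cc not found: install the HDF5 development tools or adjust "
             "HDF5_PATH in Makefile.lammps")
print("found h5cc at", h5cc)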
diff --git a/lib/kokkos/CHANGELOG.md b/lib/kokkos/CHANGELOG.md
index 4a96e2441..c6fe991b9 100644
--- a/lib/kokkos/CHANGELOG.md
+++ b/lib/kokkos/CHANGELOG.md
@@ -1,306 +1,329 @@
# Change Log
+## [2.03.00](https://github.com/kokkos/kokkos/tree/2.03.00) (2017-04-25)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.15...2.03.00)
+
+**Implemented enhancements:**
+
+- UnorderedMap: make it accept Devices or MemorySpaces [\#711](https://github.com/kokkos/kokkos/issues/711)
+- sort to accept DynamicView and \[begin,end\) indices [\#691](https://github.com/kokkos/kokkos/issues/691)
+- ENABLE Macros should only be used via \#ifdef or \#if defined [\#675](https://github.com/kokkos/kokkos/issues/675)
+- Remove impl/Kokkos\_Synchronic\_\* [\#666](https://github.com/kokkos/kokkos/issues/666)
+- Turning off IVDEP for Intel 14. [\#638](https://github.com/kokkos/kokkos/issues/638)
+- Using an installed Kokkos in a target application using CMake [\#633](https://github.com/kokkos/kokkos/issues/633)
+- Create Kokkos Bill of Materials [\#632](https://github.com/kokkos/kokkos/issues/632)
+- MDRangePolicy and tagged evaluators [\#547](https://github.com/kokkos/kokkos/issues/547)
+- Add PGI support [\#289](https://github.com/kokkos/kokkos/issues/289)
+
+**Fixed bugs:**
+
+- Output from PerTeam fails [\#733](https://github.com/kokkos/kokkos/issues/733)
+- Cuda: architecture flag not added to link line [\#688](https://github.com/kokkos/kokkos/issues/688)
+- Getting large chunks of memory for a thread team in a universal way [\#664](https://github.com/kokkos/kokkos/issues/664)
+- Kokkos RNG normal\(\) function hangs for small seed value [\#655](https://github.com/kokkos/kokkos/issues/655)
+- Kokkos Tests Errors on Shepard/HSW Builds [\#644](https://github.com/kokkos/kokkos/issues/644)
+
## [2.02.15](https://github.com/kokkos/kokkos/tree/2.02.15) (2017-02-10)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.07...2.02.15)
**Implemented enhancements:**
- Containers: Adding block partitioning to StaticCrsGraph [\#625](https://github.com/kokkos/kokkos/issues/625)
- Kokkos Make System can induce Errors on Cray Volta System [\#610](https://github.com/kokkos/kokkos/issues/610)
- OpenMP: error out if KOKKOS\_HAVE\_OPENMP is defined but not \_OPENMP [\#605](https://github.com/kokkos/kokkos/issues/605)
- CMake: fix standalone build with tests [\#604](https://github.com/kokkos/kokkos/issues/604)
- Change README \(that GitHub shows when opening Kokkos project page\) to tell users how to submit PRs [\#597](https://github.com/kokkos/kokkos/issues/597)
- Add correctness testing for all operators of Atomic View [\#420](https://github.com/kokkos/kokkos/issues/420)
- Allow assignment of Views with compatible memory spaces [\#290](https://github.com/kokkos/kokkos/issues/290)
- Build only one version of Kokkos library for tests [\#213](https://github.com/kokkos/kokkos/issues/213)
- Clean out old KOKKOS\_HAVE\_CXX11 macros clauses [\#156](https://github.com/kokkos/kokkos/issues/156)
- Harmonize Macro names [\#150](https://github.com/kokkos/kokkos/issues/150)
**Fixed bugs:**
- Cray and PGI: Kokkos\_Parallel\_Reduce [\#634](https://github.com/kokkos/kokkos/issues/634)
- Kokkos Make System can induce Errors on Cray Volta System [\#610](https://github.com/kokkos/kokkos/issues/610)
- Normal\(\) function random number generator doesn't give the expected distribution [\#592](https://github.com/kokkos/kokkos/issues/592)
## [2.02.07](https://github.com/kokkos/kokkos/tree/2.02.07) (2016-12-16)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.01...2.02.07)
**Implemented enhancements:**
- Add CMake option to enable Cuda Lambda support [\#589](https://github.com/kokkos/kokkos/issues/589)
- Add CMake option to enable Cuda RDC support [\#588](https://github.com/kokkos/kokkos/issues/588)
- Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System [\#584](https://github.com/kokkos/kokkos/issues/584)
- Building Tutorial Examples [\#582](https://github.com/kokkos/kokkos/issues/582)
- Internal way for using ThreadVectorRange without TeamHandle [\#574](https://github.com/kokkos/kokkos/issues/574)
- Testing: Add testing for uvm and rdc [\#571](https://github.com/kokkos/kokkos/issues/571)
- Profiling: Add Memory Tracing and Region Markers [\#557](https://github.com/kokkos/kokkos/issues/557)
- nvcc\_wrapper not installed with Kokkos built with CUDA through CMake [\#543](https://github.com/kokkos/kokkos/issues/543)
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
- Benchmarks: Add Gather benchmark [\#536](https://github.com/kokkos/kokkos/issues/536)
- Testing: add spot\_check option to test\_all\_sandia [\#535](https://github.com/kokkos/kokkos/issues/535)
- Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace [\#527](https://github.com/kokkos/kokkos/issues/527)
- Add AtomicAdd support for 64bit float for Pascal [\#522](https://github.com/kokkos/kokkos/issues/522)
- Add Restrict and Aligned memory trait [\#517](https://github.com/kokkos/kokkos/issues/517)
- Kokkos Tests are Not Run using Compiler Optimization [\#501](https://github.com/kokkos/kokkos/issues/501)
- Add support for clang 3.7 w/ openmp backend [\#393](https://github.com/kokkos/kokkos/issues/393)
- Provide an error throw class [\#79](https://github.com/kokkos/kokkos/issues/79)
**Fixed bugs:**
- Cuda UVM Allocation test broken with UVM as default space [\#586](https://github.com/kokkos/kokkos/issues/586)
- Bug \(develop branch only\): multiple tests are now failing when forcing uvm usage. [\#570](https://github.com/kokkos/kokkos/issues/570)
- Error in generate\_makefile.sh for Kokkos when Compiler is Empty String/Fails [\#568](https://github.com/kokkos/kokkos/issues/568)
- XL 13.1.4 incorrect C++11 flag [\#553](https://github.com/kokkos/kokkos/issues/553)
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
- Installing Library on MAC broken due to cp -u [\#539](https://github.com/kokkos/kokkos/issues/539)
- Intel Nightly Testing with Debug enabled fails [\#534](https://github.com/kokkos/kokkos/issues/534)
## [2.02.01](https://github.com/kokkos/kokkos/tree/2.02.01) (2016-11-01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.00...2.02.01)
**Implemented enhancements:**
- Add Changelog generation to our process. [\#506](https://github.com/kokkos/kokkos/issues/506)
**Fixed bugs:**
- Test scratch\_request fails in Serial with Debug enabled [\#520](https://github.com/kokkos/kokkos/issues/520)
- Bug In BoundsCheck for DynRankView [\#516](https://github.com/kokkos/kokkos/issues/516)
## [2.02.00](https://github.com/kokkos/kokkos/tree/2.02.00) (2016-10-30)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.10...2.02.00)
**Implemented enhancements:**
- Add PowerPC assembly for grabbing clock register in memory pool [\#511](https://github.com/kokkos/kokkos/issues/511)
- Add GCC 6.x support [\#508](https://github.com/kokkos/kokkos/issues/508)
- Test install and build against installed library [\#498](https://github.com/kokkos/kokkos/issues/498)
- Makefile.kokkos adds expt-extended-lambda to cuda build with clang [\#490](https://github.com/kokkos/kokkos/issues/490)
- Add top-level makefile option to just test kokkos-core unit-test [\#485](https://github.com/kokkos/kokkos/issues/485)
- Split and harmonize Object Files of Core UnitTests to increase build parallelism [\#484](https://github.com/kokkos/kokkos/issues/484)
- LayoutLeft to LayoutLeft subview for 3D and 4D views [\#473](https://github.com/kokkos/kokkos/issues/473)
- Add official Cuda 8.0 support [\#468](https://github.com/kokkos/kokkos/issues/468)
- Allow C++1Z Flag for Class Lambda capture [\#465](https://github.com/kokkos/kokkos/issues/465)
- Add Clang 4.0+ compilation of Cuda code [\#455](https://github.com/kokkos/kokkos/issues/455)
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
- Add name of view to "View bounds error" [\#432](https://github.com/kokkos/kokkos/issues/432)
- Move Sort Binning Operators into Kokkos namespace [\#421](https://github.com/kokkos/kokkos/issues/421)
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
- Import WithoutInitializing and AllowPadding into Kokkos namespace [\#325](https://github.com/kokkos/kokkos/issues/325)
- TeamThreadRange requires begin, end to be the same type [\#305](https://github.com/kokkos/kokkos/issues/305)
- CudaUVMSpace should track \# allocations, due to CUDA limit on \# UVM allocations [\#300](https://github.com/kokkos/kokkos/issues/300)
- Remove old View and its infrastructure [\#259](https://github.com/kokkos/kokkos/issues/259)
**Fixed bugs:**
- Bug in TestCuda\_Other.cpp: most likely assembly inserted into Device code [\#515](https://github.com/kokkos/kokkos/issues/515)
- Cuda Compute Capability check of GPU is outdated [\#509](https://github.com/kokkos/kokkos/issues/509)
- multi\_scratch test with hwloc and pthreads seg-faults. [\#504](https://github.com/kokkos/kokkos/issues/504)
- generate\_makefile.bash: "make install" is broken [\#503](https://github.com/kokkos/kokkos/issues/503)
- make clean in Out of Source Build/Tests Does Not Work Correctly [\#502](https://github.com/kokkos/kokkos/issues/502)
- Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified [\#497](https://github.com/kokkos/kokkos/issues/497)
- Dispatch lambda test directly inside GTEST macro doesn't work with nvcc [\#491](https://github.com/kokkos/kokkos/issues/491)
- UnitTests with HWLOC enabled fail if run with mpirun bound to a single core [\#489](https://github.com/kokkos/kokkos/issues/489)
- Failing Reducer Test on Mac with Pthreads [\#479](https://github.com/kokkos/kokkos/issues/479)
- make test Dumps Error with Clang Not Found [\#471](https://github.com/kokkos/kokkos/issues/471)
- OpenMP TeamPolicy member broadcast not using correct volatile shared variable [\#424](https://github.com/kokkos/kokkos/issues/424)
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
- New task policy implementation is pulling in old experimental code. [\#372](https://github.com/kokkos/kokkos/issues/372)
- MemoryPool unit test hangs on Power8 with GCC 6.1.0 [\#298](https://github.com/kokkos/kokkos/issues/298)
## [2.01.10](https://github.com/kokkos/kokkos/tree/2.01.10) (2016-09-27)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.06...2.01.10)
**Implemented enhancements:**
- Enable Profiling by default in Tribits build [\#438](https://github.com/kokkos/kokkos/issues/438)
- parallel\_reduce\(0\), parallel\_scan\(0\) unit tests [\#436](https://github.com/kokkos/kokkos/issues/436)
- data\(\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
- Fix tutorials to track new Kokkos::View [\#323](https://github.com/kokkos/kokkos/issues/323)
- Rename team policy set\_scratch\_size. [\#195](https://github.com/kokkos/kokkos/issues/195)
**Fixed bugs:**
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
- Makefile spits syntax error [\#435](https://github.com/kokkos/kokkos/issues/435)
- Kokkos::sort fails for view with all the same values [\#422](https://github.com/kokkos/kokkos/issues/422)
- Generic Reducers: can't accept inline constructed reducer [\#404](https://github.com/kokkos/kokkos/issues/404)
- data\\(\\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
- const subview of const view with compile time dimensions on Cuda backend [\#310](https://github.com/kokkos/kokkos/issues/310)
- Kokkos \(in Trilinos\) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 [\#307](https://github.com/kokkos/kokkos/issues/307)
- Core Oversubscription Detection Broken? [\#159](https://github.com/kokkos/kokkos/issues/159)
## [2.01.06](https://github.com/kokkos/kokkos/tree/2.01.06) (2016-09-02)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.00...2.01.06)
**Implemented enhancements:**
- Add "standard" reducers for lambda-supportable customized reduce [\#411](https://github.com/kokkos/kokkos/issues/411)
- TaskPolicy - single thread back-end execution [\#390](https://github.com/kokkos/kokkos/issues/390)
- Kokkos master clone tag [\#387](https://github.com/kokkos/kokkos/issues/387)
- Query memory requirements from task policy [\#378](https://github.com/kokkos/kokkos/issues/378)
- Output order of test\_atomic.cpp is confusing [\#373](https://github.com/kokkos/kokkos/issues/373)
- Missing testing for atomics [\#341](https://github.com/kokkos/kokkos/issues/341)
- Feature request for Kokkos to provide Kokkos::atomic\_fetch\_max and atomic\_fetch\_min [\#336](https://github.com/kokkos/kokkos/issues/336)
- TaskPolicy\<Cuda\> performance requires teams mapped to warps [\#218](https://github.com/kokkos/kokkos/issues/218)
**Fixed bugs:**
- Reduce with Teams broken for custom initialize [\#407](https://github.com/kokkos/kokkos/issues/407)
- Failing Kokkos build on Debian [\#402](https://github.com/kokkos/kokkos/issues/402)
- Failing Tests on NVIDIA Pascal GPUs [\#398](https://github.com/kokkos/kokkos/issues/398)
- Algorithms: fill\_random assumes dimensions fit in unsigned int [\#389](https://github.com/kokkos/kokkos/issues/389)
- Kokkos::subview with RandomAccess Memory Trait [\#385](https://github.com/kokkos/kokkos/issues/385)
- Build warning \(signed / unsigned comparison\) in Cuda implementation [\#365](https://github.com/kokkos/kokkos/issues/365)
- wrong results for a parallel\_reduce with CUDA8 / Maxwell50 [\#352](https://github.com/kokkos/kokkos/issues/352)
- Hierarchical parallelism - 3 level unit test [\#344](https://github.com/kokkos/kokkos/issues/344)
- Can I allocate a View w/ both WithoutInitializing & AllowPadding? [\#324](https://github.com/kokkos/kokkos/issues/324)
- subview View layout determination [\#309](https://github.com/kokkos/kokkos/issues/309)
- Unit tests with Cuda - Maxwell [\#196](https://github.com/kokkos/kokkos/issues/196)
## [2.01.00](https://github.com/kokkos/kokkos/tree/2.01.00) (2016-07-21)
[Full Changelog](https://github.com/kokkos/kokkos/compare/End_C++98...2.01.00)
**Implemented enhancements:**
- Edit ViewMapping so assigning Views with the same custom layout compiles when const casting [\#327](https://github.com/kokkos/kokkos/issues/327)
- DynRankView: Performance improvement for operator\(\) [\#321](https://github.com/kokkos/kokkos/issues/321)
- Interoperability between static and dynamic rank views [\#295](https://github.com/kokkos/kokkos/issues/295)
- subview member function ? [\#280](https://github.com/kokkos/kokkos/issues/280)
- Inter-operatibility between View and DynRankView. [\#245](https://github.com/kokkos/kokkos/issues/245)
- \(Trilinos\) build warning in atomic\_assign, with Kokkos::complex [\#177](https://github.com/kokkos/kokkos/issues/177)
- View\<\>::shmem\_size should runtime check for number of arguments equal to rank [\#176](https://github.com/kokkos/kokkos/issues/176)
- Custom reduction join via lambda argument [\#99](https://github.com/kokkos/kokkos/issues/99)
- DynRankView with 0 dimensions passed in at construction [\#293](https://github.com/kokkos/kokkos/issues/293)
- Inject view\_alloc and friends into Kokkos namespace [\#292](https://github.com/kokkos/kokkos/issues/292)
- Less restrictive TeamPolicy reduction on Cuda [\#286](https://github.com/kokkos/kokkos/issues/286)
- deep\_copy using remap with source execution space [\#267](https://github.com/kokkos/kokkos/issues/267)
- Suggestion: Enable opt-in L1 caching via nvcc-wrapper [\#261](https://github.com/kokkos/kokkos/issues/261)
- More flexible create\_mirror functions [\#260](https://github.com/kokkos/kokkos/issues/260)
- Rename View::memory\_span to View::required\_allocation\_size [\#256](https://github.com/kokkos/kokkos/issues/256)
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
- Kokkos::Timer [\#234](https://github.com/kokkos/kokkos/issues/234)
- Fence CudaUVMSpace allocations [\#230](https://github.com/kokkos/kokkos/issues/230)
- View::operator\(\) accept std::is\_integral and std::is\_enum [\#227](https://github.com/kokkos/kokkos/issues/227)
- Allocating zero size View [\#216](https://github.com/kokkos/kokkos/issues/216)
- Thread scalable memory pool [\#212](https://github.com/kokkos/kokkos/issues/212)
- Add a way to disable memory leak output [\#194](https://github.com/kokkos/kokkos/issues/194)
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
- Runtime rank wrapper for View [\#189](https://github.com/kokkos/kokkos/issues/189)
- Profiling Interface [\#158](https://github.com/kokkos/kokkos/issues/158)
- Fix View assignment \(of managed to unmanaged\) [\#153](https://github.com/kokkos/kokkos/issues/153)
- Add unit test for assignment of managed View to unmanaged View [\#152](https://github.com/kokkos/kokkos/issues/152)
- Check for oversubscription of threads with MPI in Kokkos::initialize [\#149](https://github.com/kokkos/kokkos/issues/149)
- Dynamic resizeable 1dimensional view [\#143](https://github.com/kokkos/kokkos/issues/143)
- Develop TaskPolicy for CUDA [\#142](https://github.com/kokkos/kokkos/issues/142)
- New View : Test Compilation Downstream [\#138](https://github.com/kokkos/kokkos/issues/138)
- New View Implementation [\#135](https://github.com/kokkos/kokkos/issues/135)
- Add variant of subview that lets users add traits [\#134](https://github.com/kokkos/kokkos/issues/134)
- NVCC-WRAPPER: Add --host-only flag [\#121](https://github.com/kokkos/kokkos/issues/121)
- Address gtest issue with TriBITS Kokkos build outside of Trilinos [\#117](https://github.com/kokkos/kokkos/issues/117)
- Make tests pass with -expt-extended-lambda on CUDA [\#108](https://github.com/kokkos/kokkos/issues/108)
- Dynamic scheduling for parallel\_for and parallel\_reduce [\#106](https://github.com/kokkos/kokkos/issues/106)
- Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments [\#105](https://github.com/kokkos/kokkos/issues/105)
- Error out when the number of threads is modified after kokkos is initialized [\#104](https://github.com/kokkos/kokkos/issues/104)
- Porting to POWER and remove assumption of X86 default [\#103](https://github.com/kokkos/kokkos/issues/103)
- Dynamic scheduling option for RangePolicy [\#100](https://github.com/kokkos/kokkos/issues/100)
- SharedMemory Support for Lambdas [\#81](https://github.com/kokkos/kokkos/issues/81)
- Recommended TeamSize for Lambdas [\#80](https://github.com/kokkos/kokkos/issues/80)
- Add Aggressive Vectorization Compilation mode [\#72](https://github.com/kokkos/kokkos/issues/72)
- Dynamic scheduling team execution policy [\#53](https://github.com/kokkos/kokkos/issues/53)
- UVM allocations in multi-GPU systems [\#50](https://github.com/kokkos/kokkos/issues/50)
- Synchronic in Kokkos::Impl [\#44](https://github.com/kokkos/kokkos/issues/44)
- index and dimension types in for loops [\#28](https://github.com/kokkos/kokkos/issues/28)
- Subview assign of 1D Strided with stride 1 to LayoutLeft/Right [\#1](https://github.com/kokkos/kokkos/issues/1)
**Fixed bugs:**
- misspelled variable name in Kokkos\_Atomic\_Fetch + missing unit tests [\#340](https://github.com/kokkos/kokkos/issues/340)
- seg fault Kokkos::Impl::CudaInternal::print\_configuration [\#338](https://github.com/kokkos/kokkos/issues/338)
- Clang compiler error with named parallel\_reduce, tags, and TeamPolicy. [\#335](https://github.com/kokkos/kokkos/issues/335)
- Shared Memory Allocation Error at parallel\_reduce [\#311](https://github.com/kokkos/kokkos/issues/311)
- DynRankView: Fix resize and realloc [\#303](https://github.com/kokkos/kokkos/issues/303)
- Scratch memory and dynamic scheduling [\#279](https://github.com/kokkos/kokkos/issues/279)
- MemoryPool infinite loop when out of memory [\#312](https://github.com/kokkos/kokkos/issues/312)
- Kokkos DynRankView changes break Sacado and Panzer [\#299](https://github.com/kokkos/kokkos/issues/299)
- MemoryPool fails to compile on non-cuda non-x86 [\#297](https://github.com/kokkos/kokkos/issues/297)
- Random Number Generator Fix [\#296](https://github.com/kokkos/kokkos/issues/296)
- View template parameter ordering Bug [\#282](https://github.com/kokkos/kokkos/issues/282)
- Serial task policy broken. [\#281](https://github.com/kokkos/kokkos/issues/281)
- deep\_copy with LayoutStride should not memcpy [\#262](https://github.com/kokkos/kokkos/issues/262)
- DualView::need\_sync should be a const method [\#248](https://github.com/kokkos/kokkos/issues/248)
- Arbitrary-sized atomics on GPUs broken; loop forever [\#238](https://github.com/kokkos/kokkos/issues/238)
- boolean reduction value\_type changes answer [\#225](https://github.com/kokkos/kokkos/issues/225)
- Custom init\(\) function for parallel\_reduce with array value\_type [\#210](https://github.com/kokkos/kokkos/issues/210)
- unit\_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. [\#202](https://github.com/kokkos/kokkos/issues/202)
- nvcc\_wrapper Does Not Support -Xcompiler \<compiler option\> [\#198](https://github.com/kokkos/kokkos/issues/198)
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
- Kokkos Threads Backend impl\_shared\_alloc Broken on Intel 16.1 \(Shepard Haswell\) [\#186](https://github.com/kokkos/kokkos/issues/186)
- pthread back end hangs if used uninitialized [\#182](https://github.com/kokkos/kokkos/issues/182)
- parallel\_reduce of size 0, not calling init/join [\#175](https://github.com/kokkos/kokkos/issues/175)
- Bug in Threads with OpenMP enabled [\#173](https://github.com/kokkos/kokkos/issues/173)
- KokkosExp\_SharedAlloc, m\_team\_work\_index inaccessible [\#166](https://github.com/kokkos/kokkos/issues/166)
- 128-bit CAS without Assembly Broken? [\#161](https://github.com/kokkos/kokkos/issues/161)
- fatal error: Cuda/Kokkos\_Cuda\_abort.hpp: No such file or directory [\#157](https://github.com/kokkos/kokkos/issues/157)
- Power8: Fix OpenMP backend [\#139](https://github.com/kokkos/kokkos/issues/139)
- Data race in Kokkos OpenMP initialization [\#131](https://github.com/kokkos/kokkos/issues/131)
- parallel\_launch\_local\_memory and cuda 7.5 [\#125](https://github.com/kokkos/kokkos/issues/125)
- Resize can fail with Cuda due to asynchronous dispatch [\#119](https://github.com/kokkos/kokkos/issues/119)
- Qthread taskpolicy initialization bug. [\#92](https://github.com/kokkos/kokkos/issues/92)
- Windows: sys/mman.h [\#89](https://github.com/kokkos/kokkos/issues/89)
- Windows: atomic\_fetch\_sub\(\) [\#88](https://github.com/kokkos/kokkos/issues/88)
- Windows: snprintf [\#87](https://github.com/kokkos/kokkos/issues/87)
- Parallel\_Reduce with TeamPolicy and league size of 0 returns garbage [\#85](https://github.com/kokkos/kokkos/issues/85)
- Throw with Cuda when using \(2D\) team\_policy parallel\_reduce with less than a warp size [\#76](https://github.com/kokkos/kokkos/issues/76)
- Scalar views don't work with Kokkos::Atomic memory trait [\#69](https://github.com/kokkos/kokkos/issues/69)
- Reduce the number of threads per team for Cuda [\#63](https://github.com/kokkos/kokkos/issues/63)
- Named Kernels fail for reductions with CUDA [\#60](https://github.com/kokkos/kokkos/issues/60)
- Kokkos View dimension\_\(\) for long returning unsigned int [\#20](https://github.com/kokkos/kokkos/issues/20)
- atomic test hangs with LLVM [\#6](https://github.com/kokkos/kokkos/issues/6)
- OpenMP Test should set omp\_set\_num\_threads to 1 [\#4](https://github.com/kokkos/kokkos/issues/4)
**Closed issues:**
- develop branch broken with CUDA 8 and --expt-extended-lambda [\#354](https://github.com/kokkos/kokkos/issues/354)
- --arch=KNL with Intel 2016 build failure [\#349](https://github.com/kokkos/kokkos/issues/349)
- Error building with Cuda when passing -DKOKKOS\_CUDA\_USE\_LAMBDA to generate\_makefile.bash [\#343](https://github.com/kokkos/kokkos/issues/343)
- Can I safely use int indices in a 2-D View with capacity \> 2B? [\#318](https://github.com/kokkos/kokkos/issues/318)
- Kokkos::ViewAllocateWithoutInitializing is not working [\#317](https://github.com/kokkos/kokkos/issues/317)
- Intel build on Mac OS X [\#277](https://github.com/kokkos/kokkos/issues/277)
- deleted [\#271](https://github.com/kokkos/kokkos/issues/271)
- Broken Mira build [\#268](https://github.com/kokkos/kokkos/issues/268)
- 32-bit build [\#246](https://github.com/kokkos/kokkos/issues/246)
- parallel\_reduce with RDC crashes linker [\#232](https://github.com/kokkos/kokkos/issues/232)
- build of Kokkos\_Sparse\_MV\_impl\_spmv\_Serial.cpp.o fails if you use nvcc and have cuda disabled [\#209](https://github.com/kokkos/kokkos/issues/209)
- Kokkos Serial execution space is not tested with TeamPolicy. [\#207](https://github.com/kokkos/kokkos/issues/207)
- Unit test failure on Hansen KokkosCore\_UnitTest\_Cuda\_MPI\_1 [\#200](https://github.com/kokkos/kokkos/issues/200)
- nvcc compiler warning: calling a \_\_host\_\_ function from a \_\_host\_\_ \_\_device\_\_ function is not allowed [\#180](https://github.com/kokkos/kokkos/issues/180)
- Intel 15 build error with defaulted "move" operators [\#171](https://github.com/kokkos/kokkos/issues/171)
- missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos\*.a libs are there [\#165](https://github.com/kokkos/kokkos/issues/165)
- Tie atomic updates to execution space or even to thread team? \(speculation\) [\#144](https://github.com/kokkos/kokkos/issues/144)
- New View: Compiletime/size Test [\#137](https://github.com/kokkos/kokkos/issues/137)
- New View : Performance Test [\#136](https://github.com/kokkos/kokkos/issues/136)
- Signed/unsigned comparison warning in CUDA parallel [\#130](https://github.com/kokkos/kokkos/issues/130)
- Kokkos::complex: Need op\* w/ std::complex & real [\#126](https://github.com/kokkos/kokkos/issues/126)
- Use uintptr\_t for casting pointers [\#110](https://github.com/kokkos/kokkos/issues/110)
- Default thread mapping behavior between P and Q threads. [\#91](https://github.com/kokkos/kokkos/issues/91)
- Windows: Atomic\_Fetch\_Exchange\(\) return type [\#90](https://github.com/kokkos/kokkos/issues/90)
- Synchronic unit test is way too long [\#84](https://github.com/kokkos/kokkos/issues/84)
- nvcc\_wrapper -\> $\(NVCC\_WRAPPER\) [\#42](https://github.com/kokkos/kokkos/issues/42)
- Check compiler version and print helpful message [\#39](https://github.com/kokkos/kokkos/issues/39)
- Kokkos shared memory on Cuda uses a lot of registers [\#31](https://github.com/kokkos/kokkos/issues/31)
- Can not pass unit test `cuda.space` without a GT 720 [\#25](https://github.com/kokkos/kokkos/issues/25)
- Makefile.kokkos lacks bounds checking option that CMake has [\#24](https://github.com/kokkos/kokkos/issues/24)
- Kokkos can not complete unit tests with CUDA UVM enabled [\#23](https://github.com/kokkos/kokkos/issues/23)
- Simplify teams + shared memory histogram example to remove vectorization [\#21](https://github.com/kokkos/kokkos/issues/21)
- Kokkos needs to rever to ${PROJECT\_NAME}\_ENABLE\_CXX11 not Trilinos\_ENABLE\_CXX11 [\#17](https://github.com/kokkos/kokkos/issues/17)
- Kokkos Base Makefile adds AVX to KNC Build [\#16](https://github.com/kokkos/kokkos/issues/16)
- MS Visual Studio 2013 Build Errors [\#9](https://github.com/kokkos/kokkos/issues/9)
- subview\(X, ALL\(\), j\) for 2-D LayoutRight View X: should it view a column? [\#5](https://github.com/kokkos/kokkos/issues/5)
## [End_C++98](https://github.com/kokkos/kokkos/tree/End_C++98) (2015-04-15)
\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
diff --git a/lib/kokkos/CMakeLists.txt b/lib/kokkos/CMakeLists.txt
index 16854c839..1c820660a 100644
--- a/lib/kokkos/CMakeLists.txt
+++ b/lib/kokkos/CMakeLists.txt
@@ -1,216 +1,215 @@
IF(COMMAND TRIBITS_PACKAGE_DECL)
SET(KOKKOS_HAS_TRILINOS ON CACHE BOOL "")
ELSE()
SET(KOKKOS_HAS_TRILINOS OFF CACHE BOOL "")
ENDIF()
IF(NOT KOKKOS_HAS_TRILINOS)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8.11 FATAL_ERROR)
INCLUDE(cmake/tribits.cmake)
SET(CMAKE_CXX_STANDARD 11)
ENDIF()
#
# A) Forward declare the package so that certain options are also defined for
# subpackages
#
TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
#------------------------------------------------------------------------------
#
# B) Define the common options for Kokkos first so they can be used by
# subpackages as well.
#
# mfh 01 Aug 2016: See Issue #61:
#
# https://github.com/kokkos/kokkos/issues/61
#
# Don't use TRIBITS_ADD_DEBUG_OPTION() here, because that defines
# HAVE_KOKKOS_DEBUG. We define KOKKOS_HAVE_DEBUG here instead,
# for compatibility with Kokkos' Makefile build system.
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_DEBUG
KOKKOS_HAVE_DEBUG
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
${${PROJECT_NAME}_ENABLE_DEBUG}
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_SIERRA_BUILD
KOKKOS_FOR_SIERRA
"Configure Kokkos for building within the Sierra build system."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda
KOKKOS_HAVE_CUDA
"Enable CUDA support in Kokkos."
"${TPL_ENABLE_CUDA}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_UVM
KOKKOS_USE_CUDA_UVM
"Enable CUDA Unified Virtual Memory as the default in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_RDC
KOKKOS_HAVE_CUDA_RDC
"Enable CUDA Relocatable Device Code support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_Lambda
KOKKOS_HAVE_CUDA_LAMBDA
"Enable CUDA LAMBDA support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Pthread
KOKKOS_HAVE_PTHREAD
"Enable Pthread support in Kokkos."
OFF
)
ASSERT_DEFINED(TPL_ENABLE_Pthread)
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
ENDIF ()
IF (NOT TPL_ENABLE_Pthread)
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
ENDIF()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_OpenMP
KOKKOS_HAVE_OPENMP
"Enable OpenMP support in Kokkos."
"${${PROJECT_NAME}_ENABLE_OpenMP}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
- Kokkos_ENABLE_QTHREAD
- KOKKOS_HAVE_QTHREAD
- "Enable QTHREAD support in Kokkos."
- "${TPL_ENABLE_QTHREAD}"
+ Kokkos_ENABLE_Qthreads
+ KOKKOS_HAVE_QTHREADS
+ "Enable Qthreads support in Kokkos."
+ "${TPL_ENABLE_QTHREADS}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11
KOKKOS_HAVE_CXX11
"Enable C++11 support in Kokkos."
"${${PROJECT_NAME}_ENABLE_CXX11}"
)
-
+
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_HWLOC
KOKKOS_HAVE_HWLOC
"Enable HWLOC support in Kokkos."
"${TPL_ENABLE_HWLOC}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_MPI
KOKKOS_HAVE_MPI
"Enable MPI support in Kokkos."
"${TPL_ENABLE_MPI}"
)
# Set default value of Kokkos_ENABLE_Debug_Bounds_Check option
#
# CMake is case sensitive. The Kokkos_ENABLE_Debug_Bounds_Check
# option (defined below) is annoyingly not all caps, but we need to
# keep it that way for backwards compatibility. If users forget and
# try using an all-caps variable, then make it count by using the
# all-caps version as the default value of the original, not-all-caps
# option. Otherwise, the default value of this option comes from
# Kokkos_ENABLE_DEBUG (see Issue #367).
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_DEBUG)
IF(DEFINED Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
IF(Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT ON)
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ASSERT_DEFINED(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_Bounds_Check
KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
"Enable Kokkos::View run-time bounds checking."
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
)
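The comment block above explains how the default for the not-all-caps Kokkos_ENABLE_Debug_Bounds_Check option is chosen. A minimal Python sketch of that decision rule follows; the names are illustrative, and the CMake code above is the actual logic.

def debug_bounds_check_default(all_caps_setting, package_enable_debug):
    """If the user defined the all-caps Kokkos_ENABLE_DEBUG_BOUNDS_CHECK
    variable and set it ON, the default becomes ON; otherwise the default
    falls back to the package-level debug setting."""
    if all_caps_setting is not None:     # the all-caps variable was defined
        return True if all_caps_setting else package_enable_debug
    return package_enable_debug

# e.g. debug_bounds_check_default(None, False) -> False
#      debug_bounds_check_default(True, False) -> True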
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling
KOKKOS_ENABLE_PROFILING_INTERNAL
"Enable KokkosP profiling support for kernel data collections."
"${TPL_ENABLE_DLlib}"
)
# placeholder for future device...
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Winthread
KOKKOS_HAVE_WINTHREAD
"Enable Winthread support in Kokkos."
"${TPL_ENABLE_Winthread}"
)
# use new/old View
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_USING_DEPRECATED_VIEW
KOKKOS_USING_DEPRECATED_VIEW
"Choose whether to use the old, deprecated Kokkos::View"
OFF
)
#------------------------------------------------------------------------------
#
# C) Install Kokkos' executable scripts
#
# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
# Kokkos needs nvcc_wrapper in order to build. Other libraries and
# executables also need nvcc_wrapper. Thus, we need to install it.
# If the argument of DESTINATION is a relative path, CMake computes it
# as relative to ${CMAKE_INSTALL_PATH}.
INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
#------------------------------------------------------------------------------
#
# D) Process the subpackages for Kokkos
#
TRIBITS_PROCESS_SUBPACKAGES()
#
# E) If Kokkos itself is enabled, process the Kokkos package
#
TRIBITS_PACKAGE_DEF()
TRIBITS_EXCLUDE_AUTOTOOLS_FILES()
TRIBITS_EXCLUDE_FILES(
classic/doc
classic/LinAlg/doc/CrsRefactorNotesMay2012
)
TRIBITS_PACKAGE_POSTPROCESS()
-
diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
index 9d00c1902..5b094dba8 100644
--- a/lib/kokkos/Makefile.kokkos
+++ b/lib/kokkos/Makefile.kokkos
@@ -1,676 +1,699 @@
-# Default settings common options
+# Default settings common options.
#LAMMPS specific settings:
KOKKOS_PATH=../../lib/kokkos
CXXFLAGS=$(CCFLAGS)
-#Options: OpenMP,Serial,Pthreads,Cuda
+# Options: Cuda,OpenMP,Pthreads,Qthreads,Serial
KOKKOS_DEVICES ?= "OpenMP"
#KOKKOS_DEVICES ?= "Pthreads"
-#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,Power9,KNL,BDW,SKX
+# Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,Power9,KNL,BDW,SKX
KOKKOS_ARCH ?= ""
-#Options: yes,no
+# Options: yes,no
KOKKOS_DEBUG ?= "no"
-#Options: hwloc,librt,experimental_memkind
+# Options: hwloc,librt,experimental_memkind
KOKKOS_USE_TPLS ?= ""
-#Options: c++11,c++1z
+# Options: c++11,c++1z
KOKKOS_CXX_STANDARD ?= "c++11"
-#Options: aggressive_vectorization,disable_profiling
+# Options: aggressive_vectorization,disable_profiling
KOKKOS_OPTIONS ?= ""
-#Default settings specific options
-#Options: force_uvm,use_ldg,rdc,enable_lambda
+# Default settings specific options.
+# Options: force_uvm,use_ldg,rdc,enable_lambda
KOKKOS_CUDA_OPTIONS ?= "enable_lambda"
-# Check for general settings
-
+# Check for general settings.
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX1Z := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++1z" | wc -l))
-# Check for external libraries
+# Check for external libraries.
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
KOKKOS_INTERNAL_USE_LIBRT := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "librt" | wc -l))
KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "experimental_memkind" | wc -l))
-# Check for advanced settings
+# Check for advanced settings.
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
-# Check for Kokkos Host Execution Spaces one of which must be on
-
+# Check for Kokkos Host Execution Spaces one of which must be on.
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
+KOKKOS_INTERNAL_USE_QTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthreads | wc -l))
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
-KOKKOS_INTERNAL_USE_QTHREAD := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthread | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
- KOKKOS_INTERNAL_USE_SERIAL := 1
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
+ KOKKOS_INTERNAL_USE_SERIAL := 1
+endif
endif
endif
-# Check for other Execution Spaces
-
+# Check for other Execution Spaces.
KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
KOKKOS_INTERNAL_COMPILER_NVCC_VERSION := $(shell nvcc --version 2>&1 | grep release | cut -d' ' -f5 | cut -d',' -f1 | tr -d .)
endif
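The backend detection above counts substring matches in KOKKOS_DEVICES via "echo ... | grep ... | wc -l", and falls back to Serial when no host execution space is requested. A minimal Python sketch of the same decision, for illustration only:

def select_backends(kokkos_devices):
    """Substring checks equivalent to the Makefile's grep|wc tests, plus the
    rule that Serial is enabled when no other host space is requested."""
    use = {name: name in kokkos_devices
           for name in ("OpenMP", "Pthread", "Qthreads", "Serial", "Cuda")}
    if not (use["OpenMP"] or use["Pthread"] or use["Qthreads"]):
        use["Serial"] = True   # at least one host execution space must be on
    return use

# e.g. select_backends("Cuda,OpenMP") enables Cuda and OpenMP but not Serial.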
-# Check OS
-
+# Check OS.
KOKKOS_OS := $(shell uname -s)
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
-# Check compiler
-
-KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
-KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
-KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
-KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
-KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
+# Check compiler.
+KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
+KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
+KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
+KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
+KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
ifneq ($(OMPI_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l)
endif
ifneq ($(MPICH_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l)
endif
-KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
+KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
KOKKOS_INTERNAL_COMPILER_CLANG = 1
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
KOKKOS_INTERNAL_COMPILER_XL = 1
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
+
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
- $(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
+ $(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
endif
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
endif
endif
-
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_INTERNAL_OPENMP_FLAG := -mp
+ KOKKOS_INTERNAL_OPENMP_FLAG := -mp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- # OpenMP is turned on by default in Cray compiler environment
+ # OpenMP is turned on by default in Cray compiler environment.
KOKKOS_INTERNAL_OPENMP_FLAG :=
else
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -std=c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
else
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
endif
endif
endif
-# Check for Kokkos Architecture settings
+# Check for Kokkos Architecture settings.
-#Intel based
+# Intel based.
KOKKOS_INTERNAL_USE_ARCH_KNC := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNC | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
-#NVIDIA based
-NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
+# NVIDIA based.
+NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER37 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler37 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell50 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal60 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
-KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
-KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
-KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
- + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
- + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
-endif
-
-#ARM based
+ KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
+ KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
+ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
+endif
+
+# ARM based.
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
-#IBM based
+# IBM based.
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER7 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power7 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power8 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER9 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power9 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BGQ)+$(KOKKOS_INTERNAL_USE_ARCH_POWER7)+$(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc))
-#AMD based
+# AMD based.
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
-#Any AVX?
+# Any AVX?
KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
-# Decide what ISA level we are able to support
-KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
-KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
-KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
+# Decide what ISA level we are able to support.
+KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
+KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
+KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
-#Incompatible flags?
+# Incompatible flags?
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
$(error Defined Multiple Host architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIGPU), 1)
$(error Defined Multiple GPU architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
-#Generating the list of Flags
+# Generating the list of Flags.
KOKKOS_CPPFLAGS = -I./ -I$(KOKKOS_PATH)/core/src -I$(KOKKOS_PATH)/containers/src -I$(KOKKOS_PATH)/algorithms/src
# No warnings:
KOKKOS_CXXFLAGS =
# INTEL and CLANG warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
# GCC warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized -Wignored-qualifiers -Wempty-body -Wclobbered
KOKKOS_LIBS = -lkokkos -ldl
KOKKOS_LDFLAGS = -L$(shell pwd)
-KOKKOS_SRC =
+KOKKOS_SRC =
KOKKOS_HEADERS =
-#Generating the KokkosCore_config.h file
+# Generating the KokkosCore_config.h file.
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.tmp)
tmp := $(shell date >> KokkosCore_config.tmp)
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
-
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
+
+ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
+ tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
endif
-ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREADS 1" >> KokkosCore_config.tmp )
endif
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
+ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
-endif
-
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_CPPFLAGS += -I$(QTHREAD_PATH)/include
- KOKKOS_LDFLAGS += -L$(QTHREAD_PATH)/lib
- tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREAD 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* General Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
- KOKKOS_CXXFLAGS += -lineinfo
+ KOKKOS_CXXFLAGS += -lineinfo
endif
- KOKKOS_CXXFLAGS += -g
- KOKKOS_LDFLAGS += -g -ldl
- tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -g
+ KOKKOS_LDFLAGS += -g -ldl
+ tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
- KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
- KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
- KOKKOS_LIBS += -lhwloc
- tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
+ KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
+ KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
+ KOKKOS_LIBS += -lhwloc
+ tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
- tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOSP_ENABLE_RTLIB 1" >> KokkosCore_config.tmp )
- KOKKOS_LIBS += -lrt
+ KOKKOS_LIBS += -lrt
endif
ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
KOKKOS_CPPFLAGS += -I$(MEMKIND_PATH)/include
- KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
- KOKKOS_LIBS += -lmemkind
+ KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
+ KOKKOS_LIBS += -lmemkind
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING 0" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION), 1)
tmp := $(shell echo "\#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION 1" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += --relocatable-device-code=true
- KOKKOS_LDFLAGS += --relocatable-device-code=true
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += --relocatable-device-code=true
+ KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -expt-extended-lambda
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -expt-extended-lambda
else
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
endif
endif
+
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
endif
endif
+
endif
-#Add Architecture flags
+# Add Architecture flags.
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8-a
- KOKKOS_LDFLAGS += -march=armv8-a
- endif
+ KOKKOS_CXXFLAGS += -march=armv8-a
+ KOKKOS_LDFLAGS += -march=armv8-a
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8.1-a
- KOKKOS_LDFLAGS += -march=armv8.1-a
- endif
+ KOKKOS_CXXFLAGS += -march=armv8.1-a
+ KOKKOS_LDFLAGS += -march=armv8.1-a
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
- KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
- endif
+ KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
+ KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -mavx
- KOKKOS_LDFLAGS += -mavx
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
-
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS += -tp=sandybridge
- KOKKOS_LDFLAGS += -tp=sandybridge
- else
- # Assume that this is a really a GNU compiler
- KOKKOS_CXXFLAGS += -mavx
- KOKKOS_LDFLAGS += -mavx
- endif
- endif
- endif
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS += -tp=sandybridge
+ KOKKOS_LDFLAGS += -tp=sandybridge
+ else
+        # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Assume that this is a really a GNU compiler or it could be XL on P8
- KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
- KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
- endif
+ else
+    # Assume that this is really a GNU compiler, or it could be XL on P8.
+ KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
+ KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_POWER9 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_POWER9 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Assume that this is a really a GNU compiler or it could be XL on P9
- KOKKOS_CXXFLAGS += -mcpu=power9 -mtune=power9
- KOKKOS_LDFLAGS += -mcpu=power9 -mtune=power9
- endif
+ else
+    # Assume that this is really a GNU compiler, or it could be XL on P9.
+ KOKKOS_CXXFLAGS += -mcpu=power9 -mtune=power9
+ KOKKOS_LDFLAGS += -mcpu=power9 -mtune=power9
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xCORE-AVX2
- KOKKOS_LDFLAGS += -xCORE-AVX2
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
-
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS += -tp=haswell
- KOKKOS_LDFLAGS += -tp=haswell
- else
- # Assume that this is a really a GNU compiler
- KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
- KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2
- endif
- endif
- endif
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xCORE-AVX2
+ KOKKOS_LDFLAGS += -xCORE-AVX2
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS += -tp=haswell
+ KOKKOS_LDFLAGS += -tp=haswell
+ else
+          # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
+ KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xMIC-AVX512
- KOKKOS_LDFLAGS += -xMIC-AVX512
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xMIC-AVX512
+ KOKKOS_LDFLAGS += -xMIC-AVX512
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Asssume that this is really a GNU compiler
- KOKKOS_CXXFLAGS += -march=knl
- KOKKOS_LDFLAGS += -march=knl
- endif
- endif
- endif
+ else
+        # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -march=knl
+ KOKKOS_LDFLAGS += -march=knl
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xCORE-AVX512
- KOKKOS_LDFLAGS += -xCORE-AVX512
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xCORE-AVX512
+ KOKKOS_LDFLAGS += -xCORE-AVX512
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Nothing here yet
- KOKKOS_CXXFLAGS += -march=skylake-avx512
- KOKKOS_LDFLAGS += -march=skylake-avx512
- endif
- endif
- endif
+ else
+ # Nothing here yet.
+ KOKKOS_CXXFLAGS += -march=skylake-avx512
+ KOKKOS_LDFLAGS += -march=skylake-avx512
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -mmic
- KOKKOS_LDFLAGS += -mmic
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -mmic
+ KOKKOS_LDFLAGS += -mmic
endif
-#Figure out the architecture flag for Cuda
+# Figure out the architecture flag for Cuda.
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
- KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-x cuda --cuda-gpu-arch
+ KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=--cuda-gpu-arch
+ KOKKOS_CXXFLAGS += -x cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
endif
+
endif
-
+
KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h)
ifeq ($(KOKKOS_INTERNAL_LS_CONFIG), KokkosCore_config.h)
-KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
+ KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
else
-KOKKOS_INTERNAL_NEW_CONFIG := 1
+ KOKKOS_INTERNAL_NEW_CONFIG := 1
endif
ifneq ($(KOKKOS_INTERNAL_NEW_CONFIG), 0)
- tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
+ tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
endif
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.cpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
- KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
- KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
- KOKKOS_LIBS += -lcudart -lcuda
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
+ KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
+ KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
+ KOKKOS_LIBS += -lcudart -lcuda
endif
-ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- KOKKOS_LIBS += -lpthread
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
+ KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ else
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ endif
+
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_LIBS += -lqthread
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
+ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ KOKKOS_LIBS += -lpthread
endif
-ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
- ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
- KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
- else
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
- endif
- KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
+ KOKKOS_CPPFLAGS += -I$(QTHREADS_PATH)/include
+ KOKKOS_LDFLAGS += -L$(QTHREADS_PATH)/lib
+ KOKKOS_LIBS += -lqthread
endif
-#Explicitly set the GCC Toolchain for Clang
+# Explicitly set the GCC Toolchain for Clang.
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
- KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
- KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
- KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
- KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
+ KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
+ KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
+ KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
+ KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
endif
-#With Cygwin functions such as fdopen and fileno are not defined
-#when strict ansi is enabled. strict ansi gets enabled with --std=c++11
-#though. So we hard undefine it here. Not sure if that has any bad side effects
-#This is needed for gtest actually, not for Kokkos itself!
+# With Cygwin, functions such as fdopen and fileno are not defined
+# when strict ANSI is enabled. Strict ANSI gets enabled with --std=c++11,
+# though, so we hard undefine it here. Not sure if that has any bad side effects.
+# This is needed for gtest, actually, not for Kokkos itself!
ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
KOKKOS_CXXFLAGS += -U__STRICT_ANSI__
endif
-# Setting up dependencies
+# Setting up dependencies.
KokkosCore_config.h:
KOKKOS_CPP_DEPENDS := KokkosCore_config.h $(KOKKOS_HEADERS)
KOKKOS_OBJ = $(KOKKOS_SRC:.cpp=.o)
KOKKOS_OBJ_LINK = $(notdir $(KOKKOS_OBJ))
include $(KOKKOS_PATH)/Makefile.targets
kokkos-clean:
rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
ranlib libkokkos.a
KOKKOS_LINK_DEPENDS=libkokkos.a
diff --git a/lib/kokkos/Makefile.targets b/lib/kokkos/Makefile.targets
index a48a5f6eb..54cacb741 100644
--- a/lib/kokkos/Makefile.targets
+++ b/lib/kokkos/Makefile.targets
@@ -1,62 +1,63 @@
Kokkos_UnorderedMap_impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
Kokkos_Core.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
Kokkos_CPUDiscovery.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
Kokkos_Error.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
Kokkos_ExecPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
Kokkos_HostSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
Kokkos_hwloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
Kokkos_Serial.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
Kokkos_Serial_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
+Kokkos_HostThreadTeam.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostThreadTeam.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostThreadTeam.cpp
Kokkos_spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
Kokkos_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
Kokkos_MemoryPool.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
Kokkos_Cuda_Impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
Kokkos_CudaSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
Kokkos_Cuda_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
Kokkos_ThreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
-Kokkos_QthreadExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
-Kokkos_Qthread_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+Kokkos_QthreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_QthreadsExec.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_QthreadsExec.cpp
+Kokkos_Qthreads_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
Kokkos_OpenMPexec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
Kokkos_OpenMP_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
endif
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
-
diff --git a/lib/kokkos/README b/lib/kokkos/README
index 7ebde23a1..257a2e5db 100644
--- a/lib/kokkos/README
+++ b/lib/kokkos/README
@@ -1,165 +1,173 @@
Kokkos implements a programming model in C++ for writing performance portable
applications targeting all major HPC platforms. For that purpose it provides
abstractions for both parallel execution of code and data management.
Kokkos is designed to target complex node architectures with N-level memory
hierarchies and multiple types of execution resources. It currently can use
OpenMP, Pthreads and CUDA as backend programming models.
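As a rough illustration of these two abstractions (a minimal sketch only, assuming
a backend with C++11 lambda support; the view name "x" and the loop bound are
arbitrary), data is managed through views and loops are expressed through parallel
patterns, and the same source compiles for whichever backend was selected:

  #include <Kokkos_Core.hpp>

  int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
      // Data management: a 1D array allocated in the default memory space.
      Kokkos::View<double*> x("x", 100);
      // Parallel execution: the body runs on the default execution space
      // (OpenMP, Pthreads, CUDA or Serial, depending on the build).
      Kokkos::parallel_for(100, KOKKOS_LAMBDA(const int i) {
        x(i) = 2.0 * i;
      });
      Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
  }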
Kokkos is licensed under standard 3-clause BSD terms of use. For specifics
see the LICENSE file contained in the repository or distribution.
The core developers of Kokkos are Carter Edwards and Christian Trott
at the Computer Science Research Institute of the Sandia National
Laboratories.
The KokkosP interface and associated tools are developed by the Application
Performance Team and Kokkos core developers at Sandia National Laboratories.
To learn more about Kokkos consider watching one of our presentations:
GTC 2015:
http://on-demand.gputechconf.com/gtc/2015/video/S5166.html
http://on-demand.gputechconf.com/gtc/2015/presentation/S5166-H-Carter-Edwards.pdf
A programming guide can be found under doc/Kokkos_PG.pdf. This is an initial version
and feedback is greatly appreciated.
A separate repository with extensive tutorial material can be found under
https://github.com/kokkos/kokkos-tutorials.
If you have a patch to contribute please feel free to issue a pull request against
the develop branch. For major contributions it is better to contact us first
for guidance.
For questions please send an email to
kokkos-users@software.sandia.gov
For non-public questions send an email to
hcedwar(at)sandia.gov and crtrott(at)sandia.gov
============================================================================
====Requirements============================================================
============================================================================
Primary tested compilers on X86 are:
GCC 4.7.2
GCC 4.8.4
GCC 4.9.2
GCC 5.1.0
+ GCC 5.2.0
Intel 14.0.4
Intel 15.0.2
Intel 16.0.1
Intel 17.0.098
+ Intel 17.1.132
Clang 3.5.2
Clang 3.6.1
+ Clang 3.7.1
+ Clang 3.8.1
Clang 3.9.0
+ PGI 17.1
Primary tested compilers on Power 8 are:
GCC 5.4.0 (OpenMP,Serial)
IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
Primary tested compilers on Intel KNL are:
+ GCC 6.2.0
Intel 16.2.181 (with gcc 4.7.2)
Intel 17.0.098 (with gcc 4.7.2)
+ Intel 17.1.132 (with gcc 4.9.3)
+ Intel 17.2.174 (with gcc 4.9.3)
+ Intel 18.0.061 (beta) (with gcc 4.9.3)
Secondary tested compilers are:
- CUDA 7.0 (with gcc 4.7.2)
- CUDA 7.5 (with gcc 4.7.2)
+ CUDA 7.0 (with gcc 4.8.4)
+ CUDA 7.5 (with gcc 4.8.4)
CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
CUDA/Clang 8.0 using Clang/Trunk compiler
Other compilers working:
X86:
- PGI 15.4
Cygwin 2.1.0 64bit with gcc 4.9.3
Known non-working combinations:
Power8:
Pthreads backend
Primary tested compilers are passing in release mode
with warnings as errors. They are also tested with a comprehensive set of
backend combinations (i.e. OpenMP, Pthreads, Serial, OpenMP+Serial, ...).
We are using the following set of flags:
GCC: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
-Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Secondary compilers are passing without -Werror.
Other compilers are tested occasionally, in particular when pushing from develop to
master branch, without -Werror and only for a select set of backends.
============================================================================
====Getting started=========================================================
============================================================================
In the 'example/tutorial' directory you will find step by step tutorial
examples which explain many of the features of Kokkos. They work with
simple Makefiles. To build with g++ and OpenMP simply type 'make'
in the 'example/tutorial' directory. This will build all examples in the
subfolders. To change the build options refer to the Programming Guide
in the compilation section.
============================================================================
====Running Unit Tests======================================================
============================================================================
To run the unit tests create a build directory and run the following commands
KOKKOS_PATH/generate_makefile.bash
make build-test
make test
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====Install the library=====================================================
============================================================================
To install Kokkos as a library create a build directory and run the following
KOKKOS_PATH/generate_makefile.bash --prefix=INSTALL_PATH
make lib
make install
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====CMakeFiles==============================================================
============================================================================
The CMake files contained in this repository require Tribits and are used
for integration with Trilinos. They do not currently support a standalone
CMake build.
===========================================================================
====Kokkos and CUDA UVM====================================================
===========================================================================
Kokkos does support UVM as a specific memory space called CudaUVMSpace.
Allocations made with that space are accessible from host and device.
You can tell Kokkos to use that as the default space for Cuda allocations.
In either case UVM comes with a number of restrictions:
(i) You can't access allocations on the host while a kernel is potentially
running. This will lead to segfaults. To avoid that you either need to
call Kokkos::Cuda::fence() (or just Kokkos::fence()) after kernels, or
you can set the environment variable CUDA_LAUNCH_BLOCKING=1.
Furthermore, in multi-socket multi-GPU machines, UVM defaults to using
zero-copy allocations for technical reasons related to using multiple
GPUs from the same process. If an executable doesn't use multiple GPUs
from the same process (e.g. each MPI rank of an application uses a single
GPU [which can be the same GPU for multiple MPI ranks]) you can set
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1.
This will enforce proper UVM allocations, but can lead to errors if
more than a single GPU is used by a single process.
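As a concrete illustration of restriction (i), a minimal sketch (assuming a CUDA
build with lambda support; the view name "a" and the size "n" are placeholders):

  Kokkos::View<double*, Kokkos::CudaUVMSpace> a("a", n);
  Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) { a(i) = i; });
  Kokkos::fence();       // without this (or CUDA_LAUNCH_BLOCKING=1) ...
  double first = a(0);   // ... reading a(0) on the host may segfault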
===========================================================================
====Contributing===========================================================
===========================================================================
Contributions to Kokkos are welcome. In order to do so, please open an issue
where a feature request or bug can be discussed. Then issue a pull request
with your contribution. Pull requests must be issued against the develop branch.
diff --git a/lib/kokkos/algorithms/cmake/Dependencies.cmake b/lib/kokkos/algorithms/cmake/Dependencies.cmake
index 1d71d8af3..c36b62523 100644
--- a/lib/kokkos/algorithms/cmake/Dependencies.cmake
+++ b/lib/kokkos/algorithms/cmake/Dependencies.cmake
@@ -1,5 +1,5 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
- LIB_REQUIRED_PACKAGES KokkosCore
+ LIB_REQUIRED_PACKAGES KokkosCore KokkosContainers
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC
TEST_OPTIONAL_TPLS CUSPARSE
)
diff --git a/lib/kokkos/algorithms/src/Kokkos_Random.hpp b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
index d376173bf..bd7358236 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Random.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
@@ -1,1751 +1,1755 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_RANDOM_HPP
#define KOKKOS_RANDOM_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>
#include <cstdio>
#include <cstdlib>
#include <cmath>
/// \file Kokkos_Random.hpp
/// \brief Pseudorandom number generators
///
/// These generators are based on Vigna, Sebastiano (2014). "An
/// experimental exploration of Marsaglia's xorshift generators,
/// scrambled." See: http://arxiv.org/abs/1402.6246
namespace Kokkos {
/*Template functions to get equidistributed random numbers from a generator for a specific Scalar type
template<class Generator,Scalar>
struct rand{
//Max value returned by draw(Generator& gen)
KOKKOS_INLINE_FUNCTION
static Scalar max();
//Returns a value between zero and max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen);
//Returns a value between zero and range()
//Note: for floating point values range can be larger than max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& range){}
//Return value between start and end
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& start, const Scalar& end);
};
The random number generators themselves have two components: a state-pool and the actual generator.
A state-pool manages a number of generators, so that each active thread is able to grab its own.
This allows the generation of random numbers which are independent between threads. Note that
in contrast to CuRand none of the functions of the pool (or the generator) are collectives,
i.e. all functions can be called inside conditionals.
template<class Device>
class Pool {
public:
//The Kokkos device type
typedef Device device_type;
//The actual generator type
typedef Generator<Device> generator_type;
//Default constructor: does not initialize a pool
Pool();
//Initializing constructor: calls init(seed,Device_Specific_Number);
Pool(unsigned int seed);
//Initialize the Pool with seed as the starting seed and a pool_size of num_states.
//The Random_XorShift64 generator is used in serial to initialize all states,
//thus the initialization process is platform independent and deterministic.
void init(unsigned int seed, int num_states);
//Get a generator. This will lock one of the states, guaranteeing that each thread
//will have its private generator. Note: on Cuda getting a state involves atomics,
//and is thus not deterministic!
generator_type get_state();
//Give a state back to the pool. This unlocks the state, and writes the modified
//state of the generator back to the pool.
void free_state(generator_type gen);
}
template<class Device>
class Generator {
public:
//The Kokkos device type
typedef DeviceType device_type;
//Max return values of respective [X]rand[S]() functions
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
//Init with a state and the idx with respect to pool. Note: in serial the
//Generator can be used by just giving it the necessary state arguments
KOKKOS_INLINE_FUNCTION
Generator (STATE_ARGUMENTS, int state_idx = 0);
//Draw an equidistributed uint32_t in the range (0,MAX_URAND]
KOKKOS_INLINE_FUNCTION
uint32_t urand();
//Draw an equidistributed uint64_t in the range (0,MAX_URAND64]
KOKKOS_INLINE_FUNCTION
uint64_t urand64();
//Draw an equidistributed uint32_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range);
//Draw an equidistributed uint32_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end );
//Draw an equidistributed uint64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range);
//Draw an equidistributed uint64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end );
//Draw an equidistributed int in the range (0,MAX_RAND]
KOKKOS_INLINE_FUNCTION
int rand();
//Draw an equidistributed int in the range (0,range]
KOKKOS_INLINE_FUNCTION
int rand(const int& range);
//Draw an equidistributed int in the range (start,end]
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end );
//Draw an equidistributed int64_t in the range (0,MAX_RAND64]
KOKKOS_INLINE_FUNCTION
int64_t rand64();
//Draw an equidistributed int64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range);
//Draw an equidistributed int64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end );
//Draw an equidistributed float in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
float frand();
//Draw an equidistributed float in the range (0,range]
KOKKOS_INLINE_FUNCTION
float frand(const float& range);
//Draw an equidistributed float in the range (start,end]
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end );
//Draw an equidistributed double in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
double drand();
//Draw an equidistributed double in the range (0,range]
KOKKOS_INLINE_FUNCTION
double drand(const double& range);
//Draw an equidistributed double in the range (start,end]
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end );
//Draw a standard normal distributed double
KOKKOS_INLINE_FUNCTION
double normal() ;
//Draw a normal distributed double with given mean and standard deviation
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0);
}
//Additional Functions:
//Fills view with random numbers in the range (0,range]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool, ViewType::value_type range);
//Fills view with random numbers in the range (start,end]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool,
ViewType::value_type start, ViewType::value_type end);
*/
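/* A minimal usage sketch of the pool/generator interface described above
   (illustrative only; it assumes the default execution space, C++11 lambda
   support, and placeholder names for the pool, view and size):

     Kokkos::Random_XorShift64_Pool<> pool(12345);
     Kokkos::View<double*> v("v", n);

     // Fill v with uniform random numbers in (0,1.0]
     Kokkos::fill_random(v, pool, 1.0);

     // Or draw numbers manually inside a kernel:
     Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
       auto gen = pool.get_state();
       v(i) = gen.drand();   // double in (0,1.0]
       pool.free_state(gen);
     });
*/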
template<class Generator, class Scalar>
struct rand;
template<class Generator>
struct rand<Generator,char> {
KOKKOS_INLINE_FUNCTION
static short max(){return 127;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xff+256)%256);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& range)
{return char(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& start, const char& end)
{return char(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,short> {
KOKKOS_INLINE_FUNCTION
static short max(){return 32767;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xffff+65536)%32768);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& range)
{return short(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& start, const short& end)
{return short(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,int> {
KOKKOS_INLINE_FUNCTION
static int max(){return Generator::MAX_RAND;}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen)
{return gen.rand();}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& range)
{return gen.rand(range);}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& start, const int& end)
{return gen.rand(start,end);}
};
template<class Generator>
struct rand<Generator,unsigned int> {
KOKKOS_INLINE_FUNCTION
static unsigned int max () {
return Generator::MAX_URAND;
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw (Generator& gen) {
return gen.urand ();
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw(Generator& gen, const unsigned int& range) {
return gen.urand (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned int
draw (Generator& gen, const unsigned int& start, const unsigned int& end) {
return gen.urand (start, end);
}
};
template<class Generator>
struct rand<Generator,long> {
KOKKOS_INLINE_FUNCTION
static long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (Generator::MAX_RAND) :
static_cast<long> (Generator::MAX_RAND64);
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand ()) :
static_cast<long> (gen.rand64 ());
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (range))) :
static_cast<long> (gen.rand64 (range));
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& start, const long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (start),
static_cast<int> (end))) :
static_cast<long> (gen.rand64 (start, end));
}
};
template<class Generator>
struct rand<Generator,unsigned long> {
KOKKOS_INLINE_FUNCTION
static unsigned long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (Generator::MAX_URAND) :
static_cast<unsigned long> (Generator::MAX_URAND64);
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand ()) :
static_cast<unsigned long> (gen.urand64 ());
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw(Generator& gen, const unsigned long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (range))) :
static_cast<unsigned long> (gen.urand64 (range));
}
KOKKOS_INLINE_FUNCTION
static unsigned long
draw (Generator& gen, const unsigned long& start, const unsigned long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (start),
static_cast<unsigned int> (end))) :
static_cast<unsigned long> (gen.urand64 (start, end));
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for long
// long, a C99 / C++11 signed type which is guaranteed to be at
// least 64 bits. Do NOT write a partial specialization for
// int64_t!!! This is just a typedef! It could be either long or
// long long. We don't know which a priori, and I've seen both.
// The types long and long long are guaranteed to differ, so it's
// always safe to specialize for both.
template<class Generator>
struct rand<Generator, long long> {
KOKKOS_INLINE_FUNCTION
static long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return Generator::MAX_RAND64;
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 ();
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (range);
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& start, const long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (start, end);
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for
// unsigned long long, a C99 / C++11 unsigned type which is
// guaranteed to be at least 64 bits. Do NOT write a partial
// specialization for uint64_t!!! This is just a typedef! It could
// be either unsigned long or unsigned long long. We don't know
// which a priori, and I've seen both. The types unsigned long and
// unsigned long long are guaranteed to differ, so it's always safe
// to specialize for both.
template<class Generator>
struct rand<Generator,unsigned long long> {
KOKKOS_INLINE_FUNCTION
static unsigned long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return Generator::MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return gen.urand64 ();
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen, const unsigned long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned long long
draw (Generator& gen, const unsigned long long& start, const unsigned long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (start, end);
}
};
template<class Generator>
struct rand<Generator,float> {
KOKKOS_INLINE_FUNCTION
static float max(){return 1.0f;}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen)
{return gen.frand();}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& range)
{return gen.frand(range);}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& start, const float& end)
{return gen.frand(start,end);}
};
template<class Generator>
struct rand<Generator,double> {
KOKKOS_INLINE_FUNCTION
static double max(){return 1.0;}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen)
{return gen.drand();}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& range)
{return gen.drand(range);}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& start, const double& end)
{return gen.drand(start,end);}
};
template<class Generator>
struct rand<Generator, Kokkos::complex<float> > {
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> max () {
return Kokkos::complex<float> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen) {
const float re = gen.frand ();
const float im = gen.frand ();
return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& range) {
const float re = gen.frand (real (range));
const float im = gen.frand (imag (range));
return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& start, const Kokkos::complex<float>& end) {
const float re = gen.frand (real (start), real (end));
const float im = gen.frand (imag (start), imag (end));
return Kokkos::complex<float> (re, im);
}
};
template<class Generator>
struct rand<Generator, Kokkos::complex<double> > {
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> max () {
return Kokkos::complex<double> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen) {
const double re = gen.drand ();
const double im = gen.drand ();
return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& range) {
const double re = gen.drand (real (range));
const double im = gen.drand (imag (range));
return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& start, const Kokkos::complex<double>& end) {
const double re = gen.drand (real (start), real (end));
const double im = gen.drand (imag (start), imag (end));
return Kokkos::complex<double> (re, im);
}
};
template<class DeviceType>
class Random_XorShift64_Pool;
template<class DeviceType>
class Random_XorShift64 {
private:
uint64_t state_;
const int state_idx_;
friend class Random_XorShift64_Pool<DeviceType>;
public:
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffff/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffLL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift64 (uint64_t state, int state_idx = 0)
- : state_(state),state_idx_(state_idx){}
+ : state_(state==0?uint64_t(1318319):state),state_idx_(state_idx){}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
uint64_t tmp = state_ * 2685821657736338717ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
return (state_ * 2685821657736338717ULL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
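// Editorial note: the loop above rejects points outside the unit disc; for an
// accepted pair (U,V) with S = U*U + V*V < 1, U*sqrt(-2.0*log(S)/S) is a
// standard normal variate (Marsaglia's polar variant of the Box-Muller
// transform). The second independent variate, V*sqrt(-2.0*log(S)/S), is
// discarded here.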
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift64_Pool {
private:
typedef View<int*,DeviceType> lock_type;
typedef View<uint64_t*,DeviceType> state_data_type;
lock_type locks_;
state_data_type state_;
int num_states_;
public:
typedef Random_XorShift64<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift64_Pool() {
num_states_ = 0;
}
Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift64_Pool(const Random_XorShift64_Pool& src):
locks_(src.locks_),
state_(src.state_),
num_states_(src.num_states_)
{}
Random_XorShift64_Pool operator = (const Random_XorShift64_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
num_states_ = src.num_states_;
return *this;
}
void init(uint64_t seed, int num_states) {
+ if(seed==0)
+ seed = uint64_t(1318319);
+
num_states_ = num_states;
locks_ = lock_type("Kokkos::Random_XorShift64::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift64::state",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename lock_type::HostMirror h_lock = create_mirror_view(locks_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift64<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift64<DeviceType>(state_(i),i);
}
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift64<DeviceType>& state) const {
state_(state.state_idx_) = state.state_;
}
};
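// Editorial usage sketch (illustrative only): the pool hands one generator per
// thread to device code and takes its state back afterwards. Assuming a View
// 'values' of length N declared by the caller:
//
//   Kokkos::Random_XorShift64_Pool<> pool(12345 /* seed */);
//   Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
//     auto gen = pool.get_state();
//     values(i) = gen.drand(0.0, 1.0);   // uniform double between 0 and 1
//     pool.free_state(gen);              // return the state for reuse
//   });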
template<class DeviceType>
class Random_XorShift1024_Pool;
template<class DeviceType>
class Random_XorShift1024 {
private:
int p_;
const int state_idx_;
uint64_t state_[16];
friend class Random_XorShift1024_Pool<DeviceType>;
public:
typedef Random_XorShift1024_Pool<DeviceType> pool_type;
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx){
for(int i=0 ; i<16; i++)
state_[i] = state(state_idx,i);
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift1024_Pool {
private:
typedef View<int*,DeviceType> int_view_type;
typedef View<uint64_t*[16],DeviceType> state_data_type;
int_view_type locks_;
state_data_type state_;
int_view_type p_;
int num_states_;
friend class Random_XorShift1024<DeviceType>;
public:
typedef Random_XorShift1024<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift1024_Pool() {
num_states_ = 0;
}
inline
Random_XorShift1024_Pool(uint64_t seed){
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift1024_Pool(const Random_XorShift1024_Pool& src):
locks_(src.locks_),
state_(src.state_),
p_(src.p_),
num_states_(src.num_states_)
{}
Random_XorShift1024_Pool operator = (const Random_XorShift1024_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
p_ = src.p_;
num_states_ = src.num_states_;
return *this;
}
inline
void init(uint64_t seed, int num_states) {
+ if(seed==0)
+ seed = uint64_t(1318319);
num_states_ = num_states;
-
locks_ = int_view_type("Kokkos::Random_XorShift1024::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift1024::state",num_states_);
p_ = int_view_type("Kokkos::Random_XorShift1024::p",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename int_view_type::HostMirror h_lock = create_mirror_view(locks_);
typename int_view_type::HostMirror h_p = create_mirror_view(p_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
for(int j = 0; j < 16 ; j++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i,j) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
}
h_p(i) = 0;
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift1024<DeviceType>(state_,p_(i),i);
};
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift1024<DeviceType>& state) const {
for(int i = 0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
p_(state.state_idx_) = state.p_;
}
};
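// Editorial note: Random_XorShift1024 keeps 16 x 64-bit words of state per
// generator, giving a far longer period than Random_XorShift64 at the cost of
// more memory per stream. Usage is the same, only the pool type changes, e.g.
//
//   Kokkos::Random_XorShift1024_Pool<> pool(seed);   // seed supplied by caller
//   auto gen = pool.get_state();
//   double x = gen.normal();   // standard normal via the Marsaglia polar method
//   pool.free_state(gen);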
#if defined(KOKKOS_ENABLE_CUDA) && defined(__CUDACC__)
template<>
class Random_XorShift1024<Kokkos::Cuda> {
private:
int p_;
const int state_idx_;
uint64_t* state_;
const int stride_;
friend class Random_XorShift1024_Pool<Kokkos::Cuda>;
public:
typedef Kokkos::Cuda device_type;
typedef Random_XorShift1024_Pool<device_type> pool_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx),state_(&state(state_idx,0)),stride_(state.stride_1()){
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<>
inline
Random_XorShift64_Pool<Kokkos::Cuda>::Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift64<Kokkos::Cuda> Random_XorShift64_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift64<Kokkos::Cuda>(state_(i),i);
#else
return Random_XorShift64<Kokkos::Cuda>(state_(0),0);
#endif
}
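// Editorial note on the device path above: get_state() derives a candidate
// state index from the thread and block indices, then tries to acquire the
// per-state lock with an atomic compare-exchange; if the slot is already taken
// it advances to another slot until a free one is found, so more threads than
// states can be in flight. free_state() writes the advanced state back and
// clears the lock.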
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift64_Pool<Kokkos::Cuda>::free_state(const Random_XorShift64<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
state_(state.state_idx_) = state.state_;
locks_(state.state_idx_) = 0;
return;
#endif
}
template<>
inline
Random_XorShift1024_Pool<Kokkos::Cuda>::Random_XorShift1024_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<Kokkos::Cuda> Random_XorShift1024_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(i), i);
#else
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(0), 0);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift1024_Pool<Kokkos::Cuda>::free_state(const Random_XorShift1024<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
for(int i=0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
locks_(state.state_idx_) = 0;
return;
#endif
}
#endif
namespace Impl {
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_range;
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_begin_end;
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const IndexType& i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,range);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,4, IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
a(idx,k,l,m,n) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,begin,end);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,4,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())){
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_1());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_2());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_3());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_4());o++)
a(idx,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type range) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_range<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,range));
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type begin,typename ViewType::const_value_type end ) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_begin_end<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,begin,end));
}
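// Editorial usage sketch (illustrative only): fill a View with uniform random
// values drawn from a pool, either in [0,range) or in [begin,end). Assuming
// 'n' and 'seed' are supplied by the caller:
//
//   Kokkos::View<double*> v("v", n);
//   Kokkos::Random_XorShift64_Pool<> pool(seed);
//   Kokkos::fill_random(v, pool, 1.0);          // values between 0 and 1
//   Kokkos::fill_random(v, pool, -1.0, 1.0);    // values between -1 and 1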
}
#endif
diff --git a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
index 5b8c65fee..237de751f 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
@@ -1,407 +1,548 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SORT_HPP_
#define KOKKOS_SORT_HPP_
#include <Kokkos_Core.hpp>
#include <algorithm>
namespace Kokkos {
namespace Impl {
- template<class ValuesViewType, int Rank=ValuesViewType::Rank>
+ template< class DstViewType , class SrcViewType
+ , int Rank = DstViewType::Rank >
struct CopyOp;
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,1> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,1> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
dst(i_dst) = src(i_src);
}
};
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,2> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,2> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
- for(int j = 0;j< (int) dst.dimension_1(); j++)
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
+ for(int j = 0;j< (int) dst.extent(1); j++)
dst(i_dst,j) = src(i_src,j);
}
};
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,3> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,3> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
- for(int j = 0; j<dst.dimension_1(); j++)
- for(int k = 0; k<dst.dimension_2(); k++)
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
+ for(int j = 0; j<dst.extent(1); j++)
+ for(int k = 0; k<dst.extent(2); k++)
dst(i_dst,j,k) = src(i_src,j,k);
}
};
}
-template<class KeyViewType, class BinSortOp, class ExecutionSpace = typename KeyViewType::execution_space,
- class SizeType = typename KeyViewType::memory_space::size_type>
+//----------------------------------------------------------------------------
+
+template< class KeyViewType
+ , class BinSortOp
+ , class Space = typename KeyViewType::device_type
+ , class SizeType = typename KeyViewType::memory_space::size_type
+ >
class BinSort {
+public:
+ template< class DstViewType , class SrcViewType >
+ struct copy_functor {
-public:
- template<class ValuesViewType, class PermuteViewType, class CopyOp>
- struct bin_sort_sort_functor {
- typedef ExecutionSpace execution_space;
- typedef typename ValuesViewType::non_const_type values_view_type;
- typedef typename ValuesViewType::const_type const_values_view_type;
- Kokkos::View<typename values_view_type::const_data_type,typename values_view_type::array_layout,
- typename values_view_type::memory_space,Kokkos::MemoryTraits<Kokkos::RandomAccess> > values;
- values_view_type sorted_values;
- typename PermuteViewType::const_type sort_order;
- bin_sort_sort_functor(const_values_view_type values_, values_view_type sorted_values_, PermuteViewType sort_order_):
- values(values_),sorted_values(sorted_values_),sort_order(sort_order_) {}
+ typedef typename SrcViewType::const_type src_view_type ;
+
+ typedef Impl::CopyOp< DstViewType , src_view_type > copy_op ;
+
+ DstViewType dst_values ;
+ src_view_type src_values ;
+ int dst_offset ;
+
+ copy_functor( DstViewType const & dst_values_
+ , int const & dst_offset_
+ , SrcViewType const & src_values_
+ )
+ : dst_values( dst_values_ )
+ , src_values( src_values_ )
+ , dst_offset( dst_offset_ )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& i) const {
+ // printf("copy: dst(%i) src(%i)\n",i+dst_offset,i);
+ copy_op::copy(dst_values,i+dst_offset,src_values,i);
+ }
+ };
+
+ template< class DstViewType
+ , class PermuteViewType
+ , class SrcViewType
+ >
+ struct copy_permute_functor {
+
+ // If the source is a Kokkos::View we can request constant random access;
+ // otherwise we can only use its constant type.
+
+ typedef typename std::conditional
+ < Kokkos::is_view< SrcViewType >::value
+ , Kokkos::View< typename SrcViewType::const_data_type
+ , typename SrcViewType::array_layout
+ , typename SrcViewType::device_type
+ , Kokkos::MemoryTraits<Kokkos::RandomAccess>
+ >
+ , typename SrcViewType::const_type
+ >::type src_view_type ;
+
+ typedef typename PermuteViewType::const_type perm_view_type ;
+
+ typedef Impl::CopyOp< DstViewType , src_view_type > copy_op ;
+
+ DstViewType dst_values ;
+ perm_view_type sort_order ;
+ src_view_type src_values ;
+
+ copy_permute_functor( DstViewType const & dst_values_
+ , PermuteViewType const & sort_order_
+ , SrcViewType const & src_values_
+ )
+ : dst_values( dst_values_ )
+ , sort_order( sort_order_ )
+ , src_values( src_values_ )
+ {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
- //printf("Sort: %i %i\n",i,sort_order(i));
- CopyOp::copy(sorted_values,i,values,sort_order(i));
+ // printf("copy_permute: dst(%i) src(%i)\n",i,sort_order(i));
+ copy_op::copy(dst_values,i,src_values,sort_order(i));
}
};
- typedef ExecutionSpace execution_space;
+ typedef typename Space::execution_space execution_space;
typedef BinSortOp bin_op_type;
struct bin_count_tag {};
struct bin_offset_tag {};
struct bin_binning_tag {};
struct bin_sort_bins_tag {};
public:
+
typedef SizeType size_type;
typedef size_type value_type;
- typedef Kokkos::View<size_type*, execution_space> offset_type;
- typedef Kokkos::View<const int*, execution_space> bin_count_type;
+ typedef Kokkos::View<size_type*, Space> offset_type;
+ typedef Kokkos::View<const int*, Space> bin_count_type;
+ typedef typename KeyViewType::const_type const_key_view_type ;
- typedef Kokkos::View<typename KeyViewType::const_data_type,
- typename KeyViewType::array_layout,
- typename KeyViewType::memory_space> const_key_view_type;
- typedef Kokkos::View<typename KeyViewType::const_data_type,
- typename KeyViewType::array_layout,
- typename KeyViewType::memory_space,
- Kokkos::MemoryTraits<Kokkos::RandomAccess> > const_rnd_key_view_type;
+ // If the key container is a Kokkos::View we can request constant random access;
+ // otherwise we can only use its constant type.
+
+ typedef typename std::conditional
+ < Kokkos::is_view< KeyViewType >::value
+ , Kokkos::View< typename KeyViewType::const_data_type,
+ typename KeyViewType::array_layout,
+ typename KeyViewType::device_type,
+ Kokkos::MemoryTraits<Kokkos::RandomAccess> >
+ , const_key_view_type
+ >::type const_rnd_key_view_type;
typedef typename KeyViewType::non_const_value_type non_const_key_scalar;
typedef typename KeyViewType::const_value_type const_key_scalar;
+ typedef Kokkos::View<int*, Space, Kokkos::MemoryTraits<Kokkos::Atomic> > bin_count_atomic_type ;
+
private:
+
const_key_view_type keys;
const_rnd_key_view_type keys_rnd;
public:
- BinSortOp bin_op;
- offset_type bin_offsets;
+ BinSortOp bin_op ;
+ offset_type bin_offsets ;
+ bin_count_atomic_type bin_count_atomic ;
+ bin_count_type bin_count_const ;
+ offset_type sort_order ;
- Kokkos::View<int*, ExecutionSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > bin_count_atomic;
- bin_count_type bin_count_const;
-
- offset_type sort_order;
-
- bool sort_within_bins;
+ int range_begin ;
+ int range_end ;
+ bool sort_within_bins ;
public:
- // Constructor: takes the keys, the binning_operator and optionally whether to sort within bins (default false)
- BinSort(const_key_view_type keys_, BinSortOp bin_op_,
- bool sort_within_bins_ = false)
- :keys(keys_),keys_rnd(keys_), bin_op(bin_op_) {
+ BinSort() {}
- bin_count_atomic = Kokkos::View<int*, ExecutionSpace >("Kokkos::SortImpl::BinSortFunctor::bin_count",bin_op.max_bins());
+ //----------------------------------------
+ // Constructor: takes the keys, the binning_operator and optionally whether to sort within bins (default false)
+ BinSort( const_key_view_type keys_
+ , int range_begin_
+ , int range_end_
+ , BinSortOp bin_op_
+ , bool sort_within_bins_ = false
+ )
+ : keys(keys_)
+ , keys_rnd(keys_)
+ , bin_op(bin_op_)
+ , bin_offsets()
+ , bin_count_atomic()
+ , bin_count_const()
+ , sort_order()
+ , range_begin( range_begin_ )
+ , range_end( range_end_ )
+ , sort_within_bins( sort_within_bins_ )
+ {
+ bin_count_atomic = Kokkos::View<int*, Space >("Kokkos::SortImpl::BinSortFunctor::bin_count",bin_op.max_bins());
bin_count_const = bin_count_atomic;
bin_offsets = offset_type("Kokkos::SortImpl::BinSortFunctor::bin_offsets",bin_op.max_bins());
- sort_order = offset_type("PermutationVector",keys.dimension_0());
- sort_within_bins = sort_within_bins_;
+ sort_order = offset_type("PermutationVector",range_end-range_begin);
}
+ BinSort( const_key_view_type keys_
+ , BinSortOp bin_op_
+ , bool sort_within_bins_ = false
+ )
+ : BinSort( keys_ , 0 , keys_.extent(0), bin_op_ , sort_within_bins_ ) {}
+
+ //----------------------------------------
// Create the permutation vector, the bin_offset array and the bin_count array. Can be called again if keys changed
void create_permute_vector() {
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_count_tag> (0,keys.dimension_0()),*this);
- Kokkos::parallel_scan(Kokkos::RangePolicy<ExecutionSpace,bin_offset_tag> (0,bin_op.max_bins()) ,*this);
+ const size_t len = range_end - range_begin ;
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_count_tag> (0,len),*this);
+ Kokkos::parallel_scan(Kokkos::RangePolicy<execution_space,bin_offset_tag> (0,bin_op.max_bins()) ,*this);
Kokkos::deep_copy(bin_count_atomic,0);
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_binning_tag> (0,keys.dimension_0()),*this);
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_binning_tag> (0,len),*this);
if(sort_within_bins)
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_sort_bins_tag>(0,bin_op.max_bins()) ,*this);
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_sort_bins_tag>(0,bin_op.max_bins()) ,*this);
}
// Sort a view with respect to the first dimension using the permutation array
template<class ValuesViewType>
- void sort(ValuesViewType values) {
- ValuesViewType sorted_values = ValuesViewType("Copy",
- values.dimension_0(),
- values.dimension_1(),
- values.dimension_2(),
- values.dimension_3(),
- values.dimension_4(),
- values.dimension_5(),
- values.dimension_6(),
- values.dimension_7());
-
- parallel_for(values.dimension_0(),
- bin_sort_sort_functor<ValuesViewType, offset_type,
- Impl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
-
- deep_copy(values,sorted_values);
+ void sort( ValuesViewType const & values)
+ {
+ typedef
+ Kokkos::View< typename ValuesViewType::data_type,
+ typename ValuesViewType::array_layout,
+ typename ValuesViewType::device_type >
+ scratch_view_type ;
+
+ const size_t len = range_end - range_begin ;
+
+ scratch_view_type
+ sorted_values("Scratch",
+ len,
+ values.extent(1),
+ values.extent(2),
+ values.extent(3),
+ values.extent(4),
+ values.extent(5),
+ values.extent(6),
+ values.extent(7));
+
+ {
+ copy_permute_functor< scratch_view_type /* DstViewType */
+ , offset_type /* PermuteViewType */
+ , ValuesViewType /* SrcViewType */
+ >
+ functor( sorted_values , sort_order , values );
+
+ parallel_for( Kokkos::RangePolicy<execution_space>(0,len),functor);
+ }
+
+ {
+ copy_functor< ValuesViewType , scratch_view_type >
+ functor( values , range_begin , sorted_values );
+
+ parallel_for( Kokkos::RangePolicy<execution_space>(0,len),functor);
+ }
}
// Get the permutation vector
KOKKOS_INLINE_FUNCTION
offset_type get_permute_vector() const { return sort_order;}
// Get the start offsets for each bin
KOKKOS_INLINE_FUNCTION
offset_type get_bin_offsets() const { return bin_offsets;}
// Get the count for each bin
KOKKOS_INLINE_FUNCTION
bin_count_type get_bin_count() const {return bin_count_const;}
public:
+
KOKKOS_INLINE_FUNCTION
void operator() (const bin_count_tag& tag, const int& i) const {
- bin_count_atomic(bin_op.bin(keys,i))++;
+ const int j = range_begin + i ;
+ bin_count_atomic(bin_op.bin(keys,j))++;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_offset_tag& tag, const int& i, value_type& offset, const bool& final) const {
if(final) {
bin_offsets(i) = offset;
}
offset+=bin_count_const(i);
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_binning_tag& tag, const int& i) const {
- const int bin = bin_op.bin(keys,i);
+ const int j = range_begin + i ;
+ const int bin = bin_op.bin(keys,j);
const int count = bin_count_atomic(bin)++;
- sort_order(bin_offsets(bin) + count) = i;
+ sort_order(bin_offsets(bin) + count) = j ;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_sort_bins_tag& tag, const int&i ) const {
bool sorted = false;
int upper_bound = bin_offsets(i)+bin_count_const(i);
while(!sorted) {
sorted = true;
int old_idx = sort_order(bin_offsets(i));
int new_idx;
for(int k=bin_offsets(i)+1; k<upper_bound; k++) {
new_idx = sort_order(k);
if(!bin_op(keys_rnd,old_idx,new_idx)) {
sort_order(k-1) = new_idx;
sort_order(k) = old_idx;
sorted = false;
} else {
old_idx = new_idx;
}
}
upper_bound--;
}
}
};
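// Editorial usage sketch (illustrative only): bin-sort a set of keys and apply
// the resulting permutation to an associated values View. keys, values, nbins,
// kmin and kmax are assumed to be provided by the caller:
//
//   typedef Kokkos::View<float*> KeyView;
//   BinOp1D<KeyView> binner(nbins, kmin, kmax);
//   BinSort<KeyView, BinOp1D<KeyView> > sorter(keys, binner, true /* sort within bins */);
//   sorter.create_permute_vector();   // build permutation, offsets and counts
//   sorter.sort(values);              // reorder 'values' along dimension 0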
+//----------------------------------------------------------------------------
+
template<class KeyViewType>
struct BinOp1D {
- const int max_bins_;
- const double mul_;
+ int max_bins_;
+ double mul_;
typename KeyViewType::const_value_type range_;
typename KeyViewType::const_value_type min_;
+ BinOp1D():max_bins_(0),mul_(0.0),
+ range_(typename KeyViewType::const_value_type()),
+ min_(typename KeyViewType::const_value_type()) {}
+
//Construct BinOp with number of bins, minimum value and maximum value
BinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
typename KeyViewType::const_value_type max )
:max_bins_(max_bins__+1),mul_(1.0*max_bins__/(max-min)),range_(max-min),min_(min) {}
//Determine bin index from key value
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int(mul_*(keys(i)-min_));
}
//Return maximum bin index + 1
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_;
}
//Compare two keys within a bin; if true, new_val will be put before old_val
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1, iType2& i2) const {
return keys(i1)<keys(i2);
}
};
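// Editorial note: with mul_ = max_bins/(max-min), a key k maps to bin
// int(mul_*(k-min)). For example, max_bins__=10, min=0.0, max=1.0 puts k=0.25
// into bin 2 and k=1.0 into bin 10, which is why the constructor stores
// max_bins__+1 as the bin count.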
template<class KeyViewType>
struct BinOp3D {
int max_bins_[3];
double mul_[3];
typename KeyViewType::non_const_value_type range_[3];
typename KeyViewType::non_const_value_type min_[3];
+ BinOp3D() {}
+
BinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
typename KeyViewType::const_value_type max[] )
{
- max_bins_[0] = max_bins__[0]+1;
- max_bins_[1] = max_bins__[1]+1;
- max_bins_[2] = max_bins__[2]+1;
+ max_bins_[0] = max_bins__[0];
+ max_bins_[1] = max_bins__[1];
+ max_bins_[2] = max_bins__[2];
mul_[0] = 1.0*max_bins__[0]/(max[0]-min[0]);
mul_[1] = 1.0*max_bins__[1]/(max[1]-min[1]);
mul_[2] = 1.0*max_bins__[2]/(max[2]-min[2]);
range_[0] = max[0]-min[0];
range_[1] = max[1]-min[1];
range_[2] = max[2]-min[2];
min_[0] = min[0];
min_[1] = min[1];
min_[2] = min[2];
}
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int( (((int(mul_[0]*(keys(i,0)-min_[0]))*max_bins_[1]) +
int(mul_[1]*(keys(i,1)-min_[1])))*max_bins_[2]) +
int(mul_[2]*(keys(i,2)-min_[2])));
}
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_[0]*max_bins_[1]*max_bins_[2];
}
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1 , iType2& i2) const {
if (keys(i1,0)>keys(i2,0)) return true;
else if (keys(i1,0)==keys(i2,0)) {
if (keys(i1,1)>keys(i2,1)) return true;
else if (keys(i1,1)==keys(i2,1)) {
if (keys(i1,2)>keys(i2,2)) return true;
}
}
return false;
}
};
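// Editorial note: BinOp3D flattens a 3-D bin coordinate (ix,iy,iz), each
// component computed as in BinOp1D, into the single index
// (ix*max_bins_[1] + iy)*max_bins_[2] + iz, and compares keys within a bin
// lexicographically over their three components.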
namespace Impl {
template<class ViewType>
bool try_std_sort(ViewType view) {
bool possible = true;
size_t stride[8] = { view.stride_0()
, view.stride_1()
, view.stride_2()
, view.stride_3()
, view.stride_4()
, view.stride_5()
, view.stride_6()
, view.stride_7()
};
possible = possible && std::is_same<typename ViewType::memory_space, HostSpace>::value;
possible = possible && (ViewType::Rank == 1);
possible = possible && (stride[0] == 1);
if(possible) {
- std::sort(view.ptr_on_device(),view.ptr_on_device()+view.dimension_0());
+ std::sort(view.data(),view.data()+view.extent(0));
}
return possible;
}
template<class ViewType>
struct min_max_functor {
typedef Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> minmax_scalar;
ViewType view;
min_max_functor(const ViewType& view_):view(view_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const size_t& i, minmax_scalar& minmax) const {
if(view(i) < minmax.min_val) minmax.min_val = view(i);
if(view(i) > minmax.max_val) minmax.max_val = view(i);
}
};
}
template<class ViewType>
-void sort(ViewType view, bool always_use_kokkos_sort = false) {
+void sort( ViewType const & view , bool const always_use_kokkos_sort = false)
+{
if(!always_use_kokkos_sort) {
if(Impl::try_std_sort(view)) return;
}
typedef BinOp1D<ViewType> CompType;
Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
- parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.dimension_0()),
+ parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.extent(0)),
Impl::min_max_functor<ViewType>(view),reducer);
if(result.min_val == result.max_val) return;
- BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,result.min_val,result.max_val),true);
+ BinSort<ViewType, CompType> bin_sort(view,CompType(view.extent(0)/2,result.min_val,result.max_val),true);
bin_sort.create_permute_vector();
bin_sort.sort(view);
}
+template<class ViewType>
+void sort( ViewType view
+ , size_t const begin
+ , size_t const end
+ )
+{
+ typedef Kokkos::RangePolicy<typename ViewType::execution_space> range_policy ;
+ typedef BinOp1D<ViewType> CompType;
+
+ Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
+ Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
+
+ parallel_reduce( range_policy( begin , end )
+ , Impl::min_max_functor<ViewType>(view),reducer );
+
+ if(result.min_val == result.max_val) return;
+
+ BinSort<ViewType, CompType>
+ bin_sort(view,begin,end,CompType((end-begin)/2,result.min_val,result.max_val),true);
+
+ bin_sort.create_permute_vector();
+ bin_sort.sort(view);
+}
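// Editorial usage sketch (illustrative only): sort a whole View, or only a
// contiguous index range of it. Assuming a View 'a' of length n:
//
//   Kokkos::sort(a);            // full view; may fall back to std::sort on host
//   Kokkos::sort(a, 10, 100);   // sort only the entries in [10,100)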
}
#endif
diff --git a/lib/kokkos/algorithms/unit_tests/TestSort.hpp b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
index 03e4fb691..61ffa6f43 100644
--- a/lib/kokkos/algorithms/unit_tests/TestSort.hpp
+++ b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
@@ -1,210 +1,275 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef TESTSORT_HPP_
#define TESTSORT_HPP_
#include <gtest/gtest.h>
#include<Kokkos_Core.hpp>
+#include<Kokkos_DynamicView.hpp>
#include<Kokkos_Random.hpp>
#include<Kokkos_Sort.hpp>
namespace Test {
namespace Impl{
template<class ExecutionSpace, class Scalar>
struct is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
is_sorted_struct(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
if(keys(i)>keys(i+1)) count++;
}
};
template<class ExecutionSpace, class Scalar>
struct sum {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
sum(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i);
}
};
template<class ExecutionSpace, class Scalar>
struct bin3d_is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
int max_bins;
Scalar min;
Scalar max;
bin3d_is_sorted_struct(Kokkos::View<Scalar*[3],ExecutionSpace> keys_,int max_bins_,Scalar min_,Scalar max_):
keys(keys_),max_bins(max_bins_),min(min_),max(max_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
int ix1 = int ((keys(i,0)-min)/max * max_bins);
int iy1 = int ((keys(i,1)-min)/max * max_bins);
int iz1 = int ((keys(i,2)-min)/max * max_bins);
int ix2 = int ((keys(i+1,0)-min)/max * max_bins);
int iy2 = int ((keys(i+1,1)-min)/max * max_bins);
int iz2 = int ((keys(i+1,2)-min)/max * max_bins);
if (ix1>ix2) count++;
else if(ix1==ix2) {
if (iy1>iy2) count++;
else if ((iy1==iy2) && (iz1>iz2)) count++;
}
}
};
template<class ExecutionSpace, class Scalar>
struct sum3D {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
sum3D(Kokkos::View<Scalar*[3],ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i,0);
count+=keys(i,1);
count+=keys(i,2);
}
};
template<class ExecutionSpace, typename KeyType>
void test_1D_sort(unsigned int n,bool force_kokkos) {
typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
KeyViewType keys("Keys",n);
// Test sorting array with all numbers equal
Kokkos::deep_copy(keys,KeyType(1));
Kokkos::sort(keys,force_kokkos);
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_before);
Kokkos::sort(keys,force_kokkos);
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(n-1,is_sorted_struct<ExecutionSpace, KeyType>(keys),sort_fails);
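// Sanity check: sorting only permutes the keys, so the before/after sums
// (and hence their ratio) should agree up to floating-point round-off,
// which the epsilon tolerance below allows for.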
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
template<class ExecutionSpace, typename KeyType>
void test_3D_sort(unsigned int n) {
typedef Kokkos::View<KeyType*[3],ExecutionSpace > KeyViewType;
KeyViewType keys("Keys",n*n*n);
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,100.0);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_before);
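// Grow the per-dimension bin count (a power of two) until the total number
// of bins, bin_1d^3, reaches at least a quarter of the number of keys.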
int bin_1d = 1;
while( bin_1d*bin_1d*bin_1d*4< (int) keys.dimension_0() ) bin_1d*=2;
int bin_max[3] = {bin_1d,bin_1d,bin_1d};
typename KeyViewType::value_type min[3] = {0,0,0};
typename KeyViewType::value_type max[3] = {100,100,100};
typedef Kokkos::BinOp3D< KeyViewType > BinOp;
BinOp bin_op(bin_max,min,max);
Kokkos::BinSort< KeyViewType , BinOp >
Sorter(keys,bin_op,false);
Sorter.create_permute_vector();
Sorter.template sort< KeyViewType >(keys);
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(keys.dimension_0()-1,bin3d_is_sorted_struct<ExecutionSpace, KeyType>(keys,bin_1d,min[0],max[0]),sort_fails);
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
- printf("3D Sort Sum: %f %f Fails: %u\n",sum_before,sum_after,sort_fails);
+ if ( sort_fails )
+ printf("3D Sort Sum: %f %f Fails: %u\n",sum_before,sum_after,sort_fails);
+
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
+//----------------------------------------------------------------------------
+
+template<class ExecutionSpace, typename KeyType>
+void test_dynamic_view_sort(unsigned int n )
+{
+ typedef typename ExecutionSpace::memory_space memory_space ;
+ typedef Kokkos::Experimental::DynamicView<KeyType*,ExecutionSpace> KeyDynamicViewType;
+ typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
+
+ const size_t upper_bound = 2 * n ;
+
+ typename KeyDynamicViewType::memory_pool
+ pool( memory_space() , 2 * n * sizeof(KeyType) );
+
+ KeyDynamicViewType keys("Keys",pool,upper_bound);
+
+ keys.resize_serial(n);
+
+ KeyViewType keys_view("KeysTmp", n );
+
+ // Test sorting array with all numbers equal
+ Kokkos::deep_copy(keys_view,KeyType(1));
+ Kokkos::Experimental::deep_copy(keys,keys_view);
+ Kokkos::sort(keys, 0 /* begin */ , n /* end */ );
+
+ Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
+ Kokkos::fill_random(keys_view,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
+
+ Kokkos::Experimental::deep_copy(keys,keys_view);
+
+ double sum_before = 0.0;
+ double sum_after = 0.0;
+ unsigned int sort_fails = 0;
+
+ Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys_view),sum_before);
+
+ Kokkos::sort(keys, 0 /* begin */ , n /* end */ );
+
+ Kokkos::Experimental::deep_copy( keys_view , keys );
+
+ Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys_view),sum_after);
+ Kokkos::parallel_reduce(n-1,is_sorted_struct<ExecutionSpace, KeyType>(keys_view),sort_fails);
+
+ double ratio = sum_before/sum_after;
+ double epsilon = 1e-10;
+ unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
+
+ if ( sort_fails != 0 || equal_sum != 1 ) {
+ std::cout << " N = " << n
+ << " ; sum_before = " << sum_before
+ << " ; sum_after = " << sum_after
+ << " ; ratio = " << ratio
+ << std::endl ;
+ }
+
+ ASSERT_EQ(sort_fails,0);
+ ASSERT_EQ(equal_sum,1);
+}
+
+//----------------------------------------------------------------------------
+
template<class ExecutionSpace, typename KeyType>
void test_sort(unsigned int N)
{
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, true);
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, false);
test_3D_sort<ExecutionSpace,KeyType>(N);
+ test_dynamic_view_sort<ExecutionSpace,KeyType>(N*N);
}
}
}
#endif /* TESTSORT_HPP_ */
diff --git a/lib/kokkos/bin/nvcc_wrapper b/lib/kokkos/bin/nvcc_wrapper
index cb206cf88..09fa5d500 100755
--- a/lib/kokkos/bin/nvcc_wrapper
+++ b/lib/kokkos/bin/nvcc_wrapper
@@ -1,284 +1,287 @@
#!/bin/bash
#
# This shell script (nvcc_wrapper) wraps both the host compiler and
# NVCC, if you are building legacy C or C++ code with CUDA enabled.
# The script remedies some differences between the interface of NVCC
# and that of the host compiler, in particular for linking.
# It also means that a legacy code doesn't need separate .cu files;
# it can just use .cpp files.
#
# Default settings: change those according to your machine. For
# example, you may have two different wrappers with either icpc
# or g++ as their back-end compiler. The defaults can be overwritten
# by using the usual arguments (e.g., -arch=sm_30 -ccbin icpc).
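#
# Illustrative example (editorial sketch, not part of the script): with the
# defaults below, a compile line such as
#   nvcc_wrapper -O3 -c my_kernels.cpp -o my_kernels.o
# is rewritten into an "nvcc ... -x cu my_kernels.cpp" command, with any
# flags nvcc does not understand forwarded to the host compiler via -Xcompiler.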
default_arch="sm_35"
#default_arch="sm_50"
#
# The default C++ compiler.
#
host_compiler=${NVCC_WRAPPER_DEFAULT_COMPILER:-"g++"}
#host_compiler="icpc"
#host_compiler="/usr/local/gcc/4.8.3/bin/g++"
#host_compiler="/usr/local/gcc/4.9.1/bin/g++"
#
# Internal variables
#
# C++ files
cpp_files=""
# Host compiler arguments
xcompiler_args=""
# Cuda (NVCC) only arguments
cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Linker arguments
xlinker_args=""
# Object files passable to NVCC
object_files=""
# Link objects for the host linker only
object_files_xlinker=""
# Shared libraries with version numbers are not handled correctly by NVCC
shared_versioned_libraries_host=""
shared_versioned_libraries=""
# Does the User set the architecture
arch_set=0
# Does the user overwrite the host compiler
ccbin_set=0
#Error code of compilation
error_code=0
# Do a dry run without actually compiling
dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
# Mark first host compiler argument
first_xcompiler_arg=1
temp_dir=${TMPDIR:-/tmp}
# Check if we have an optimization argument already
optimization_applied=0
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
do
case $1 in
#show the executed command
--show|--nvcc-wrapper-show)
dry_run=1
;;
#run host compilation only
--host-only)
host_only=1
;;
#replace '#pragma ident' with '#ident'; this is needed to compile OpenMPI due to a configure script bug and the non-standardized behaviour of pragma with macros
--replace-pragma-ident)
replace_pragma_ident=1
;;
#handle source files to be compiled as cuda files
*.cpp|*.cxx|*.cc|*.C|*.c++|*.cu)
cpp_files="$cpp_files $1"
;;
# Ensure we only have one optimization flag because NVCC doesn't allow multiple
-O*)
if [ $optimization_applied -eq 1 ]; then
echo "nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the first is used because nvcc can only accept a single optimization setting."
else
shared_args="$shared_args $1"
optimization_applied=1
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
shift
;;
#Handle known nvcc args
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
cuda_args="$cuda_args $1"
;;
#Handle more known nvcc args
--expt-extended-lambda|--expt-relaxed-constexpr)
cuda_args="$cuda_args $1"
;;
#Handle known nvcc args that have an argument
-rdc|-maxrregcount|--default-stream)
cuda_args="$cuda_args $1 $2"
shift
;;
#Handle c++11 setting
--std=c++11|-std=c++11)
shared_args="$shared_args $1"
;;
#strip off -std=c++98 because nvcc warns about it and Tribits will place both -std=c++11 and -std=c++98
-std=c++98|--std=c++98)
;;
#strip off -pedantic because it produces endless warnings about #LINE added by the preprocessor
-pedantic|-Wpedantic|-ansi)
;;
+ #strip off -Woverloaded-virtual to avoid "cc1: warning: command line option ‘-Woverloaded-virtual’ is valid for C++/ObjC++ but not for C"
+ -Woverloaded-virtual)
+ ;;
#strip -Xcompiler because we add it
-Xcompiler)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$2"
fi
shift
;;
#strip of "-x cu" because we add that
-x)
if [[ $2 != "cu" ]]; then
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="-x,$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,-x,$2"
fi
fi
shift
;;
#Handle -ccbin (if it's not set we can set it to a default value)
-ccbin)
cuda_args="$cuda_args $1 $2"
ccbin_set=1
host_compiler=$2
shift
;;
#Handle -arch argument (if it's not set, use a default)
-arch*)
cuda_args="$cuda_args $1"
arch_set=1
;;
#Handle -Xcudafe argument
-Xcudafe)
cuda_args="$cuda_args -Xcudafe $2"
shift
;;
#Handle args that should be sent to the linker
-Wl*)
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
host_linker_args="$host_linker_args ${1:4:${#1}}"
;;
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
*.a|*.so|*.o|*.obj)
object_files="$object_files $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
- *.dylib)
+ @*|*.dylib)
object_files="$object_files -Xlinker $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle shared libraries with *.so.* names which nvcc can't do.
*.so.*)
shared_versioned_libraries_host="$shared_versioned_libraries_host $1"
shared_versioned_libraries="$shared_versioned_libraries -Xlinker $1"
;;
#All other args are sent to the host compiler
*)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args=$1
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$1"
fi
;;
esac
shift
done
#Add default host compiler if necessary
if [ $ccbin_set -ne 1 ]; then
cuda_args="$cuda_args -ccbin $host_compiler"
fi
#Add architecture command
if [ $arch_set -ne 1 ]; then
cuda_args="$cuda_args -arch=$default_arch"
fi
#Compose compilation command
nvcc_command="nvcc $cuda_args $shared_args $xlinker_args $shared_versioned_libraries"
if [ $first_xcompiler_arg -eq 0 ]; then
nvcc_command="$nvcc_command -Xcompiler $xcompiler_args"
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
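# (Editorial note: concretely, the sed below rewrites e.g. "#pragma ident SOME_MACRO_STRING"
# to "#ident SOME_MACRO_STRING" in a temporary copy of each affected file before compilation.)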
if [ $replace_pragma_ident -eq 1 ]; then
cpp_files2=""
for file in $cpp_files
do
var=`grep pragma ${file} | grep ident | grep "#"`
if [ "${#var}" -gt 0 ]
then
sed 's/#[\ \t]*pragma[\ \t]*ident/#ident/g' $file > $temp_dir/nvcc_wrapper_tmp_$file
cpp_files2="$cpp_files2 $temp_dir/nvcc_wrapper_tmp_$file"
else
cpp_files2="$cpp_files2 $file"
fi
done
cpp_files=$cpp_files2
#echo $cpp_files
fi
if [ "$cpp_files" ]; then
nvcc_command="$nvcc_command $object_files_xlinker -x cu $cpp_files"
else
nvcc_command="$nvcc_command $object_files"
fi
if [ "$cpp_files" ]; then
host_command="$host_command $object_files $cpp_files"
else
host_command="$host_command $object_files"
fi
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
else
echo $nvcc_command
fi
exit 0
fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
else
$nvcc_command
fi
error_code=$?
#Report error code
exit $error_code
diff --git a/lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake b/lib/kokkos/cmake/deps/QTHREADS.cmake
similarity index 98%
rename from lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake
rename to lib/kokkos/cmake/deps/QTHREADS.cmake
index 994b72b20..c312f2590 100644
--- a/lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake
+++ b/lib/kokkos/cmake/deps/QTHREADS.cmake
@@ -1,70 +1,69 @@
# @HEADER
# ************************************************************************
#
# Trilinos: An Object-Oriented Solver Framework
# Copyright (2001) Sandia Corporation
#
#
# Copyright (2001) Sandia Corporation. Under the terms of Contract
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
# work by or on behalf of the U.S. Government. Export of this program
# may require a license from the United States Government.
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the Corporation nor the names of the
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# NOTICE: The United States Government is granted for itself and others
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
# license in this data to reproduce, prepare derivative works, and
# perform publicly and display publicly. Beginning five (5) years from
# July 25, 2001, the United States Government is granted for itself and
# others acting on its behalf a paid-up, nonexclusive, irrevocable
# worldwide license in this data to reproduce, prepare derivative works,
# distribute copies to the public, perform publicly and display
# publicly, and to permit others to do so.
#
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
#
# ************************************************************************
# @HEADER
#-----------------------------------------------------------------------------
# Qthreads lightweight user-level threading library.
#
# Acquisition information:
# Date checked: July 2014
# Checked by: H. Carter Edwards <hcedwar AT sandia.gov>
# Source: https://code.google.com/p/qthreads
#
-TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREAD
+TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREADS
REQUIRED_HEADERS qthread.h
REQUIRED_LIBS_NAMES "qthread"
)
-
diff --git a/lib/kokkos/cmake/deps/QTHREAD.cmake b/lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
similarity index 98%
rename from lib/kokkos/cmake/deps/QTHREAD.cmake
rename to lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
index 994b72b20..c312f2590 100644
--- a/lib/kokkos/cmake/deps/QTHREAD.cmake
+++ b/lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
@@ -1,70 +1,69 @@
# @HEADER
# ************************************************************************
#
# Trilinos: An Object-Oriented Solver Framework
# Copyright (2001) Sandia Corporation
#
#
# Copyright (2001) Sandia Corporation. Under the terms of Contract
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
# work by or on behalf of the U.S. Government. Export of this program
# may require a license from the United States Government.
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the Corporation nor the names of the
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# NOTICE: The United States Government is granted for itself and others
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
# license in this data to reproduce, prepare derivative works, and
# perform publicly and display publicly. Beginning five (5) years from
# July 25, 2001, the United States Government is granted for itself and
# others acting on its behalf a paid-up, nonexclusive, irrevocable
# worldwide license in this data to reproduce, prepare derivative works,
# distribute copies to the public, perform publicly and display
# publicly, and to permit others to do so.
#
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
#
# ************************************************************************
# @HEADER
#-----------------------------------------------------------------------------
# Qthreads lightweight user-level threading library.
#
# Acquisition information:
# Date checked: July 2014
# Checked by: H. Carter Edwards <hcedwar AT sandia.gov>
# Source: https://code.google.com/p/qthreads
#
-TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREAD
+TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREADS
REQUIRED_HEADERS qthread.h
REQUIRED_LIBS_NAMES "qthread"
)
-
diff --git a/lib/kokkos/config/kokkos_dev/config-core-all.sh b/lib/kokkos/config/kokkos_dev/config-core-all.sh
index fa588c778..d4fb25a8e 100755
--- a/lib/kokkos/config/kokkos_dev/config-core-all.sh
+++ b/lib/kokkos/config/kokkos_dev/config-core-all.sh
@@ -1,113 +1,110 @@
#!/bin/sh
#
# Copy this script, put it outside the Trilinos source directory, and
# build there.
#
#-----------------------------------------------------------------------------
# Building on 'kokkos-dev.sandia.gov' with enabled capabilities:
#
-# Cuda, OpenMP, Threads, Qthread, hwloc
+# Cuda, OpenMP, Threads, Qthreads, hwloc
#
# module loaded on 'kokkos-dev.sandia.gov' for this build
#
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
#
# The 'nvcc-wrapper' module should load a script that matches
# kokkos/config/nvcc_wrapper
#
#-----------------------------------------------------------------------------
# Source and installation directories:
TRILINOS_SOURCE_DIR=${HOME}/Trilinos
TRILINOS_INSTALL_DIR=${HOME}/TrilinosInstall/`date +%F`
CMAKE_CONFIGURE=""
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_INSTALL_PREFIX=${TRILINOS_INSTALL_DIR}"
#-----------------------------------------------------------------------------
# Debug/optimized
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_BUILD_TYPE:STRING=DEBUG"
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_BOUNDS_CHECK:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_BUILD_TYPE:STRING=RELEASE"
#-----------------------------------------------------------------------------
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_FLAGS:STRING=-Wall"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_C_COMPILER=gcc"
#-----------------------------------------------------------------------------
# Cuda using GNU, use the nvcc_wrapper to build CUDA source
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_COMPILER=g++"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_COMPILER=nvcc_wrapper"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_CUDA:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_CUSPARSE:BOOL=ON"
#-----------------------------------------------------------------------------
# Configure for Kokkos subpackages and tests:
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_Fortran:BOOL=OFF"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_EXAMPLES:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_TESTS:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosCore:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosContainers:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosAlgorithms:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_TpetraKernels:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosExample:BOOL=ON"
#-----------------------------------------------------------------------------
# Hardware locality configuration:
HWLOC_BASE_DIR="/home/projects/hwloc/1.7.1/host/gnu/4.7.3"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_HWLOC:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D HWLOC_INCLUDE_DIRS:FILEPATH=${HWLOC_BASE_DIR}/include"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D HWLOC_LIBRARY_DIRS:FILEPATH=${HWLOC_BASE_DIR}/lib"
#-----------------------------------------------------------------------------
# Pthread
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_Pthread:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_Pthread:BOOL=ON"
#-----------------------------------------------------------------------------
# OpenMP
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_OpenMP:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_OpenMP:BOOL=ON"
#-----------------------------------------------------------------------------
-# Qthread
+# Qthreads
-QTHREAD_BASE_DIR="/home/projects/qthreads/2014-07-08/host/gnu/4.7.3"
+QTHREADS_BASE_DIR="/home/projects/qthreads/2014-07-08/host/gnu/4.7.3"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_QTHREAD:BOOL=ON"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREAD_INCLUDE_DIRS:FILEPATH=${QTHREAD_BASE_DIR}/include"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREAD_LIBRARY_DIRS:FILEPATH=${QTHREAD_BASE_DIR}/lib"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_QTHREADS:BOOL=ON"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREADS_INCLUDE_DIRS:FILEPATH=${QTHREADS_BASE_DIR}/include"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREADS_LIBRARY_DIRS:FILEPATH=${QTHREADS_BASE_DIR}/lib"
#-----------------------------------------------------------------------------
# C++11
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_CXX11:BOOL=ON"
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_CXX11:BOOL=ON"
#-----------------------------------------------------------------------------
#
# Remove CMake output files to force reconfigure from scratch.
#
rm -rf CMake* Trilinos* packages Dart* Testing cmake_install.cmake MakeFile*
#
echo cmake ${CMAKE_CONFIGURE} ${TRILINOS_SOURCE_DIR}
cmake ${CMAKE_CONFIGURE} ${TRILINOS_SOURCE_DIR}
-
-#-----------------------------------------------------------------------------
-
diff --git a/lib/kokkos/config/master_history.txt b/lib/kokkos/config/master_history.txt
index 446cbb021..9eaecb503 100644
--- a/lib/kokkos/config/master_history.txt
+++ b/lib/kokkos/config/master_history.txt
@@ -1,7 +1,8 @@
tag: 2.01.00 date: 07:21:2016 master: xxxxxxxx develop: fa6dfcc4
tag: 2.01.06 date: 09:02:2016 master: 9afaa87f develop: 555f1a3a
tag: 2.01.10 date: 09:27:2016 master: e4119325 develop: e6cda11e
tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
-tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
+tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
+tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
diff --git a/lib/kokkos/config/test_all_sandia b/lib/kokkos/config/test_all_sandia
index 2c15e951b..690960664 100755
--- a/lib/kokkos/config/test_all_sandia
+++ b/lib/kokkos/config/test_all_sandia
@@ -1,676 +1,714 @@
#!/bin/bash -e
#
# Global config
#
set -o pipefail
-# Determine current machine
+# Determine current machine.
MACHINE=""
HOSTNAME=$(hostname)
PROCESSOR=`uname -p`
if [[ "$HOSTNAME" =~ (white|ride).* ]]; then
- MACHINE=white
+ MACHINE=white
elif [[ "$HOSTNAME" =~ .*bowman.* ]]; then
- MACHINE=bowman
+ MACHINE=bowman
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
- if [[ "$PROCESSOR" = "aarch64" ]]; then
- MACHINE=sullivan
- else
- MACHINE=shepard
- fi
+ if [[ "$PROCESSOR" = "aarch64" ]]; then
+ MACHINE=sullivan
+ else
+ MACHINE=shepard
+ fi
elif [[ "$HOSTNAME" =~ apollo ]]; then
- MACHINE=apollo
+ MACHINE=apollo
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
- MACHINE=sems
+ MACHINE=sems
else
- echo "Unrecognized machine" >&2
- exit 1
+ echo "Unrecognized machine" >&2
+ exit 1
fi
GCC_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
IBM_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
ARM_GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
INTEL_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
CLANG_BUILD_LIST="Pthread,Serial,Pthread_Serial"
CUDA_BUILD_LIST="Cuda_OpenMP,Cuda_Pthread,Cuda_Serial"
CUDA_IBM_BUILD_LIST="Cuda_OpenMP,Cuda_Serial"
GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wignored-qualifiers,-Wempty-body,-Wclobbered,-Wuninitialized"
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CLANG_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
INTEL_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CUDA_WARNING_FLAGS=""
-# Default. Machine specific can override
+# Default. Machine specific can override.
DEBUG=False
ARGS=""
CUSTOM_BUILD_LIST=""
+QTHREADS_PATH=""
DRYRUN=False
BUILD_ONLY=False
declare -i NUM_JOBS_TO_RUN_IN_PARALLEL=3
TEST_SCRIPT=False
SKIP_HWLOC=False
SPOT_CHECK=False
PRINT_HELP=False
OPT_FLAG=""
KOKKOS_OPTIONS=""
-
#
-# Handle arguments
+# Handle arguments.
#
while [[ $# > 0 ]]
do
-key="$1"
-case $key in
---kokkos-path*)
-KOKKOS_PATH="${key#*=}"
-;;
---build-list*)
-CUSTOM_BUILD_LIST="${key#*=}"
-;;
---debug*)
-DEBUG=True
-;;
---build-only*)
-BUILD_ONLY=True
-;;
---test-script*)
-TEST_SCRIPT=True
-;;
---skip-hwloc*)
-SKIP_HWLOC=True
-;;
---num*)
-NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
-;;
---dry-run*)
-DRYRUN=True
-;;
---spot-check*)
-SPOT_CHECK=True
-;;
---arch*)
-ARCH_FLAG="--arch=${key#*=}"
-;;
---opt-flag*)
-OPT_FLAG="${key#*=}"
-;;
---with-cuda-options*)
-KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
-;;
---help*)
-PRINT_HELP=True
-;;
-*)
-# args, just append
-ARGS="$ARGS $1"
-;;
-esac
-shift
+ key="$1"
+
+ case $key in
+ --kokkos-path*)
+ KOKKOS_PATH="${key#*=}"
+ ;;
+ --qthreads-path*)
+ QTHREADS_PATH="${key#*=}"
+ ;;
+ --build-list*)
+ CUSTOM_BUILD_LIST="${key#*=}"
+ ;;
+ --debug*)
+ DEBUG=True
+ ;;
+ --build-only*)
+ BUILD_ONLY=True
+ ;;
+ --test-script*)
+ TEST_SCRIPT=True
+ ;;
+ --skip-hwloc*)
+ SKIP_HWLOC=True
+ ;;
+ --num*)
+ NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
+ ;;
+ --dry-run*)
+ DRYRUN=True
+ ;;
+ --spot-check*)
+ SPOT_CHECK=True
+ ;;
+ --arch*)
+ ARCH_FLAG="--arch=${key#*=}"
+ ;;
+ --opt-flag*)
+ OPT_FLAG="${key#*=}"
+ ;;
+ --with-cuda-options*)
+ KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
+ ;;
+ --help*)
+ PRINT_HELP=True
+ ;;
+ *)
+ # args, just append
+ ARGS="$ARGS $1"
+ ;;
+ esac
+
+ shift
done
SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
-# set kokkos path
+# Set kokkos path.
if [ -z "$KOKKOS_PATH" ]; then
- KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
+ KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
else
- # Ensure KOKKOS_PATH is abs path
- KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
+ # Ensure KOKKOS_PATH is abs path.
+ KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
#
-# Machine specific config
+# Machine specific config.
#
if [ "$MACHINE" = "sems" ]; then
- source /projects/sems/modulefiles/utils/sems-modules-init.sh
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
- BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
- CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
- CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG=""
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG=""
+ fi
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "gcc/5.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
- "clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
fi
-
elif [ "$MACHINE" = "white" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
- IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
- CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
+ BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
+ IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
+ CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
- # Don't do pthread on white
- GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
+ # Don't do pthread on white.
+ GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
- "cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
- )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=Power8,Kepler37"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ )
+
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=Power8,Kepler37"
+ fi
+
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "bowman" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
- OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
+ OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- )
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=KNL"
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=KNL"
+ fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "sullivan" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=96
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=96
- BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=ARMv8-ThunderX"
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=ARMv8-ThunderX"
+ fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "shepard" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
- OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
+ OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- )
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=HSW"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=HSW"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "apollo" ]; then
- source /projects/sems/modulefiles/utils/sems-modules-init.sh
- module use /home/projects/modulefiles/local/x86-64
- module load kokkos-env
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
+ module use /home/projects/modulefiles/local/x86-64
+ module load kokkos-env
- module load sems-git
- module load sems-tex
- module load sems-cmake/3.5.2
- module load sems-gdb
+ module load sems-git
+ module load sems-tex
+ module load sems-cmake/3.5.2
+ module load sems-gdb
- SKIP_HWLOC=True
+ SKIP_HWLOC=True
- BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
- CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
- CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
- CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
- NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
+ CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
+ NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
- BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
- BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
- BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
+ BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
+ BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
+ BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
fi
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=SNB,Kepler35"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
-else
- echo "Unhandled machine $MACHINE" >&2
- exit 1
-fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=SNB,Kepler35"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
+else
+ echo "Unhandled machine $MACHINE" >&2
+ exit 1
+fi
export OMP_NUM_THREADS=4
declare -i NUM_RESULTS_TO_KEEP=7
RESULT_ROOT_PREFIX=TestAll
if [ "$PRINT_HELP" = "True" ]; then
-echo "test_all_sandia <ARGS> <OPTIONS>:"
-echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
-echo " Defaults to root repo containing this script"
-echo "--debug: Run tests in debug. Defaults to False"
-echo "--test-script: Test this script, not Kokkos"
-echo "--skip-hwloc: Do not do hwloc tests"
-echo "--num=N: Number of jobs to run in parallel"
-echo "--spot-check: Minimal test set to issue pull request"
-echo "--dry-run: Just print what would be executed"
-echo "--build-only: Just do builds, don't run anything"
-echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
-echo "--arch=ARCHITECTURE: overwrite architecture flags"
-echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
-echo "--build-list=BUILD,BUILD,BUILD..."
-echo " Provide a comma-separated list of builds instead of running all builds"
-echo " Valid items:"
-echo " OpenMP, Pthread, Serial, OpenMP_Serial, Pthread_Serial"
-echo " Cuda_OpenMP, Cuda_Pthread, Cuda_Serial"
-echo ""
-
-echo "ARGS: list of expressions matching compilers to test"
-echo " supported compilers sems"
-for COMPILER_DATA in "${COMPILERS[@]}"; do
+ echo "test_all_sandia <ARGS> <OPTIONS>:"
+ echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
+ echo " Defaults to root repo containing this script"
+ echo "--debug: Run tests in debug. Defaults to False"
+ echo "--test-script: Test this script, not Kokkos"
+ echo "--skip-hwloc: Do not do hwloc tests"
+ echo "--num=N: Number of jobs to run in parallel"
+ echo "--spot-check: Minimal test set to issue pull request"
+ echo "--dry-run: Just print what would be executed"
+ echo "--build-only: Just do builds, don't run anything"
+ echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
+ echo "--arch=ARCHITECTURE: overwrite architecture flags"
+ echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
+ echo "--build-list=BUILD,BUILD,BUILD..."
+ echo " Provide a comma-separated list of builds instead of running all builds"
+ echo " Valid items:"
+ echo " OpenMP, Pthread, Qthreads, Serial, OpenMP_Serial, Pthread_Serial"
+ echo " Qthreads_Serial, Cuda_OpenMP, Cuda_Pthread, Cuda_Serial"
+ echo ""
+
+ echo "ARGS: list of expressions matching compilers to test"
+ echo " supported compilers sems"
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
ARR=($COMPILER_DATA)
COMPILER=${ARR[0]}
echo " $COMPILER"
-done
-echo ""
-
-echo "Examples:"
-echo " Run all tests"
-echo " % test_all_sandia"
-echo ""
-echo " Run all gcc tests"
-echo " % test_all_sandia gcc"
-echo ""
-echo " Run all gcc/4.7.2 and all intel tests"
-echo " % test_all_sandia gcc/4.7.2 intel"
-echo ""
-echo " Run all tests in debug"
-echo " % test_all_sandia --debug"
-echo ""
-echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
-echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
-echo ""
-echo "If you want to kill the tests, do:"
-echo " hit ctrl-z"
-echo " % kill -9 %1"
-echo
-exit 0
+ done
+ echo ""
+
+ echo "Examples:"
+ echo " Run all tests"
+ echo " % test_all_sandia"
+ echo ""
+ echo " Run all gcc tests"
+ echo " % test_all_sandia gcc"
+ echo ""
+ echo " Run all gcc/4.7.2 and all intel tests"
+ echo " % test_all_sandia gcc/4.7.2 intel"
+ echo ""
+ echo " Run all tests in debug"
+ echo " % test_all_sandia --debug"
+ echo ""
+ echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
+ echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
+ echo ""
+ echo "If you want to kill the tests, do:"
+ echo " hit ctrl-z"
+ echo " % kill -9 %1"
+ echo
+ exit 0
fi
-# set build type
+# Set build type.
if [ "$DEBUG" = "True" ]; then
- BUILD_TYPE=debug
+ BUILD_TYPE=debug
else
- BUILD_TYPE=release
+ BUILD_TYPE=release
fi
-# If no args provided, do all compilers
+# If no args provided, do all compilers.
if [ -z "$ARGS" ]; then
- ARGS='?'
+ ARGS='?'
fi
-# Process args to figure out which compilers to test
+# Process args to figure out which compilers to test.
COMPILERS_TO_TEST=""
+
for ARG in $ARGS; do
- for COMPILER_DATA in "${COMPILERS[@]}"; do
- ARR=($COMPILER_DATA)
- COMPILER=${ARR[0]}
- if [[ "$COMPILER" = $ARG* ]]; then
- if [[ "$COMPILERS_TO_TEST" != *${COMPILER}* ]]; then
- COMPILERS_TO_TEST="$COMPILERS_TO_TEST $COMPILER"
- else
- echo "Tried to add $COMPILER twice"
- fi
- fi
- done
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
+ ARR=($COMPILER_DATA)
+ COMPILER=${ARR[0]}
+
+ if [[ "$COMPILER" = $ARG* ]]; then
+ if [[ "$COMPILERS_TO_TEST" != *${COMPILER}* ]]; then
+ COMPILERS_TO_TEST="$COMPILERS_TO_TEST $COMPILER"
+ else
+ echo "Tried to add $COMPILER twice"
+ fi
+ fi
+ done
done
+# Check if Qthreads build requested.
+HAVE_QTHREADS_BUILD="False"
+if [ -n "$CUSTOM_BUILD_LIST" ]; then
+ if [[ "$CUSTOM_BUILD_LIST" = *Qthreads* ]]; then
+ HAVE_QTHREADS_BUILD="True"
+ fi
+else
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
+ ARR=($COMPILER_DATA)
+ BUILD_LIST=${ARR[2]}
+ if [[ "$BUILD_LIST" = *Qthreads* ]]; then
+ HAVE_QTHREADS_BUILD="True"
+ fi
+ done
+fi
+
+# Ensure Qthreads path is set if Qthreads build is requested.
+if [ "$HAVE_QTHREADS_BUILD" = "True" ]; then
+ if [ -z "$QTHREADS_PATH" ]; then
+ echo "Need to supply Qthreads path (--qthreads-path) when testing Qthreads backend." >&2
+ exit 1
+ else
+ # Strip trailing slashes from path.
+ QTHREADS_PATH=$(echo $QTHREADS_PATH | sed 's/\/*$//')
+ fi
+fi
+
#
-# Functions
+# Functions.
#
# get_compiler_name <COMPILER>
get_compiler_name() {
- echo $1 | cut -d/ -f1
+ echo $1 | cut -d/ -f1
}
# get_compiler_version <COMPILER>
get_compiler_version() {
- echo $1 | cut -d/ -f2
+ echo $1 | cut -d/ -f2
}
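# For example (illustrative): "get_compiler_name gcc/5.3.0" prints "gcc",
# and "get_compiler_version gcc/5.3.0" prints "5.3.0".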
-# Do not call directly
+# Do not call directly.
get_compiler_data() {
- local compiler=$1
- local item=$2
- local compiler_name=$(get_compiler_name $compiler)
- local compiler_vers=$(get_compiler_version $compiler)
-
- local compiler_data
- for compiler_data in "${COMPILERS[@]}" ; do
- local arr=($compiler_data)
- if [ "$compiler" = "${arr[0]}" ]; then
- echo "${arr[$item]}" | tr , ' ' | sed -e "s/<COMPILER_NAME>/$compiler_name/g" -e "s/<COMPILER_VERSION>/$compiler_vers/g"
- return 0
- fi
- done
-
- # Not found
- echo "Unreconized compiler $compiler" >&2
- exit 1
+ local compiler=$1
+ local item=$2
+ local compiler_name=$(get_compiler_name $compiler)
+ local compiler_vers=$(get_compiler_version $compiler)
+
+ local compiler_data
+ for compiler_data in "${COMPILERS[@]}" ; do
+ local arr=($compiler_data)
+
+ if [ "$compiler" = "${arr[0]}" ]; then
+ echo "${arr[$item]}" | tr , ' ' | sed -e "s/<COMPILER_NAME>/$compiler_name/g" -e "s/<COMPILER_VERSION>/$compiler_vers/g"
+ return 0
+ fi
+ done
+
+ # Not found.
+ echo "Unreconized compiler $compiler" >&2
+ exit 1
}
#
# For all getters, usage: <GETTER> <COMPILER>
#
get_compiler_modules() {
- get_compiler_data $1 1
+ get_compiler_data $1 1
}
get_compiler_build_list() {
- get_compiler_data $1 2
+ get_compiler_data $1 2
}
get_compiler_exe_name() {
- get_compiler_data $1 3
+ get_compiler_data $1 3
}
get_compiler_warning_flags() {
- get_compiler_data $1 4
+ get_compiler_data $1 4
}
run_cmd() {
- echo "RUNNING: $*"
- if [ "$DRYRUN" != "True" ]; then
- eval "$* 2>&1"
- fi
+ echo "RUNNING: $*"
+ if [ "$DRYRUN" != "True" ]; then
+ eval "$* 2>&1"
+ fi
}
# report_and_log_test_result <SUCCESS> <DESC> <COMMENT>
report_and_log_test_result() {
- # Use sane var names
- local success=$1; local desc=$2; local comment=$3;
+ # Use sane var names.
+ local success=$1; local desc=$2; local comment=$3;
- if [ "$success" = "0" ]; then
- echo " PASSED $desc"
- echo $comment > $PASSED_DIR/$desc
- else
- # For failures, comment should be the name of the phase that failed
- echo " FAILED $desc" >&2
- echo $comment > $FAILED_DIR/$desc
- cat ${desc}.${comment}.log
- fi
+ if [ "$success" = "0" ]; then
+ echo " PASSED $desc"
+ echo $comment > $PASSED_DIR/$desc
+ else
+ # For failures, comment should be the name of the phase that failed.
+ echo " FAILED $desc" >&2
+ echo $comment > $FAILED_DIR/$desc
+ cat ${desc}.${comment}.log
+ fi
}
setup_env() {
- local compiler=$1
- local compiler_modules=$(get_compiler_modules $compiler)
-
- module purge
-
- local mod
- for mod in $compiler_modules; do
- echo "Loading module $mod"
- module load $mod 2>&1
- # It is ridiculously hard to check for the success of a loaded
- # module. Module does not return error codes and piping to grep
- # causes module to run in a subshell.
- module list 2>&1 | grep "$mod" >& /dev/null || return 1
- done
-
- return 0
+ local compiler=$1
+ local compiler_modules=$(get_compiler_modules $compiler)
+
+ module purge
+
+ local mod
+ for mod in $compiler_modules; do
+ echo "Loading module $mod"
+ module load $mod 2>&1
+ # It is ridiculously hard to check for the success of a loaded
+ # module. Module does not return error codes and piping to grep
+ # causes module to run in a subshell.
+ module list 2>&1 | grep "$mod" >& /dev/null || return 1
+ done
+
+ return 0
}
# single_build_and_test <COMPILER> <BUILD> <BUILD_TYPE>
single_build_and_test() {
- # Use sane var names
- local compiler=$1; local build=$2; local build_type=$3;
+ # Use sane var names.
+ local compiler=$1; local build=$2; local build_type=$3;
+
+ # Set up env.
+ mkdir -p $ROOT_DIR/$compiler/"${build}-$build_type"
+ cd $ROOT_DIR/$compiler/"${build}-$build_type"
+ local desc=$(echo "${compiler}-${build}-${build_type}" | sed 's:/:-:g')
+ setup_env $compiler >& ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
- # set up env
- mkdir -p $ROOT_DIR/$compiler/"${build}-$build_type"
- cd $ROOT_DIR/$compiler/"${build}-$build_type"
- local desc=$(echo "${compiler}-${build}-${build_type}" | sed 's:/:-:g')
- setup_env $compiler >& ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
+ # Set up flags.
+ local compiler_warning_flags=$(get_compiler_warning_flags $compiler)
+ local compiler_exe=$(get_compiler_exe_name $compiler)
- # Set up flags
- local compiler_warning_flags=$(get_compiler_warning_flags $compiler)
- local compiler_exe=$(get_compiler_exe_name $compiler)
+ if [[ "$build_type" = hwloc* ]]; then
+ local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
+ fi
+ if [[ "$build" = *Qthreads* ]]; then
if [[ "$build_type" = hwloc* ]]; then
- local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
+ local extra_args="$extra_args --qthreads-path=${QTHREADS_PATH}_hwloc"
+ else
+ local extra_args="$extra_args --qthreads-path=$QTHREADS_PATH"
fi
+ fi
- if [[ "$OPT_FLAG" = "" ]]; then
- OPT_FLAG="-O3"
- fi
+ if [[ "$OPT_FLAG" = "" ]]; then
+ OPT_FLAG="-O3"
+ fi
- if [[ "$build_type" = *debug* ]]; then
- local extra_args="$extra_args --debug"
- local cxxflags="-g $compiler_warning_flags"
- else
- local cxxflags="$OPT_FLAG $compiler_warning_flags"
- fi
+ if [[ "$build_type" = *debug* ]]; then
+ local extra_args="$extra_args --debug"
+ local cxxflags="-g $compiler_warning_flags"
+ else
+ local cxxflags="$OPT_FLAG $compiler_warning_flags"
+ fi
- if [[ "$compiler" == cuda* ]]; then
- cxxflags="--keep --keep-dir=$(pwd) $cxxflags"
- export TMPDIR=$(pwd)
- fi
+ if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
+ local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
+ fi
- if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
- local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
- fi
+ echo " Starting job $desc"
- echo " Starting job $desc"
+ local comment="no_comment"
- local comment="no_comment"
+ if [ "$TEST_SCRIPT" = "True" ]; then
+ local rand=$[ 1 + $[ RANDOM % 10 ]]
+ sleep $rand
- if [ "$TEST_SCRIPT" = "True" ]; then
- local rand=$[ 1 + $[ RANDOM % 10 ]]
- sleep $rand
- if [ $rand -gt 5 ]; then
- run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
- fi
- else
- run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
- local -i build_start_time=$(date +%s)
- run_cmd make build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
- local -i build_end_time=$(date +%s)
- comment="build_time=$(($build_end_time-$build_start_time))"
- if [[ "$BUILD_ONLY" == False ]]; then
- run_cmd make test >& ${desc}.test.log || { report_and_log_test_result 1 ${desc} test && return 0; }
- local -i run_end_time=$(date +%s)
- comment="$comment run_time=$(($run_end_time-$build_end_time))"
- fi
+ if [ $rand -gt 5 ]; then
+ run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
fi
+ else
+ run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
+ local -i build_start_time=$(date +%s)
+ run_cmd make build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
+ local -i build_end_time=$(date +%s)
+ comment="build_time=$(($build_end_time-$build_start_time))"
+
+ if [[ "$BUILD_ONLY" == False ]]; then
+ run_cmd make test >& ${desc}.test.log || { report_and_log_test_result 1 ${desc} test && return 0; }
+ local -i run_end_time=$(date +%s)
+ comment="$comment run_time=$(($run_end_time-$build_end_time))"
+ fi
+ fi
- report_and_log_test_result 0 $desc "$comment"
+ report_and_log_test_result 0 $desc "$comment"
- return 0
+ return 0
}
# wait_for_jobs <NUM-JOBS>
wait_for_jobs() {
- local -i max_jobs=$1
- local -i num_active_jobs=$(jobs | wc -l)
- while [ $num_active_jobs -ge $max_jobs ]
- do
- sleep 1
- num_active_jobs=$(jobs | wc -l)
- jobs >& /dev/null
- done
+ local -i max_jobs=$1
+ local -i num_active_jobs=$(jobs | wc -l)
+ while [ $num_active_jobs -ge $max_jobs ]
+ do
+ sleep 1
+ num_active_jobs=$(jobs | wc -l)
+ jobs >& /dev/null
+ done
}
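# A minimal throttling sketch (compiler/build names are assumed, for illustration
# only): block until fewer than $NUM_JOBS_TO_RUN_IN_PARALLEL builds are active,
# then launch another one in the background:
#   wait_for_jobs $NUM_JOBS_TO_RUN_IN_PARALLEL
#   single_build_and_test gcc/4.8.4 OpenMP release &
# run_in_background below wraps exactly this pattern.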
# run_in_background <COMPILER> <BUILD> <BUILD_TYPE>
run_in_background() {
- local compiler=$1
-
- local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
- # don't override command line input
- # if [[ "$BUILD_ONLY" == True ]]; then
- # num_jobs=8
- # else
- if [[ "$compiler" == cuda* ]]; then
- num_jobs=1
- fi
- # fi
- wait_for_jobs $num_jobs
-
- single_build_and_test $* &
+ local compiler=$1
+
+ local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
+ # Don't override command line input.
+ # if [[ "$BUILD_ONLY" == True ]]; then
+ # num_jobs=8
+ # else
+ if [[ "$compiler" == cuda* ]]; then
+ num_jobs=1
+ fi
+ # fi
+ wait_for_jobs $num_jobs
+
+ single_build_and_test $* &
}
# build_and_test_all <COMPILER>
build_and_test_all() {
- # Get compiler data
- local compiler=$1
- if [ -z "$CUSTOM_BUILD_LIST" ]; then
- local compiler_build_list=$(get_compiler_build_list $compiler)
- else
- local compiler_build_list=$(echo "$CUSTOM_BUILD_LIST" | tr , ' ')
- fi
+ # Get compiler data.
+ local compiler=$1
+ if [ -z "$CUSTOM_BUILD_LIST" ]; then
+ local compiler_build_list=$(get_compiler_build_list $compiler)
+ else
+ local compiler_build_list=$(echo "$CUSTOM_BUILD_LIST" | tr , ' ')
+ fi
- # do builds
- local build
- for build in $compiler_build_list
- do
- run_in_background $compiler $build $BUILD_TYPE
+ # Do builds.
+ local build
+ for build in $compiler_build_list
+ do
+ run_in_background $compiler $build $BUILD_TYPE
- # If not cuda, do a hwloc test too
- if [[ "$compiler" != cuda* && "$SKIP_HWLOC" == False ]]; then
- run_in_background $compiler $build "hwloc-$BUILD_TYPE"
- fi
- done
+ # If not cuda, do a hwloc test too.
+ if [[ "$compiler" != cuda* && "$SKIP_HWLOC" == False ]]; then
+ run_in_background $compiler $build "hwloc-$BUILD_TYPE"
+ fi
+ done
- return 0
+ return 0
}
get_test_root_dir() {
- local existing_results=$(find . -maxdepth 1 -name "$RESULT_ROOT_PREFIX*" | sort)
- local -i num_existing_results=$(echo $existing_results | tr ' ' '\n' | wc -l)
- local -i num_to_delete=${num_existing_results}-${NUM_RESULTS_TO_KEEP}
+ local existing_results=$(find . -maxdepth 1 -name "$RESULT_ROOT_PREFIX*" | sort)
+ local -i num_existing_results=$(echo $existing_results | tr ' ' '\n' | wc -l)
+ local -i num_to_delete=${num_existing_results}-${NUM_RESULTS_TO_KEEP}
- if [ $num_to_delete -gt 0 ]; then
- /bin/rm -rf $(echo $existing_results | tr ' ' '\n' | head -n $num_to_delete)
- fi
+ if [ $num_to_delete -gt 0 ]; then
+ /bin/rm -rf $(echo $existing_results | tr ' ' '\n' | head -n $num_to_delete)
+ fi
- echo $(pwd)/${RESULT_ROOT_PREFIX}_$(date +"%Y-%m-%d_%H.%M.%S")
+ echo $(pwd)/${RESULT_ROOT_PREFIX}_$(date +"%Y-%m-%d_%H.%M.%S")
}
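# Sketch of what this yields (prefix and timestamp are examples only):
#   ./${RESULT_ROOT_PREFIX}_2017-05-04_16.02.33
# after pruning all but the newest $NUM_RESULTS_TO_KEEP existing result directories.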
wait_summarize_and_exit() {
- wait_for_jobs 1
-
- echo "#######################################################"
- echo "PASSED TESTS"
- echo "#######################################################"
-
- local passed_test
- for passed_test in $(\ls -1 $PASSED_DIR | sort)
- do
- echo $passed_test $(cat $PASSED_DIR/$passed_test)
- done
-
- echo "#######################################################"
- echo "FAILED TESTS"
- echo "#######################################################"
-
- local failed_test
- local -i rv=0
- for failed_test in $(\ls -1 $FAILED_DIR | sort)
- do
- echo $failed_test "("$(cat $FAILED_DIR/$failed_test)" failed)"
- rv=$rv+1
- done
-
- exit $rv
+ wait_for_jobs 1
+
+ echo "#######################################################"
+ echo "PASSED TESTS"
+ echo "#######################################################"
+
+ local passed_test
+ for passed_test in $(\ls -1 $PASSED_DIR | sort)
+ do
+ echo $passed_test $(cat $PASSED_DIR/$passed_test)
+ done
+
+ echo "#######################################################"
+ echo "FAILED TESTS"
+ echo "#######################################################"
+
+ local failed_test
+ local -i rv=0
+ for failed_test in $(\ls -1 $FAILED_DIR | sort)
+ do
+ echo $failed_test "("$(cat $FAILED_DIR/$failed_test)" failed)"
+ rv=$rv+1
+ done
+
+ exit $rv
}
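# Note: the script's exit status is the count of failed tests accumulated in
# $FAILED_DIR, so an exit status of 0 means every configure/build/test step passed.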
#
-# Main
+# Main.
#
ROOT_DIR=$(get_test_root_dir)
mkdir -p $ROOT_DIR
cd $ROOT_DIR
PASSED_DIR=$ROOT_DIR/results/passed
FAILED_DIR=$ROOT_DIR/results/failed
mkdir -p $PASSED_DIR
mkdir -p $FAILED_DIR
echo "Going to test compilers: " $COMPILERS_TO_TEST
for COMPILER in $COMPILERS_TO_TEST; do
- echo "Testing compiler $COMPILER"
- build_and_test_all $COMPILER
+ echo "Testing compiler $COMPILER"
+ build_and_test_all $COMPILER
done
wait_summarize_and_exit
diff --git a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
index 3277c007d..53e0eab69 100644
--- a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
+++ b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
@@ -1,494 +1,591 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_DYNAMIC_VIEW_HPP
#define KOKKOS_DYNAMIC_VIEW_HPP
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
namespace Kokkos {
namespace Experimental {
/** \brief Dynamic views are restricted to rank-one and no layout.
* Subviews are not allowed.
*/
template< typename DataType , typename ... P >
class DynamicView : public Kokkos::ViewTraits< DataType , P ... >
{
public:
- typedef ViewTraits< DataType , P ... > traits ;
+ typedef Kokkos::ViewTraits< DataType , P ... > traits ;
private:
template< class , class ... > friend class DynamicView ;
typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
static_assert( traits::rank == 1 && traits::rank_dynamic == 1
, "DynamicView must be rank-one" );
static_assert( std::is_trivial< typename traits::value_type >::value &&
std::is_same< typename traits::specialize , void >::value
, "DynamicView must have trivial data type" );
template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
{ KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
template< class Space > struct verify_space<Space,false>
{ KOKKOS_FORCEINLINE_FUNCTION static void check()
{ Kokkos::abort("Kokkos::DynamicView ERROR: attempt to access inaccessible memory space"); };
};
public:
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
private:
memory_pool m_pool ;
track_type m_track ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_shift ;
unsigned m_chunk_mask ;
unsigned m_chunk_max ;
public:
//----------------------------------------------------------------------
/** \brief Compatible view of array of scalar types */
typedef DynamicView< typename traits::data_type ,
typename traits::device_type >
array_type ;
/** \brief Compatible view of const data type */
typedef DynamicView< typename traits::const_data_type ,
typename traits::device_type >
const_type ;
/** \brief Compatible view of non-const data type */
typedef DynamicView< typename traits::non_const_data_type ,
typename traits::device_type >
non_const_type ;
/** \brief Must be accessible everywhere */
typedef DynamicView HostMirror ;
//----------------------------------------------------------------------
enum { Rank = 1 };
- KOKKOS_INLINE_FUNCTION constexpr size_t size() const
+ KOKKOS_INLINE_FUNCTION
+ size_t size() const noexcept
{
- return
- Kokkos::Impl::MemorySpaceAccess
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space
- >::accessible
- ? // Runtime size is at the end of the chunk pointer array
- (*reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max ))
- << m_chunk_shift
- : 0 ;
+ uintptr_t n = 0 ;
+
+ if ( Kokkos::Impl::MemorySpaceAccess
+ < Kokkos::Impl::ActiveExecutionMemorySpace
+ , typename traits::memory_space
+ >::accessible ) {
+ n = *reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max );
+ }
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ else {
+ Kokkos::Impl::DeepCopy< Kokkos::HostSpace
+ , typename traits::memory_space
+ , Kokkos::HostSpace::execution_space >
+ ( & n
+ , reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max )
+ , sizeof(uintptr_t) );
+ }
+#endif
+ return n << m_chunk_shift ;
}
template< typename iType >
- KOKKOS_INLINE_FUNCTION constexpr
+ KOKKOS_INLINE_FUNCTION
size_t extent( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
template< typename iType >
- KOKKOS_INLINE_FUNCTION constexpr
+ KOKKOS_INLINE_FUNCTION
size_t extent_int( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return size(); }
+ KOKKOS_INLINE_FUNCTION size_t dimension_0() const { return size(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return 0 ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { *s = 0 ; }
//----------------------------------------------------------------------
// Range span is the span which contains all members.
typedef typename traits::value_type & reference_type ;
typedef typename traits::value_type * pointer_type ;
enum { reference_type_is_lvalue_reference = std::is_lvalue_reference< reference_type >::value };
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return false ; }
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const { return 0 ; }
//----------------------------------------
template< typename I0 , class ... Args >
KOKKOS_INLINE_FUNCTION
reference_type operator()( const I0 & i0 , const Args & ... args ) const
{
static_assert( Kokkos::Impl::are_integral<I0,Args...>::value
, "Indices must be integral type" );
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
// Which chunk is being indexed.
const uintptr_t ic = uintptr_t( i0 >> m_chunk_shift );
typename traits::value_type * volatile * const ch = m_chunks + ic ;
// Do bounds checking if enabled or if the chunk pointer is zero.
// If not bounds checking then we assume a non-zero pointer is valid.
#if ! defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
if ( 0 == *ch )
#endif
{
// Verify that allocation of the requested chunk is in progress.
// The allocated chunk counter is m_chunks[ m_chunk_max ]
const uintptr_t n =
*reinterpret_cast<uintptr_t volatile *>( m_chunks + m_chunk_max );
if ( n <= ic ) {
Kokkos::abort("Kokkos::DynamicView array bounds error");
}
// Allocation of this chunk is in progress
// so wait for allocation to complete.
while ( 0 == *ch );
}
return (*ch)[ i0 & m_chunk_mask ];
}
//----------------------------------------
/** \brief Resizing in parallel only increases the array size,
 *         never decreases.
*/
KOKKOS_INLINE_FUNCTION
void resize_parallel( size_t n ) const
{
typedef typename traits::value_type value_type ;
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
printf("DynamicView::resize_parallel(%lu) m_chunk_max(%u) NC(%lu)\n"
, n , m_chunk_max , NC );
#endif
Kokkos::abort("DynamicView::resize_parallel exceeded maximum size");
}
typename traits::value_type * volatile * const ch = m_chunks ;
// The allocated chunk counter is m_chunks[ m_chunk_max ]
uintptr_t volatile * const pc =
reinterpret_cast<uintptr_t volatile*>( m_chunks + m_chunk_max );
// Potentially concurrent iteration of allocation to the required size.
for ( uintptr_t jc = *pc ; jc < NC ; ) {
// Claim the 'jc' chunk to-be-allocated index
const uintptr_t jc_try = jc ;
// Jump iteration to the chunk counter.
jc = atomic_compare_exchange( pc , jc_try , jc_try + 1 );
if ( jc_try == jc ) {
ch[jc_try] = reinterpret_cast<value_type*>(
m_pool.allocate( sizeof(value_type) << m_chunk_shift ));
Kokkos::memory_fence();
}
}
}
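  // Device-side usage sketch (team shape assumed), mirroring TestDynamicView in
  // the containers unit tests: one thread per team grows the view, then the
  // whole team synchronizes before indexing into the newly allocated chunks.
  //
  //   if ( team_member.team_rank() == 0 ) { a.resize_parallel( n ); }
  //   team_member.team_barrier();
  //   if ( i < n ) a( i ) = value ;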
  /** \brief  Resizing in serial can grow or shrink the array size. */
+ template< typename IntType >
inline
- void resize_serial( size_t n )
+ typename std::enable_if
+ < std::is_integral<IntType>::value &&
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace
+ , typename traits::memory_space
+ >::accessible
+ >::type
+ resize_serial( IntType const & n )
{
- DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
+ typedef typename traits::value_type value_type ;
+ typedef value_type * pointer_type ;
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
Kokkos::abort("DynamicView::resize_serial exceeded maximum size");
}
uintptr_t * const pc =
reinterpret_cast<uintptr_t*>( m_chunks + m_chunk_max );
if ( *pc < NC ) {
while ( *pc < NC ) {
- m_chunks[*pc] =
- m_pool.allocate( sizeof(traits::value_type) << m_chunk_shift );
+ m_chunks[*pc] = reinterpret_cast<pointer_type>
+ ( m_pool.allocate( sizeof(value_type) << m_chunk_shift ) );
++*pc ;
}
}
else {
while ( NC + 1 <= *pc ) {
--*pc ;
m_pool.deallocate( m_chunks[*pc]
- , sizeof(traits::value_type) << m_chunk_shift );
+ , sizeof(value_type) << m_chunk_shift );
m_chunks[*pc] = 0 ;
}
}
}
+ //----------------------------------------
+
+ struct ResizeSerial {
+ memory_pool m_pool ;
+ typename traits::value_type ** m_chunks ;
+ uintptr_t * m_pc ;
+ uintptr_t m_nc ;
+ unsigned m_chunk_shift ;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( int ) const
+ {
+ typedef typename traits::value_type value_type ;
+ typedef value_type * pointer_type ;
+
+ if ( *m_pc < m_nc ) {
+ while ( *m_pc < m_nc ) {
+ m_chunks[*m_pc] = reinterpret_cast<pointer_type>
+ ( m_pool.allocate( sizeof(value_type) << m_chunk_shift ) );
+ ++*m_pc ;
+ }
+ }
+ else {
+ while ( m_nc + 1 <= *m_pc ) {
+ --*m_pc ;
+ m_pool.deallocate( m_chunks[*m_pc]
+ , sizeof(value_type) << m_chunk_shift );
+ m_chunks[*m_pc] = 0 ;
+ }
+ }
+ }
+
+ ResizeSerial( memory_pool const & arg_pool
+ , typename traits::value_type ** arg_chunks
+ , uintptr_t * arg_pc
+ , uintptr_t arg_nc
+ , unsigned arg_chunk_shift
+ )
+ : m_pool( arg_pool )
+ , m_chunks( arg_chunks )
+ , m_pc( arg_pc )
+ , m_nc( arg_nc )
+ , m_chunk_shift( arg_chunk_shift )
+ {}
+ };
+
+ template< typename IntType >
+ inline
+ typename std::enable_if
+ < std::is_integral<IntType>::value &&
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace
+ , typename traits::memory_space
+ >::accessible
+ >::type
+ resize_serial( IntType const & n )
+ {
+ const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
+
+ if ( m_chunk_max < NC ) {
+ Kokkos::abort("DynamicView::resize_serial exceeded maximum size");
+ }
+
+ // Must dispatch kernel
+
+ typedef Kokkos::RangePolicy< typename traits::execution_space > Range ;
+
+ uintptr_t * const pc =
+ reinterpret_cast<uintptr_t*>( m_chunks + m_chunk_max );
+
+ Kokkos::Impl::ParallelFor<ResizeSerial,Range>
+ closure( ResizeSerial( m_pool, m_chunks, pc, NC, m_chunk_shift )
+ , Range(0,1) );
+
+ closure.execute();
+
+ traits::execution_space::fence();
+ }
+
//----------------------------------------------------------------------
~DynamicView() = default ;
DynamicView() = default ;
DynamicView( DynamicView && ) = default ;
DynamicView( const DynamicView & ) = default ;
DynamicView & operator = ( DynamicView && ) = default ;
DynamicView & operator = ( const DynamicView & ) = default ;
template< class RT , class ... RP >
- KOKKOS_INLINE_FUNCTION
DynamicView( const DynamicView<RT,RP...> & rhs )
: m_pool( rhs.m_pool )
, m_track( rhs.m_track )
- , m_chunks( rhs.m_chunks )
+ , m_chunks( (typename traits::value_type **) rhs.m_chunks )
, m_chunk_shift( rhs.m_chunk_shift )
, m_chunk_mask( rhs.m_chunk_mask )
, m_chunk_max( rhs.m_chunk_max )
{
+ typedef typename DynamicView<RT,RP...>::traits SrcTraits ;
+ typedef Kokkos::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
+ static_assert( Mapping::is_assignable , "Incompatible DynamicView copy construction" );
}
//----------------------------------------------------------------------
struct Destroy {
memory_pool m_pool ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_max ;
bool m_destroy ;
// Initialize or destroy array of chunk pointers.
// Two entries beyond the max chunks are allocation counters.
KOKKOS_INLINE_FUNCTION
void operator()( unsigned i ) const
{
if ( m_destroy && i < m_chunk_max && 0 != m_chunks[i] ) {
m_pool.deallocate( m_chunks[i] , m_pool.get_min_block_size() );
}
m_chunks[i] = 0 ;
}
void execute( bool arg_destroy )
{
typedef Kokkos::RangePolicy< typename traits::execution_space > Range ;
m_destroy = arg_destroy ;
Kokkos::Impl::ParallelFor<Destroy,Range>
closure( *this , Range(0, m_chunk_max + 1) );
closure.execute();
traits::execution_space::fence();
}
void construct_shared_allocation()
{ execute( false ); }
void destroy_shared_allocation()
{ execute( true ); }
Destroy() = default ;
Destroy( Destroy && ) = default ;
Destroy( const Destroy & ) = default ;
Destroy & operator = ( Destroy && ) = default ;
Destroy & operator = ( const Destroy & ) = default ;
Destroy( const memory_pool & arg_pool
, typename traits::value_type ** arg_chunk
, const unsigned arg_chunk_max )
: m_pool( arg_pool )
, m_chunks( arg_chunk )
, m_chunk_max( arg_chunk_max )
, m_destroy( false )
{}
};
/**\brief Allocation constructor
*
* Memory is allocated in chunks from the memory pool.
* The chunk size conforms to the memory pool's chunk size.
* A maximum size is required in order to allocate a
* chunk-pointer array.
*/
explicit inline
DynamicView( const std::string & arg_label
, const memory_pool & arg_pool
, const size_t arg_size_max )
: m_pool( arg_pool )
, m_track()
, m_chunks(0)
// The memory pool chunk is guaranteed to be a power of two
, m_chunk_shift(
Kokkos::Impl::integral_power_of_two(
m_pool.get_min_block_size()/sizeof(typename traits::value_type)) )
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
{
- DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
-
// A functor to deallocate all of the chunks upon final destruction
typedef typename traits::memory_space memory_space ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< memory_space , Destroy > record_type ;
// Allocate chunk pointers and allocation counter
record_type * const record =
record_type::allocate( memory_space()
, arg_label
, ( sizeof(pointer_type) * ( m_chunk_max + 1 ) ) );
m_chunks = reinterpret_cast<pointer_type*>( record->data() );
record->m_destroy = Destroy( m_pool , m_chunks , m_chunk_max );
// Initialize to zero
record->m_destroy.construct_shared_allocation();
m_track.assign_allocated_record_to_uninitialized( record );
}
};
} // namespace Experimental
} // namespace Kokkos
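// A minimal construction sketch (element type and sizes assumed), following the
// pattern used by TestDynamicView in the containers unit tests:
//
//   typedef Kokkos::DefaultExecutionSpace                          Space ;
//   typedef Kokkos::Experimental::MemoryPool< Space::device_type > pool_type ;
//   typedef Kokkos::Experimental::DynamicView< int * , Space >     view_type ;
//
//   pool_type pool( Space::memory_space() , max_elems * sizeof(int) * 1.2 );
//   view_type a( "A" , pool , max_elems );  // label, pool, maximum extent
//   a.resize_serial( n );                   // host-side grow or shrink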
namespace Kokkos {
namespace Experimental {
template< class T , class ... P >
inline
typename Kokkos::Experimental::DynamicView<T,P...>::HostMirror
create_mirror_view( const Kokkos::Experimental::DynamicView<T,P...> & src )
{
return src ;
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const View<T,DP...> & dst
, const DynamicView<T,SP...> & src
)
{
typedef View<T,DP...> dst_type ;
typedef DynamicView<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces, where the data is
// either non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const DynamicView<T,DP...> & dst
, const View<T,SP...> & src
)
{
typedef DynamicView<T,DP...> dst_type ;
typedef View<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces, where the data is
// either non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
} // namespace Experimental
} // namespace Kokkos
#endif /* #ifndef KOKKOS_DYNAMIC_VIEW_HPP */
diff --git a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
index 8646d2779..193f1bc33 100644
--- a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
+++ b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
@@ -1,848 +1,849 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_UnorderedMap.hpp
/// \brief Declaration and definition of Kokkos::UnorderedMap.
///
/// This header file declares and defines Kokkos::UnorderedMap and its
/// related nonmember functions.
#ifndef KOKKOS_UNORDERED_MAP_HPP
#define KOKKOS_UNORDERED_MAP_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Functional.hpp>
#include <Kokkos_Bitset.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_UnorderedMap_impl.hpp>
#include <iostream>
#include <stdint.h>
#include <stdexcept>
namespace Kokkos {
enum { UnorderedMapInvalidIndex = ~0u };
/// \brief First element of the return value of UnorderedMap::insert().
///
/// Inserting an element into an UnorderedMap is not guaranteed to
/// succeed. There are three possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
class UnorderedMapInsertResult
{
private:
enum Status{
SUCCESS = 1u << 31
, EXISTING = 1u << 30
, FREED_EXISTING = 1u << 29
, LIST_LENGTH_MASK = ~(SUCCESS | EXISTING | FREED_EXISTING)
};
public:
/// Did the map successfully insert the key/value pair
KOKKOS_FORCEINLINE_FUNCTION
bool success() const { return (m_status & SUCCESS); }
/// Was the key already present in the map
KOKKOS_FORCEINLINE_FUNCTION
bool existing() const { return (m_status & EXISTING); }
/// Did the map fail to insert the key due to insufficient capacity
KOKKOS_FORCEINLINE_FUNCTION
bool failed() const { return m_index == UnorderedMapInvalidIndex; }
/// Did the map lose a race to insert a duplicate key/value pair,
/// so that an index was claimed and then had to be released
KOKKOS_FORCEINLINE_FUNCTION
bool freed_existing() const { return (m_status & FREED_EXISTING); }
/// How many iterations through the insert loop did it take before the
/// map returned
KOKKOS_FORCEINLINE_FUNCTION
uint32_t list_position() const { return (m_status & LIST_LENGTH_MASK); }
/// Index where the key can be found as long as the insert did not fail
KOKKOS_FORCEINLINE_FUNCTION
uint32_t index() const { return m_index; }
KOKKOS_FORCEINLINE_FUNCTION
UnorderedMapInsertResult()
: m_index(UnorderedMapInvalidIndex)
, m_status(0)
{}
KOKKOS_FORCEINLINE_FUNCTION
void increment_list_position()
{
m_status += (list_position() < LIST_LENGTH_MASK) ? 1u : 0u;
}
KOKKOS_FORCEINLINE_FUNCTION
void set_existing(uint32_t i, bool arg_freed_existing)
{
m_index = i;
m_status = EXISTING | (arg_freed_existing ? FREED_EXISTING : 0u) | list_position();
}
KOKKOS_FORCEINLINE_FUNCTION
void set_success(uint32_t i)
{
m_index = i;
m_status = SUCCESS | list_position();
}
private:
uint32_t m_index;
uint32_t m_status;
};
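// A minimal sketch of consuming this result inside a parallel kernel
// (map type, key and value are assumed for illustration):
//
//   Kokkos::UnorderedMap<int,double> map( capacity_hint );
//   UnorderedMapInsertResult r = map.insert( key , value );
//   if      ( r.success()  ) { /* new key stored at r.index() */ }
//   else if ( r.existing() ) { /* key was present; old value kept */ }
//   else if ( r.failed()   ) { /* out of space; rehash() on the host */ }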
/// \class UnorderedMap
/// \brief Thread-safe, performance-portable lookup table.
///
/// This class provides a lookup table. In terms of functionality,
/// this class compares to std::unordered_map (new in C++11).
/// "Unordered" means that keys are not stored in any particular
/// order, unlike (for example) std::map. "Thread-safe" means that
/// lookups, insertion, and deletion are safe to call by multiple
/// threads in parallel. "Performance-portable" means that parallel
/// performance of these operations is reasonable, on multiple
/// hardware platforms. Platforms on which performance has been
/// tested include conventional Intel x86 multicore processors, Intel
/// Xeon Phi ("MIC"), and NVIDIA GPUs.
///
/// Parallel performance portability entails design decisions that
/// might differ from one's expectation for a sequential interface.
/// This particularly affects insertion of single elements. In an
/// interface intended for sequential use, insertion might reallocate
/// memory if the original allocation did not suffice to hold the new
/// element. In this class, insertion does <i>not</i> reallocate
/// memory. This means that it might fail. insert() returns an enum
/// which indicates whether the insert failed. There are three
/// possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
///
/// \tparam Key Type of keys of the lookup table. If \c const, users
/// are not allowed to add or remove keys, though they are allowed
/// to change values. In that case, the implementation may make
/// optimizations specific to the <tt>Device</tt>. For example, if
/// <tt>Device</tt> is \c Cuda, it may use texture fetches to access
/// keys.
///
/// \tparam Value Type of values stored in the lookup table. You may use
/// \c void here, in which case the table will be a set of keys. If
/// \c const, users are not allowed to change entries.
/// In that case, the implementation may make
/// optimizations specific to the \c Device, such as using texture
/// fetches to access values.
///
/// \tparam Device The Kokkos Device type.
///
/// \tparam Hasher Definition of the hash function for instances of
/// <tt>Key</tt>. The default will calculate a bitwise hash.
///
/// \tparam EqualTo Definition of the equality function for instances of
/// <tt>Key</tt>. The default will do a bitwise equality comparison.
///
template < typename Key
, typename Value
, typename Device = Kokkos::DefaultExecutionSpace
, typename Hasher = pod_hash<typename Impl::remove_const<Key>::type>
, typename EqualTo = pod_equal_to<typename Impl::remove_const<Key>::type>
>
class UnorderedMap
{
private:
typedef typename ViewTraits<Key,Device,void,void>::host_mirror_space host_mirror_space ;
public:
//! \name Public types and constants
//@{
//key_types
typedef Key declared_key_type;
typedef typename Impl::remove_const<declared_key_type>::type key_type;
typedef typename Impl::add_const<key_type>::type const_key_type;
//value_types
typedef Value declared_value_type;
typedef typename Impl::remove_const<declared_value_type>::type value_type;
typedef typename Impl::add_const<value_type>::type const_value_type;
- typedef Device execution_space;
+ typedef Device device_type;
+ typedef typename Device::execution_space execution_space;
typedef Hasher hasher_type;
typedef EqualTo equal_to_type;
typedef uint32_t size_type;
//map_types
- typedef UnorderedMap<declared_key_type,declared_value_type,execution_space,hasher_type,equal_to_type> declared_map_type;
- typedef UnorderedMap<key_type,value_type,execution_space,hasher_type,equal_to_type> insertable_map_type;
- typedef UnorderedMap<const_key_type,value_type,execution_space,hasher_type,equal_to_type> modifiable_map_type;
- typedef UnorderedMap<const_key_type,const_value_type,execution_space,hasher_type,equal_to_type> const_map_type;
+ typedef UnorderedMap<declared_key_type,declared_value_type,device_type,hasher_type,equal_to_type> declared_map_type;
+ typedef UnorderedMap<key_type,value_type,device_type,hasher_type,equal_to_type> insertable_map_type;
+ typedef UnorderedMap<const_key_type,value_type,device_type,hasher_type,equal_to_type> modifiable_map_type;
+ typedef UnorderedMap<const_key_type,const_value_type,device_type,hasher_type,equal_to_type> const_map_type;
static const bool is_set = std::is_same<void,value_type>::value;
static const bool has_const_key = std::is_same<const_key_type,declared_key_type>::value;
static const bool has_const_value = is_set || std::is_same<const_value_type,declared_value_type>::value;
static const bool is_insertable_map = !has_const_key && (is_set || !has_const_value);
static const bool is_modifiable_map = has_const_key && !has_const_value;
static const bool is_const_map = has_const_key && has_const_value;
typedef UnorderedMapInsertResult insert_result;
typedef UnorderedMap<Key,Value,host_mirror_space,Hasher,EqualTo> HostMirror;
typedef Impl::UnorderedMapHistogram<const_map_type> histogram_type;
//@}
private:
enum { invalid_index = ~static_cast<size_type>(0) };
typedef typename Impl::if_c< is_set, int, declared_value_type>::type impl_value_type;
typedef typename Impl::if_c< is_insertable_map
- , View< key_type *, execution_space>
- , View< const key_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< key_type *, device_type>
+ , View< const key_type *, device_type, MemoryTraits<RandomAccess> >
>::type key_type_view;
typedef typename Impl::if_c< is_insertable_map || is_modifiable_map
- , View< impl_value_type *, execution_space>
- , View< const impl_value_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< impl_value_type *, device_type>
+ , View< const impl_value_type *, device_type, MemoryTraits<RandomAccess> >
>::type value_type_view;
typedef typename Impl::if_c< is_insertable_map
- , View< size_type *, execution_space>
- , View< const size_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< size_type *, device_type>
+ , View< const size_type *, device_type, MemoryTraits<RandomAccess> >
>::type size_type_view;
typedef typename Impl::if_c< is_insertable_map
, Bitset< execution_space >
, ConstBitset< execution_space>
>::type bitset_type;
enum { modified_idx = 0, erasable_idx = 1, failed_insert_idx = 2 };
enum { num_scalars = 3 };
- typedef View< int[num_scalars], LayoutLeft, execution_space> scalars_view;
+ typedef View< int[num_scalars], LayoutLeft, device_type> scalars_view;
public:
//! \name Public member functions
//@{
UnorderedMap()
: m_bounded_insert()
, m_hasher()
, m_equal_to()
, m_size()
, m_available_indexes()
, m_hash_lists()
, m_next_index()
, m_keys()
, m_values()
, m_scalars()
{}
/// \brief Constructor
///
/// \param capacity_hint [in] Initial guess of how many unique keys will be inserted into the map
/// \param hash [in] Hasher function for \c Key instances. The
/// default value usually suffices.
UnorderedMap( size_type capacity_hint, hasher_type hasher = hasher_type(), equal_to_type equal_to = equal_to_type() )
: m_bounded_insert(true)
, m_hasher(hasher)
, m_equal_to(equal_to)
, m_size()
, m_available_indexes(calculate_capacity(capacity_hint))
, m_hash_lists(ViewAllocateWithoutInitializing("UnorderedMap hash list"), Impl::find_hash_size(capacity()))
, m_next_index(ViewAllocateWithoutInitializing("UnorderedMap next index"), capacity()+1) // +1 so that the *_at functions can always return a valid reference
, m_keys("UnorderedMap keys",capacity()+1)
, m_values("UnorderedMap values",(is_set? 1 : capacity()+1))
, m_scalars("UnorderedMap scalars")
{
if (!is_insertable_map) {
throw std::runtime_error("Cannot construct a non-insertable (i.e. const key_type) unordered_map");
}
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
}
void reset_failed_insert_flag()
{
reset_flag(failed_insert_idx);
}
histogram_type get_histogram()
{
return histogram_type(*this);
}
//! Clear all entries in the table.
void clear()
{
m_bounded_insert = true;
if (capacity() == 0) return;
m_available_indexes.clear();
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
{
const key_type tmp = key_type();
Kokkos::deep_copy(m_keys,tmp);
}
if (is_set){
const impl_value_type tmp = impl_value_type();
Kokkos::deep_copy(m_values,tmp);
}
{
Kokkos::deep_copy(m_scalars, 0);
}
}
/// \brief Change the capacity of the map
///
/// If there are no failed inserts the current size of the map will
/// be used as a lower bound for the input capacity.
/// If the map is not empty and does not have failed inserts
/// and the capacity changes then the current data is copied
/// into the resized / rehashed map.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel.
bool rehash(size_type requested_capacity = 0)
{
const bool bounded_insert = (capacity() == 0) || (size() == 0u);
return rehash(requested_capacity, bounded_insert );
}
bool rehash(size_type requested_capacity, bool bounded_insert)
{
if(!is_insertable_map) return false;
const size_type curr_size = size();
requested_capacity = (requested_capacity < curr_size) ? curr_size : requested_capacity;
insertable_map_type tmp(requested_capacity, m_hasher, m_equal_to);
if (curr_size) {
tmp.m_bounded_insert = false;
Impl::UnorderedMapRehash<insertable_map_type> f(tmp,*this);
f.apply();
}
tmp.m_bounded_insert = bounded_insert;
*this = tmp;
return true;
}
/// \brief The number of entries in the table.
///
/// This method has undefined behavior when erasable() is true.
///
/// Note that this is not a device function; it cannot be called in
/// a parallel kernel. The value is not stored as a variable; it
/// must be computed.
size_type size() const
{
if( capacity() == 0u ) return 0u;
if (modified()) {
m_size = m_available_indexes.count();
reset_flag(modified_idx);
}
return m_size;
}
/// \brief The current number of failed insert() calls.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel. The value is not stored as a
/// variable; it must be computed.
bool failed_insert() const
{
return get_flag(failed_insert_idx);
}
bool erasable() const
{
return is_insertable_map ? get_flag(erasable_idx) : false;
}
bool begin_erase()
{
bool result = !erasable();
if (is_insertable_map && result) {
execution_space::fence();
set_flag(erasable_idx);
execution_space::fence();
}
return result;
}
bool end_erase()
{
bool result = erasable();
if (is_insertable_map && result) {
execution_space::fence();
Impl::UnorderedMapErase<declared_map_type> f(*this);
f.apply();
execution_space::fence();
reset_flag(erasable_idx);
}
return result;
}
/// \brief The maximum number of entries that the table can hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
size_type capacity() const
{ return m_available_indexes.size(); }
/// \brief The number of hash table "buckets."
///
/// This is different than the number of entries that the table can
/// hold. Each key hashes to an index in [0, hash_capacity() - 1].
/// That index can hold zero or more entries. This class decides
/// what hash_capacity() should be, given the user's upper bound on
/// the number of entries the table must be able to hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type hash_capacity() const
{ return m_hash_lists.dimension_0(); }
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel. As discussed in the class documentation, it need not
/// succeed. The return value tells you if it did.
///
/// \param k [in] The key to attempt to insert.
/// \param v [in] The corresponding value to attempt to insert. If
/// using this class as a set (with Value = void), then you need not
/// provide this value.
KOKKOS_INLINE_FUNCTION
insert_result insert(key_type const& k, impl_value_type const&v = impl_value_type()) const
{
insert_result result;
if ( !is_insertable_map || capacity() == 0u || m_scalars((int)erasable_idx) ) {
return result;
}
if ( !m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
int volatile & failed_insert_ref = m_scalars((int)failed_insert_idx) ;
const size_type hash_value = m_hasher(k);
const size_type hash_list = hash_value % m_hash_lists.dimension_0();
size_type * curr_ptr = & m_hash_lists[ hash_list ];
size_type new_index = invalid_index ;
// Force integer multiply to long
size_type index_hint = static_cast<size_type>( (static_cast<double>(hash_list) * capacity()) / m_hash_lists.dimension_0());
size_type find_attempts = 0;
enum { bounded_find_attempts = 32u };
const size_type max_attempts = (m_bounded_insert && (bounded_find_attempts < m_available_indexes.max_hint()) ) ?
bounded_find_attempts :
m_available_indexes.max_hint();
bool not_done = true ;
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( not_done ) {
// Continue searching the unordered list for this key;
// the list is only appended to during the insert phase.
// Need volatile_load as other threads may be appending.
size_type curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( curr != invalid_index && ! m_equal_to( volatile_load(&m_keys[curr]), k) ) {
result.increment_list_position();
index_hint = curr;
curr_ptr = &m_next_index[curr];
curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
}
//------------------------------------------------------------
// If key already present then return that index.
if ( curr != invalid_index ) {
const bool free_existing = new_index != invalid_index;
if ( free_existing ) {
// Previously claimed an unused entry that was not inserted.
// Release this unused entry immediately.
if (!m_available_indexes.reset(new_index) ) {
printf("Unable to free existing\n");
}
}
result.set_existing(curr, free_existing);
not_done = false ;
}
//------------------------------------------------------------
// Key is not currently in the map.
// If the thread has claimed an entry try to insert now.
else {
//------------------------------------------------------------
// If have not already claimed an unused entry then do so now.
if (new_index == invalid_index) {
bool found = false;
// use the hash_list as the flag for the search direction
Kokkos::tie(found, index_hint) = m_available_indexes.find_any_unset_near( index_hint, hash_list );
// found an index and this thread set it
if ( !found && ++find_attempts >= max_attempts ) {
failed_insert_ref = true;
not_done = false ;
}
else if (m_available_indexes.set(index_hint) ) {
new_index = index_hint;
// Set key and value
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_keys[new_index]);
m_keys[new_index] = k ;
if (!is_set) {
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_values[new_index]);
m_values[new_index] = v ;
}
// Do not proceed until key and value are updated in global memory
memory_fence();
}
}
else if (failed_insert_ref) {
not_done = false;
}
// Attempt to append claimed entry into the list.
// Another thread may also be trying to append the same list so protect with atomic.
if ( new_index != invalid_index &&
curr == atomic_compare_exchange(curr_ptr, static_cast<size_type>(invalid_index), new_index) ) {
// Succeeded in appending
result.set_success(new_index);
not_done = false ;
}
}
} // while ( not_done )
return result ;
}
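  // Summary of the loop above: (1) walk this bucket's linked list looking for
  // the key; (2) if it is absent, claim a free index from m_available_indexes,
  // write the key/value, and memory_fence(); (3) atomically append the claimed
  // index to the list, releasing it again if another thread won the race.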
KOKKOS_INLINE_FUNCTION
bool erase(key_type const& k) const
{
bool result = false;
if(is_insertable_map && 0u < capacity() && m_scalars((int)erasable_idx)) {
if ( ! m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
size_type index = find(k);
if (valid_at(index)) {
m_available_indexes.reset(index);
result = true;
}
}
return result;
}
/// \brief Find the given key \c k, if it exists in the table.
///
/// \return If the key exists in the table, the index of the
/// value corresponding to that key; otherwise, an invalid index.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type find( const key_type & k) const
{
size_type curr = 0u < capacity() ? m_hash_lists( m_hasher(k) % m_hash_lists.dimension_0() ) : invalid_index ;
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
while (curr != invalid_index && !m_equal_to( m_keys[curr], k) ) {
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
curr = m_next_index[curr];
}
return curr;
}
/// \brief Does the key exist in the map
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
bool exists( const key_type & k) const
{
return valid_at(find(k));
}
/// \brief Get the value with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
///
/// 'const value_type' via Cuda texture fetch must return by value.
KOKKOS_FORCEINLINE_FUNCTION
typename Impl::if_c< (is_set || has_const_value), impl_value_type, impl_value_type &>::type
value_at(size_type i) const
{
return m_values[ is_set ? 0 : (i < capacity() ? i : capacity()) ];
}
/// \brief Get the key with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
key_type key_at(size_type i) const
{
return m_keys[ i < capacity() ? i : capacity() ];
}
KOKKOS_FORCEINLINE_FUNCTION
bool valid_at(size_type i) const
{
return m_available_indexes.test(i);
}
template <typename SKey, typename SValue>
UnorderedMap( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src,
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value,int>::type = 0
)
: m_bounded_insert(src.m_bounded_insert)
, m_hasher(src.m_hasher)
, m_equal_to(src.m_equal_to)
, m_size(src.m_size)
, m_available_indexes(src.m_available_indexes)
, m_hash_lists(src.m_hash_lists)
, m_next_index(src.m_next_index)
, m_keys(src.m_keys)
, m_values(src.m_values)
, m_scalars(src.m_scalars)
{}
template <typename SKey, typename SValue>
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value
,declared_map_type & >::type
operator=( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src)
{
m_bounded_insert = src.m_bounded_insert;
m_hasher = src.m_hasher;
m_equal_to = src.m_equal_to;
m_size = src.m_size;
m_available_indexes = src.m_available_indexes;
m_hash_lists = src.m_hash_lists;
m_next_index = src.m_next_index;
m_keys = src.m_keys;
m_values = src.m_values;
m_scalars = src.m_scalars;
return *this;
}
template <typename SKey, typename SValue, typename SDevice>
typename Impl::enable_if< std::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
std::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
>::type
create_copy_view( UnorderedMap<SKey, SValue, SDevice, Hasher,EqualTo> const& src)
{
if (m_hash_lists.ptr_on_device() != src.m_hash_lists.ptr_on_device()) {
insertable_map_type tmp;
tmp.m_bounded_insert = src.m_bounded_insert;
tmp.m_hasher = src.m_hasher;
tmp.m_equal_to = src.m_equal_to;
tmp.m_size = src.size();
tmp.m_available_indexes = bitset_type( src.capacity() );
tmp.m_hash_lists = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap hash list"), src.m_hash_lists.dimension_0() );
tmp.m_next_index = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap next index"), src.m_next_index.dimension_0() );
tmp.m_keys = key_type_view( ViewAllocateWithoutInitializing("UnorderedMap keys"), src.m_keys.dimension_0() );
tmp.m_values = value_type_view( ViewAllocateWithoutInitializing("UnorderedMap values"), src.m_values.dimension_0() );
tmp.m_scalars = scalars_view("UnorderedMap scalars");
Kokkos::deep_copy(tmp.m_available_indexes, src.m_available_indexes);
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, typename SDevice::memory_space > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, typename SDevice::memory_space > raw_deep_copy;
raw_deep_copy(tmp.m_hash_lists.ptr_on_device(), src.m_hash_lists.ptr_on_device(), sizeof(size_type)*src.m_hash_lists.dimension_0());
raw_deep_copy(tmp.m_next_index.ptr_on_device(), src.m_next_index.ptr_on_device(), sizeof(size_type)*src.m_next_index.dimension_0());
raw_deep_copy(tmp.m_keys.ptr_on_device(), src.m_keys.ptr_on_device(), sizeof(key_type)*src.m_keys.dimension_0());
if (!is_set) {
raw_deep_copy(tmp.m_values.ptr_on_device(), src.m_values.ptr_on_device(), sizeof(impl_value_type)*src.m_values.dimension_0());
}
raw_deep_copy(tmp.m_scalars.ptr_on_device(), src.m_scalars.ptr_on_device(), sizeof(int)*num_scalars );
*this = tmp;
}
}
//@}
private: // private member functions
bool modified() const
{
return get_flag(modified_idx);
}
void set_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int true_ = true;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &true_, sizeof(int));
}
void reset_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int false_ = false;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &false_, sizeof(int));
}
bool get_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< Kokkos::HostSpace, typename execution_space::memory_space > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< Kokkos::HostSpace, typename device_type::memory_space > raw_deep_copy;
int result = false;
raw_deep_copy(&result, m_scalars.ptr_on_device() + flag, sizeof(int));
return result;
}
static uint32_t calculate_capacity(uint32_t capacity_hint)
{
// increase by ~16% and round up to the nearest multiple of 128
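// e.g. capacity_hint = 1000 : 7*1000/6 = 1166, rounded up to 10*128 = 1280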
return capacity_hint ? ((static_cast<uint32_t>(7ull*capacity_hint/6u) + 127u)/128u)*128u : 128u;
}
private: // private members
bool m_bounded_insert;
hasher_type m_hasher;
equal_to_type m_equal_to;
mutable size_type m_size;
bitset_type m_available_indexes;
size_type_view m_hash_lists;
size_type_view m_next_index;
key_type_view m_keys;
value_type_view m_values;
scalars_view m_scalars;
template <typename KKey, typename VValue, typename DDevice, typename HHash, typename EEqualTo>
friend class UnorderedMap;
template <typename UMap>
friend struct Impl::UnorderedMapErase;
template <typename UMap>
friend struct Impl::UnorderedMapHistogram;
template <typename UMap>
friend struct Impl::UnorderedMapPrint;
};
// Specialization of deep_copy for two UnorderedMap objects.
template < typename DKey, typename DT, typename DDevice
, typename SKey, typename ST, typename SDevice
, typename Hasher, typename EqualTo >
inline void deep_copy( UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo> & dst
, const UnorderedMap<SKey, ST, SDevice, Hasher, EqualTo> & src )
{
dst.create_copy_view(src);
}
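// Typical use (sketch; map types assumed): mirror a device-resident map on the
// host for inspection.
//   UnorderedMap<Key,Value,Device>::HostMirror host_map ;
//   Kokkos::deep_copy( host_map , device_map );   // forwards to create_copy_view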
} // namespace Kokkos
#endif //KOKKOS_UNORDERED_MAP_HPP
diff --git a/lib/kokkos/containers/unit_tests/CMakeLists.txt b/lib/kokkos/containers/unit_tests/CMakeLists.txt
index b9d860f32..0c59c616d 100644
--- a/lib/kokkos/containers/unit_tests/CMakeLists.txt
+++ b/lib/kokkos/containers/unit_tests/CMakeLists.txt
@@ -1,40 +1,51 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
-SET(SOURCES
- UnitTestMain.cpp
- TestCuda.cpp
- )
-
SET(LIBRARIES kokkoscore)
IF(Kokkos_ENABLE_Pthread)
- LIST( APPEND SOURCES
- TestThreads.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_Threads
+ SOURCES TestThreads.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Serial)
- LIST( APPEND SOURCES
- TestSerial.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_Serial
+ SOURCES TestSerial.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
- LIST( APPEND SOURCES
- TestOpenMP.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_OpenMP
+ SOURCES TestOpenMP.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
-
+IF(Kokkos_ENABLE_Cuda)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
- UnitTest
- SOURCES ${SOURCES}
+ UnitTest_Cuda
+ SOURCES TestCuda.cpp UnitTestMain.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
-
+ENDIF()
+
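# Net effect of this change (summary): instead of one combined UnitTest target,
# each enabled backend (Threads, Serial, OpenMP, Cuda) now builds and registers
# its own UnitTest_<Backend> executable.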
diff --git a/lib/kokkos/containers/unit_tests/TestDynamicView.hpp b/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
index 7e3ca005f..beb07bd79 100644
--- a/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
+++ b/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
@@ -1,168 +1,171 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_DYNAMICVIEW_HPP
#define KOKKOS_TEST_DYNAMICVIEW_HPP
#include <gtest/gtest.h>
#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <Kokkos_DynamicView.hpp>
#include <impl/Kokkos_Timer.hpp>
namespace Test {
template< typename Scalar , class Space >
struct TestDynamicView
{
typedef typename Space::execution_space execution_space ;
typedef typename Space::memory_space memory_space ;
typedef Kokkos::Experimental::MemoryPool<typename Space::device_type> memory_pool_type;
typedef Kokkos::Experimental::DynamicView<Scalar*,Space> view_type;
+ typedef typename view_type::const_type const_view_type ;
typedef typename Kokkos::TeamPolicy<execution_space>::member_type member_type ;
typedef double value_type;
struct TEST {};
struct VERIFY {};
view_type a;
const unsigned total_size ;
TestDynamicView( const view_type & arg_a , const unsigned arg_total )
: a(arg_a), total_size( arg_total ) {}
KOKKOS_INLINE_FUNCTION
void operator() ( const TEST , member_type team_member, double& value) const
{
const unsigned int team_idx = team_member.league_rank() * team_member.team_size();
if ( team_member.team_rank() == 0 ) {
unsigned n = team_idx + team_member.team_size();
if ( total_size < n ) n = total_size ;
a.resize_parallel( n );
if ( a.extent(0) < n ) {
Kokkos::abort("GrowTest TEST failed resize_parallel");
}
}
// Make sure resize is done for all team members:
team_member.team_barrier();
const unsigned int val = team_idx + team_member.team_rank();
if ( val < total_size ) {
value += val ;
a( val ) = val ;
}
}
KOKKOS_INLINE_FUNCTION
void operator() ( const VERIFY , member_type team_member, double& value) const
{
const unsigned int val =
team_member.team_rank() +
team_member.league_rank() * team_member.team_size();
if ( val < total_size ) {
if ( val != a(val) ) {
Kokkos::abort("GrowTest VERIFY failed resize_parallel");
}
value += a(val);
}
}
static void run( unsigned arg_total_size )
{
typedef Kokkos::TeamPolicy<execution_space,TEST> TestPolicy ;
typedef Kokkos::TeamPolicy<execution_space,VERIFY> VerifyPolicy ;
// printf("TestDynamicView::run(%d) construct memory pool\n",arg_total_size);
memory_pool_type pool( memory_space() , arg_total_size * sizeof(Scalar) * 1.2 );
// printf("TestDynamicView::run(%d) construct dynamic view\n",arg_total_size);
view_type da("A",pool,arg_total_size);
+ const_view_type ca(da);
+
// printf("TestDynamicView::run(%d) construct test functor\n",arg_total_size);
TestDynamicView functor(da,arg_total_size);
const unsigned team_size = TestPolicy::team_size_recommended(functor);
const unsigned league_size = ( arg_total_size + team_size - 1 ) / team_size ;
double reference = 0;
double result = 0;
// printf("TestDynamicView::run(%d) run functor test\n",arg_total_size);
Kokkos::parallel_reduce( TestPolicy(league_size,team_size) , functor , reference);
execution_space::fence();
// printf("TestDynamicView::run(%d) run functor verify\n",arg_total_size);
Kokkos::parallel_reduce( VerifyPolicy(league_size,team_size) , functor , result );
execution_space::fence();
// printf("TestDynamicView::run(%d) done\n",arg_total_size);
}
};
} // namespace Test
#endif /* #ifndef KOKKOS_TEST_DYNAMICVIEW_HPP */
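The added const_view_type typedef and the ca(da) construction above are the point of this change: a DynamicView can be re-wrapped through its const_type so read-only code cannot write it. A minimal host-side sketch of the same pattern, assuming the default execution space (labels and sizes are placeholders, not part of the patch):

// Illustrative sketch: DynamicView backed by a memory pool, aliased as const.
#include <Kokkos_Core.hpp>
#include <Kokkos_DynamicView.hpp>

void const_dynamic_view_example( const unsigned n )
{
  typedef Kokkos::DefaultExecutionSpace                          Space;
  typedef Kokkos::Experimental::MemoryPool< Space::device_type > pool_type;
  typedef Kokkos::Experimental::DynamicView< double*, Space >    view_type;
  typedef view_type::const_type                                  const_view_type;

  pool_type pool( Space::memory_space(), n * sizeof(double) * 2 );
  view_type a( "A", pool, n );     // growable up to n entries
  const_view_type ca( a );         // read-only alias of the same allocation
  (void) ca;
}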
diff --git a/lib/kokkos/core/cmake/Dependencies.cmake b/lib/kokkos/core/cmake/Dependencies.cmake
index ae9a20c50..8d9872725 100644
--- a/lib/kokkos/core/cmake/Dependencies.cmake
+++ b/lib/kokkos/core/cmake/Dependencies.cmake
@@ -1,6 +1,6 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
- LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREAD DLlib
+ LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREADS DLlib
TEST_OPTIONAL_TPLS CUSPARSE
)
-TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
\ No newline at end of file
+TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
diff --git a/lib/kokkos/core/cmake/KokkosCore_config.h.in b/lib/kokkos/core/cmake/KokkosCore_config.h.in
index 9359b5a32..a71e60f20 100644
--- a/lib/kokkos/core/cmake/KokkosCore_config.h.in
+++ b/lib/kokkos/core/cmake/KokkosCore_config.h.in
@@ -1,67 +1,67 @@
#ifndef KOKKOS_CORE_CONFIG_H
#define KOKKOS_CORE_CONFIG_H
/* The trivial 'src/build_common.sh' creates a config
* that must stay in sync with this file.
*/
#cmakedefine KOKKOS_FOR_SIERRA
#if !defined( KOKKOS_FOR_SIERRA )
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUDA
// mfh 16 Sep 2014: If passed in on the command line, that overrides
// any value of KOKKOS_USE_CUDA_UVM here. Doing this should prevent build
// warnings like this one:
//
// packages/kokkos/core/src/KokkosCore_config.h:13:1: warning: "KOKKOS_USE_CUDA_UVM" redefined
//
// At some point, we should edit the test-build scripts in
// Trilinos/cmake/ctest/drivers/perseus/, and take
// -DKOKKOS_USE_CUDA_UVM from the command-line arguments there. I
// hesitate to do that now, because I'm not sure if all the files are
// including KokkosCore_config.h (or a header file that includes it) like
// they should.
#if ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_USE_CUDA_UVM
#endif // ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_SERIAL
-#cmakedefine KOKKOS_HAVE_QTHREAD
+#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_HAVE_CXX11
#cmakedefine KOKKOS_HAVE_CUSPARSE
#cmakedefine KOKKOS_ENABLE_PROFILING_INTERNAL
#ifdef KOKKOS_ENABLE_PROFILING_INTERNAL
#define KOKKOS_ENABLE_PROFILING 1
#else
#define KOKKOS_ENABLE_PROFILING 0
#endif
#cmakedefine KOKKOS_HAVE_CUDA_RDC
#ifdef KOKKOS_HAVE_CUDA_RDC
#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1
#endif
#cmakedefine KOKKOS_HAVE_CUDA_LAMBDA
#ifdef KOKKOS_HAVE_CUDA_LAMBDA
#define KOKKOS_CUDA_USE_LAMBDA 1
#endif
// Don't forbid users from defining this macro on the command line,
// but still make sure that CMake logic can control its definition.
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
#cmakedefine KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif // KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
#endif // KOKKOS_FOR_SIERRA
#endif // KOKKOS_CORE_CONFIG_H
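Note that the generated header turns KOKKOS_ENABLE_PROFILING into an explicit 0/1 value instead of leaving it undefined, so downstream code can test it with a plain #if. A minimal sketch of that consumption (the function is a placeholder, not part of the patch):

// Illustrative sketch: guarding optional profiling hooks on the 0/1 macro.
#include <KokkosCore_config.h>

void emit_region_marker_if_enabled()
{
#if KOKKOS_ENABLE_PROFILING
  // profiling interface compiled in: a region marker could be pushed here
#else
  // profiling compiled out: nothing to do
#endif
}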
diff --git a/lib/kokkos/core/perf_test/Makefile b/lib/kokkos/core/perf_test/Makefile
index 85f869971..3a0ad2d4c 100644
--- a/lib/kokkos/core/perf_test/Makefile
+++ b/lib/kokkos/core/perf_test/Makefile
@@ -1,63 +1,62 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/perf_test
default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
TEST_TARGETS =
TARGETS =
OBJ_PERF = PerfTestHost.o PerfTestCuda.o PerfTestMain.o gtest-all.o
TARGETS += KokkosCore_PerformanceTest
TEST_TARGETS += test-performance
OBJ_ATOMICS = test_atomic.o
TARGETS += KokkosCore_PerformanceTest_Atomics
TEST_TARGETS += test-atomic
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest
KokkosCore_PerformanceTest_Atomics: $(OBJ_ATOMICS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Atomics
test-performance: KokkosCore_PerformanceTest
./KokkosCore_PerformanceTest
test-atomic: KokkosCore_PerformanceTest_Atomics
./KokkosCore_PerformanceTest_Atomics
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
-
diff --git a/lib/kokkos/core/perf_test/PerfTestCuda.cpp b/lib/kokkos/core/perf_test/PerfTestCuda.cpp
index 7386ecef2..65ce61fb5 100644
--- a/lib/kokkos/core/perf_test/PerfTestCuda.cpp
+++ b/lib/kokkos/core/perf_test/PerfTestCuda.cpp
@@ -1,189 +1,199 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <impl/Kokkos_Timer.hpp>
+#include <PerfTestMDRange.hpp>
+
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
namespace Test {
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase() {
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
static void TearDownTestCase() {
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
+//TEST_F( cuda, mdrange_lr ) {
+// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutRight>( 5, 8, "Kokkos::Cuda" )) );
+//}
+
+//TEST_F( cuda, mdrange_ll ) {
+// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutLeft>( 5, 8, "Kokkos::Cuda" )) );
+//}
+
TEST_F( cuda, hexgrad )
{
EXPECT_NO_THROW( run_test_hexgrad< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
TEST_F( cuda, gramschmidt )
{
EXPECT_NO_THROW( run_test_gramschmidt< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
namespace {
template <typename T>
struct TextureFetch
{
typedef Kokkos::View< T *, Kokkos::CudaSpace> array_type;
typedef Kokkos::View< const T *, Kokkos::CudaSpace, Kokkos::MemoryRandomAccess> const_array_type;
typedef Kokkos::View< int *, Kokkos::CudaSpace> index_array_type;
typedef Kokkos::View< const int *, Kokkos::CudaSpace> const_index_array_type;
struct FillArray
{
array_type m_array;
FillArray( const array_type & array )
: m_array(array)
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_array(i) = i; }
};
struct RandomIndexes
{
index_array_type m_indexes;
typename index_array_type::HostMirror m_host_indexes;
RandomIndexes( const index_array_type & indexes)
: m_indexes(indexes)
, m_host_indexes(Kokkos::create_mirror(m_indexes))
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::HostSpace::execution_space,int>(0,m_host_indexes.dimension_0()), *this);
//random shuffle
Kokkos::HostSpace::execution_space::fence();
std::random_shuffle(m_host_indexes.ptr_on_device(), m_host_indexes.ptr_on_device() + m_host_indexes.dimension_0());
Kokkos::deep_copy(m_indexes,m_host_indexes);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_host_indexes(i) = i; }
};
struct RandomReduce
{
const_array_type m_array;
const_index_array_type m_indexes;
RandomReduce( const const_array_type & array, const const_index_array_type & indexes)
: m_array(array)
, m_indexes(indexes)
{}
void apply(T & reduce) const
{
Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this, reduce);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i, T & reduce) const
{ reduce += m_array(m_indexes(i)); }
};
static void run(int size, double & reduce_time, T &reduce)
{
array_type array("array",size);
index_array_type indexes("indexes",size);
{ FillArray f(array); f.apply(); }
{ RandomIndexes f(indexes); f.apply(); }
Kokkos::Cuda::fence();
Kokkos::Timer timer;
for (int j=0; j<10; ++j) {
RandomReduce f(array,indexes);
f.apply(reduce);
}
Kokkos::Cuda::fence();
reduce_time = timer.seconds();
}
};
} // unnamed namespace
TEST_F( cuda, texture_double )
{
printf("Random reduce of double through texture fetch\n");
for (int i=1; i<=26; ++i) {
int size = 1<<i;
double time = 0;
double reduce = 0;
TextureFetch<double>::run(size,time,reduce);
printf(" time = %1.3e size = 2^%d\n", time, i);
}
}
} // namespace Test
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
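The TextureFetch benchmark above relies on one trait: a const View annotated with Kokkos::MemoryRandomAccess, which lets the CUDA backend service gathered loads through the read-only (texture) cache. A minimal sketch of that gather-reduce pattern, assuming CUDA is enabled (type and function names are placeholders, not part of the patch):

// Illustrative sketch: random-access gather reduce through a RandomAccess view.
typedef Kokkos::View< const double*, Kokkos::CudaSpace,
                      Kokkos::MemoryRandomAccess >      gather_values_type;
typedef Kokkos::View< const int*, Kokkos::CudaSpace >   gather_index_type;

double gather_sum( gather_values_type values, gather_index_type indexes )
{
  double sum = 0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>( 0, indexes.dimension_0() ),
    KOKKOS_LAMBDA( const int i, double & update ) {
      update += values( indexes(i) );   // gathered load, eligible for texture cache
    }, sum );
  return sum;
}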
diff --git a/lib/kokkos/core/perf_test/PerfTestDriver.hpp b/lib/kokkos/core/perf_test/PerfTestDriver.hpp
index 7b6cfc5b5..4732c3275 100644
--- a/lib/kokkos/core/perf_test/PerfTestDriver.hpp
+++ b/lib/kokkos/core/perf_test/PerfTestDriver.hpp
@@ -1,152 +1,488 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <string>
// mfh 06 Jun 2013: This macro doesn't work like one might think it
// should. It doesn't take the template parameter DeviceType and
// print its actual type name; it just literally prints out
// "DeviceType". I've worked around this below without using the
// macro, so I'm commenting out the macro to avoid compiler complaints
// about an unused macro.
// #define KOKKOS_IMPL_MACRO_TO_STRING( X ) #X
// #define KOKKOS_MACRO_TO_STRING( X ) KOKKOS_IMPL_MACRO_TO_STRING( X )
//------------------------------------------------------------------------
namespace Test {
enum { NUMBER_OF_TRIALS = 5 };
+template< class DeviceType , class LayoutType >
+void run_test_mdrange( int exp_beg , int exp_end, const char deviceTypeName[], int range_offset = 0, int tile_offset = 0 )
+// exp_beg = 6 => 2^6 = 64 is starting range length
+{
+#define MDRANGE_PERFORMANCE_OUTPUT_VERBOSE 0
+
+ std::string label_mdrange ;
+ label_mdrange.append( "\"MDRange< double , " );
+ label_mdrange.append( deviceTypeName );
+ label_mdrange.append( " >\"" );
+
+ std::string label_range_col2 ;
+ label_range_col2.append( "\"RangeColTwo< double , " );
+ label_range_col2.append( deviceTypeName );
+ label_range_col2.append( " >\"" );
+
+ std::string label_range_col_all ;
+ label_range_col_all.append( "\"RangeColAll< double , " );
+ label_range_col_all.append( deviceTypeName );
+ label_range_col_all.append( " >\"" );
+
+ if ( std::is_same<LayoutType, Kokkos::LayoutRight>::value) {
+ std::cout << "--------------------------------------------------------------\n"
+ << "Performance tests for MDRange Layout Right"
+ << "\n--------------------------------------------------------------" << std::endl;
+ } else {
+ std::cout << "--------------------------------------------------------------\n"
+ << "Performance tests for MDRange Layout Left"
+ << "\n--------------------------------------------------------------" << std::endl;
+ }
+
+
+ for (int i = exp_beg ; i < exp_end ; ++i) {
+ const int range_length = (1<<i) + range_offset;
+
+ std::cout << "\n--------------------------------------------------------------\n"
+ << "--------------------------------------------------------------\n"
+ << "MDRange Test: range bounds: " << range_length << " , " << range_length << " , " << range_length
+ << "\n--------------------------------------------------------------\n"
+ << "--------------------------------------------------------------\n";
+// << std::endl;
+
+ int t0_min = 0, t1_min = 0, t2_min = 0;
+ double seconds_min = 0.0;
+
+ // Test 1: The MDRange in full
+ {
+ int t0 = 1, t1 = 1, t2 = 1;
+ int counter = 1;
+#if !defined(KOKKOS_HAVE_CUDA)
+ int min_bnd = 8;
+ int tfast = range_length;
+#else
+ int min_bnd = 2;
+ int tfast = 32;
+#endif
+ while ( tfast >= min_bnd ) {
+ int tmid = min_bnd;
+ while ( tmid < tfast ) {
+ t0 = min_bnd;
+ t1 = tmid;
+ t2 = tfast;
+ int t2_rev = min_bnd;
+ int t1_rev = tmid;
+ int t0_rev = tfast;
+
+#if defined(KOKKOS_HAVE_CUDA)
+ //Note: Product of tile sizes must be < 1024 for Cuda
+ if ( t0*t1*t2 >= 1024 ) {
+ printf(" Exceeded Cuda tile limits; onto next range set\n\n");
+ break;
+ }
+#endif
+
+ // Run 1 with tiles LayoutRight style
+ double seconds_1 = 0;
+ { seconds_1 = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, t0, t1, t2) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << label_mdrange
+ << " , " << t0 << " , " << t1 << " , " << t2
+ << " , " << seconds_1
+ << std::endl ;
+#endif
+
+ if ( counter == 1 ) {
+ seconds_min = seconds_1;
+ t0_min = t0;
+ t1_min = t1;
+ t2_min = t2;
+ }
+ else {
+ if ( seconds_1 < seconds_min )
+ {
+ seconds_min = seconds_1;
+ t0_min = t0;
+ t1_min = t1;
+ t2_min = t2;
+ }
+ }
+
+ // Run 2 with tiles LayoutLeft style - reverse order of tile dims
+ double seconds_1rev = 0;
+ { seconds_1rev = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, t0_rev, t1_rev, t2_rev) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << label_mdrange
+ << " , " << t0_rev << " , " << t1_rev << " , " << t2_rev
+ << " , " << seconds_1rev
+ << std::endl ;
+#endif
+
+ if ( seconds_1rev < seconds_min )
+ {
+ seconds_min = seconds_1rev;
+ t0_min = t0_rev;
+ t1_min = t1_rev;
+ t2_min = t2_rev;
+ }
+
+ ++counter;
+ tmid <<= 1;
+ } //end inner while
+ tfast >>=1;
+ } //end outer while
+
+ std::cout << "\n"
+ << "--------------------------------------------------------------\n"
+ << label_mdrange
+ << "\n Min values "
+ << "\n Range length per dim (3D): " << range_length
+ << "\n TileDims: " << t0_min << " , " << t1_min << " , " << t2_min
+ << "\n Min time: " << seconds_min
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ } //end scope
+
+#if !defined(KOKKOS_HAVE_CUDA)
+ double seconds_min_c = 0.0;
+ int t0c_min = 0, t1c_min = 0, t2c_min = 0;
+ int counter = 1;
+ {
+ int min_bnd = 8;
+ // Test 1_c: MDRange with 0 for 'inner' tile dim; this case will utilize the full span in that direction, should be similar to Collapse<2>
+ if ( std::is_same<LayoutType, Kokkos::LayoutRight>::value ) {
+ for ( unsigned int T0 = min_bnd; T0 < static_cast<unsigned int>(range_length); T0<<=1 ) {
+ for ( unsigned int T1 = min_bnd; T1 < static_cast<unsigned int>(range_length); T1<<=1 ) {
+ double seconds_c = 0;
+ { seconds_c = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, T0, T1, 0) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << " MDRange LR with '0' tile - collapse-like \n"
+ << label_mdrange
+ << " , " << T0 << " , " << T1 << " , " << range_length
+ << " , " << seconds_c
+ << std::endl ;
+#endif
+
+ t2c_min = range_length;
+ if ( counter == 1 ) {
+ seconds_min_c = seconds_c;
+ t0c_min = T0;
+ t1c_min = T1;
+ }
+ else {
+ if ( seconds_c < seconds_min_c )
+ {
+ seconds_min_c = seconds_c;
+ t0c_min = T0;
+ t1c_min = T1;
+ }
+ }
+ ++counter;
+ }
+ }
+ }
+ else {
+ for ( unsigned int T1 = min_bnd; T1 <= static_cast<unsigned int>(range_length); T1<<=1 ) {
+ for ( unsigned int T2 = min_bnd; T2 <= static_cast<unsigned int>(range_length); T2<<=1 ) {
+ double seconds_c = 0;
+ { seconds_c = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, 0, T1, T2) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << " MDRange LL with '0' tile - collapse-like \n"
+ << label_mdrange
+ << " , " << range_length << " , " << T1 << " , " << T2
+ << " , " << seconds_c
+ << std::endl ;
+#endif
+
+
+ t0c_min = range_length;
+ if ( counter == 1 ) {
+ seconds_min_c = seconds_c;
+ t1c_min = T1;
+ t2c_min = T2;
+ }
+ else {
+ if ( seconds_c < seconds_min_c )
+ {
+ seconds_min_c = seconds_c;
+ t1c_min = T1;
+ t2c_min = T2;
+ }
+ }
+ ++counter;
+ }
+ }
+ }
+
+ std::cout
+// << "--------------------------------------------------------------\n"
+ << label_mdrange
+ << " Collapse<2> style: "
+ << "\n Min values "
+ << "\n Range length per dim (3D): " << range_length
+ << "\n TileDims: " << t0c_min << " , " << t1c_min << " , " << t2c_min
+ << "\n Min time: " << seconds_min_c
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ } //end scope test 2
+#endif
+
+
+ // Test 2: RangePolicy Collapse2 style
+ double seconds_2 = 0;
+ { seconds_2 = RangePolicyCollapseTwo< DeviceType , double , LayoutType >::test_index_collapse_two(range_length,range_length,range_length) ; }
+ std::cout << label_range_col2
+ << " , " << range_length
+ << " , " << seconds_2
+ << std::endl ;
+
+
+ // Test 3: RangePolicy Collapse all style - not necessary, always slow
+ /*
+ double seconds_3 = 0;
+ { seconds_3 = RangePolicyCollapseAll< DeviceType , double , LayoutType >::test_collapse_all(range_length,range_length,range_length) ; }
+ std::cout << label_range_col_all
+ << " , " << range_length
+ << " , " << seconds_3
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ */
+
+ // Compare fastest times... will never be collapse all so ignore it
+ // seconds_min = tiled MDRange
+ // seconds_min_c = collapse<2>-like MDRange (tiledim = span for fast dim) - only for non-Cuda, else tile too long
+ // seconds_2 = collapse<2>-style RangePolicy
+ // seconds_3 = collapse<3>-style RangePolicy
+
+#if !defined(KOKKOS_HAVE_CUDA)
+ if ( seconds_min < seconds_min_c ) {
+ if ( seconds_min < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange tiled\n"
+ << " Time: " << seconds_min
+ << " Difference: " << seconds_2 - seconds_min
+ << " Other times: \n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ }
+ else if ( seconds_min > seconds_min_c ) {
+ if ( seconds_min_c < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange collapse-like (tiledim = span on fast dim) type\n"
+ << " Time: " << seconds_min_c
+ << " Difference: " << seconds_2 - seconds_min_c
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min_c > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min_c - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ } // end else if
+#else
+ if ( seconds_min < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange tiled\n"
+ << " Time: " << seconds_min
+ << " Difference: " << seconds_2 - seconds_min
+ << " Other times: \n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+#endif
+
+ } //end for
+
+#undef MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+
+}
template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_hexgrad ;
label_hexgrad.append( "\"HexGrad< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_hexgrad.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_hexgrad.append( deviceTypeName );
label_hexgrad.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_hexgrad
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_gramschmidt ;
label_gramschmidt.append( "\"GramSchmidt< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_gramschmidt.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_gramschmidt.append( deviceTypeName );
label_gramschmidt.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_gramschmidt
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
}
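Each timing inside run_test_mdrange above reduces to one rank-3 MDRangePolicy launch with an explicit tile shape {t0,t1,t2}; the nested while loops only enumerate which shapes are tried, and on CUDA the product t0*t1*t2 must stay under 1024 because a tile maps onto a thread block. A minimal sketch of a single timed launch, assuming the Kokkos core and timer headers are already included (views, bounds and the stencil body are placeholders, not part of the patch):

// Illustrative sketch: one tiled MDRange launch of the kind the driver times.
template< class Exec , class ViewType >
double time_one_tile_shape( ViewType A , ViewType B , int n , int t0 , int t1 , int t2 )
{
  typedef Kokkos::Experimental::MDRangePolicy<
            Exec , Kokkos::Experimental::Rank<3> > policy_type ;

  policy_type policy( {{0,0,0}} , {{n,n,n}} , {{t0,t1,t2}} );

  Kokkos::Timer timer ;
  Kokkos::Experimental::md_parallel_for( policy ,
    KOKKOS_LAMBDA( const int i , const int j , const int k ) {
      A(i,j,k) = 0.25 * ( B(i+2,j,k) + B(i,j+2,k) + B(i,j,k+2) + B(i,j,k) );
    });
  Exec::fence();
  return timer.seconds();
}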
diff --git a/lib/kokkos/core/perf_test/PerfTestHost.cpp b/lib/kokkos/core/perf_test/PerfTestHost.cpp
index 606177ca5..831d58110 100644
--- a/lib/kokkos/core/perf_test/PerfTestHost.cpp
+++ b/lib/kokkos/core/perf_test/PerfTestHost.cpp
@@ -1,115 +1,125 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_OPENMP )
typedef Kokkos::OpenMP TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
#elif defined( KOKKOS_ENABLE_PTHREAD )
typedef Kokkos::Threads TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Threads" ;
#elif defined( KOKKOS_ENABLE_SERIAL )
typedef Kokkos::Serial TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Serial" ;
#else
# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
#endif
#include <impl/Kokkos_Timer.hpp>
+#include <PerfTestMDRange.hpp>
+
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
//------------------------------------------------------------------------
namespace Test {
class host : public ::testing::Test {
protected:
static void SetUpTestCase()
{
if(Kokkos::hwloc::available()) {
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
unsigned threads_count = 0 ;
threads_count = std::max( 1u , numa_count )
* std::max( 2u , cores_per_numa * threads_per_core );
TestHostDevice::initialize( threads_count );
} else {
const unsigned thread_count = 4 ;
TestHostDevice::initialize( thread_count );
}
}
static void TearDownTestCase()
{
TestHostDevice::finalize();
}
};
+//TEST_F( host, mdrange_lr ) {
+// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutRight> (5, 8, TestHostDeviceName) ) );
+//}
+
+//TEST_F( host, mdrange_ll ) {
+// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutLeft> (5, 8, TestHostDeviceName) ) );
+//}
+
TEST_F( host, hexgrad ) {
EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
TEST_F( host, gramschmidt ) {
EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
} // namespace Test
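SetUpTestCase above sizes the host backend from the hwloc-detected topology and falls back to a fixed count only when no topology information is available. The same heuristic in isolation, assuming <Kokkos_Core.hpp> and <algorithm> are included (the 4-thread fallback mirrors the fixture; the function name is a placeholder, not part of the patch):

// Illustrative sketch: thread-count heuristic used by the test fixture.
unsigned pick_host_thread_count()
{
  if ( Kokkos::hwloc::available() ) {
    const unsigned numa_count       = Kokkos::hwloc::get_available_numa_count();
    const unsigned cores_per_numa   = Kokkos::hwloc::get_available_cores_per_numa();
    const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
    return std::max( 1u , numa_count ) * std::max( 2u , cores_per_numa * threads_per_core );
  }
  return 4u ;   // no topology information: modest fixed default
}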
diff --git a/lib/kokkos/core/perf_test/PerfTestMDRange.hpp b/lib/kokkos/core/perf_test/PerfTestMDRange.hpp
new file mode 100644
index 000000000..d910b513c
--- /dev/null
+++ b/lib/kokkos/core/perf_test/PerfTestMDRange.hpp
@@ -0,0 +1,564 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+namespace Test {
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct MultiDimRangePerf3D
+{
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+
+ using iterate_type = Kokkos::Experimental::Iterate;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ MultiDimRangePerf3D(const view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_), irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long i, const long j, const long k) const
+ {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+
+
+ struct InitZeroTag {};
+// struct InitViewTag {};
+
+ struct Init
+ {
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long i, const long j, const long k) const
+ {
+ input(i,j,k) = 1.0;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const InitZeroTag&, const long i, const long j, const long k) const
+ {
+ input(i,j,k) = 0;
+ }
+
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+ };
+
+
+ static double test_multi_index(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const unsigned int Ti = 1, const unsigned int Tj = 1, const unsigned int Tk = 1, const long iter = 1)
+ {
+ //This test performs multidim range over all dims
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef MultiDimRangePerf3D<execution_space,ScalarType,TestLayout> FunctorType;
+
+ double dt_min = 0;
+
+ // LayoutRight
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value ) {
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy_initA({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}});
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy_initB({{0,0,0}},{{icount+2,jcount+2,kcount+2}},{{Ti,Tj,Tk}});
+
+ typedef typename Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > MDRangeType;
+ using tile_type = typename MDRangeType::tile_type;
+ using point_type = typename MDRangeType::point_type;
+
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy(point_type{{0,0,0}},point_type{{icount,jcount,kcount}},tile_type{{Ti,Tj,Tk}} );
+
+ Kokkos::Experimental::md_parallel_for( policy_initA, Init(Atest, icount, jcount, kcount) );
+ execution_space::fence();
+ Kokkos::Experimental::md_parallel_for( policy_initB, Init(Btest, icount+2, jcount+2, kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::Experimental::md_parallel_for( policy, FunctorType(Atest, Btest, icount, jcount, kcount) );
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - only the first run
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " multi Ahost = " << Ahost(l,j,k) << " expected = " << check
+ << " multi Bhost(ijk) = " << Bhost(l,j,k)
+ << " multi Bhost(l+1jk) = " << Bhost(l+1,j,k)
+ << " multi Bhost(l+2jk) = " << Bhost(l+2,j,k)
+ << " multi Bhost(ij+1k) = " << Bhost(l,j+1,k)
+ << " multi Bhost(ij+2k) = " << Bhost(l,j+2,k)
+ << " multi Bhost(ijk+1) = " << Bhost(l,j,k+1)
+ << " multi Bhost(ijk+2) = " << Bhost(l,j,k+2)
+ << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << "LR multi: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " multi: No errors!" << std::endl; }
+ }
+ } //end for
+
+ }
+ // LayoutLeft
+ else {
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3,iterate_type::Left,iterate_type::Left>, execution_space > policy_initA({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}});
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3,iterate_type::Left,iterate_type::Left>, execution_space > policy_initB({{0,0,0}},{{icount+2,jcount+2,kcount+2}},{{Ti,Tj,Tk}});
+
+ //typedef typename Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > MDRangeType;
+ //using tile_type = typename MDRangeType::tile_type;
+ //using point_type = typename MDRangeType::point_type;
+ //Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > policy(point_type{{0,0,0}},point_type{{icount,jcount,kcount}},tile_type{{Ti,Tj,Tk}} );
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > policy({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}} );
+
+ Kokkos::Experimental::md_parallel_for( policy_initA, Init(Atest, icount, jcount, kcount) );
+ execution_space::fence();
+ Kokkos::Experimental::md_parallel_for( policy_initB, Init(Btest, icount+2, jcount+2, kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::Experimental::md_parallel_for( policy, FunctorType(Atest, Btest, icount, jcount, kcount) );
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - only the first run
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " multi Ahost = " << Ahost(l,j,k) << " expected = " << check
+ << " multi Bhost(ijk) = " << Bhost(l,j,k)
+ << " multi Bhost(l+1jk) = " << Bhost(l+1,j,k)
+ << " multi Bhost(l+2jk) = " << Bhost(l+2,j,k)
+ << " multi Bhost(ij+1k) = " << Bhost(l,j+1,k)
+ << " multi Bhost(ij+2k) = " << Bhost(l,j+2,k)
+ << " multi Bhost(ijk+1) = " << Bhost(l,j,k+1)
+ << " multi Bhost(ijk+2) = " << Bhost(l,j,k+2)
+ << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " LL multi run: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " multi: No errors!" << std::endl; }
+
+ }
+ } //end for
+ }
+
+ return dt_min;
+ }
+
+};
+
+
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct RangePolicyCollapseTwo
+{
+ // RangePolicy for 3D range, but will collapse only 2 dims => like Rank<2> for multi-dim; unroll 2 dims in one-dim
+
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+ typedef TestLayout layout;
+
+ using iterate_type = Kokkos::Experimental::Iterate;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ RangePolicyCollapseTwo(view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_) , irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+//id(i,j,k) = k + j*Nk + i*Nk*Nj = k + Nk*(j + i*Nj) = k + Nk*r
+//r = j + i*Nj
+ long i = int(r / jrange);
+ long j = int( r - i*jrange);
+ for (int k = 0; k < krange; ++k) {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+//id(i,j,k) = i + j*Ni + k*Ni*Nj = i + Ni*(j + k*Nj) = i + Ni*r
+//r = j + k*Nj
+ long k = int(r / jrange);
+ long j = int( r - k*jrange);
+ for (int i = 0; i < irange; ++i) {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+ }
+
+
+ struct Init
+ {
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / jrange);
+ long j = int( r - i*jrange);
+ for (int k = 0; k < krange; ++k) {
+ input(i,j,k) = 1;
+ }
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / jrange);
+ long j = int( r - k*jrange);
+ for (int i = 0; i < irange; ++i) {
+ input(i,j,k) = 1;
+ }
+ }
+ }
+ };
+
+
+ static double test_index_collapse_two(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const long iter = 1)
+ {
+ // This test refers to collapsing two dims while using the RangePolicy
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef RangePolicyCollapseTwo<execution_space,ScalarType,TestLayout> FunctorType;
+
+ long collapse_index_rangeA = 0;
+ long collapse_index_rangeB = 0;
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value ) {
+ collapse_index_rangeA = icount*jcount;
+ collapse_index_rangeB = (icount+2)*(jcount+2);
+// std::cout << " LayoutRight " << std::endl;
+ } else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value ) {
+ collapse_index_rangeA = kcount*jcount;
+ collapse_index_rangeB = (kcount+2)*(jcount+2);
+// std::cout << " LayoutLeft " << std::endl;
+ } else {
+ std::cout << " LayoutRight or LayoutLeft required - will pass 0 as range instead " << std::endl;
+ exit(-1);
+ }
+
+ Kokkos::RangePolicy<execution_space> policy(0, (collapse_index_rangeA) );
+ Kokkos::RangePolicy<execution_space> policy_initB(0, (collapse_index_rangeB) );
+
+ double dt_min = 0;
+
+ Kokkos::parallel_for( policy, Init(Atest,icount,jcount,kcount) );
+ execution_space::fence();
+ Kokkos::parallel_for( policy_initB, Init(Btest,icount+2,jcount+2,kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::parallel_for(policy, FunctorType(Atest, Btest, icount, jcount, kcount));
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - first iteration only
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " flat Ahost = " << Ahost(l,j,k) << " expected = " << check << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " RP collapse2: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " RP collapse2: Pass! " << std::endl; }
+ }
+ }
+
+ return dt_min;
+ }
+
+};
+
+
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct RangePolicyCollapseAll
+{
+ // RangePolicy for 3D range, but will collapse all dims
+
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+ typedef TestLayout layout;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ RangePolicyCollapseAll(view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_), irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / (jrange*krange));
+ long j = int(( r - i*jrange*krange)/krange);
+ long k = int(r - i*jrange*krange - j*krange);
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / (irange*jrange));
+ long j = int(( r - k*irange*jrange)/irange);
+ long i = int(r - k*irange*jrange - j*irange);
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+
+
+ struct Init
+ {
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / (jrange*krange));
+ long j = int(( r - i*jrange*krange)/krange);
+ long k = int(r - i*jrange*krange - j*krange);
+ input(i,j,k) = 1;
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / (irange*jrange));
+ long j = int(( r - k*irange*jrange)/irange);
+ long i = int(r - k*irange*jrange - j*irange);
+ input(i,j,k) = 1;
+ }
+ }
+ };
+
+
+ static double test_collapse_all(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const long iter = 1)
+ {
+ //This test refers to collapsing all dims using the RangePolicy
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef RangePolicyCollapseAll<execution_space,ScalarType,TestLayout> FunctorType;
+
+ const long flat_index_range = icount*jcount*kcount;
+ Kokkos::RangePolicy<execution_space> policy(0, flat_index_range );
+ Kokkos::RangePolicy<execution_space> policy_initB(0, (icount+2)*(jcount+2)*(kcount+2) );
+
+ double dt_min = 0;
+
+ Kokkos::parallel_for( policy, Init(Atest,icount,jcount,kcount) );
+ execution_space::fence();
+ Kokkos::parallel_for( policy_initB, Init(Btest,icount+2,jcount+2,kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::parallel_for(policy, FunctorType(Atest, Btest, icount, jcount, kcount));
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - first iteration only
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Collapse ALL Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " flat Ahost = " << Ahost(l,j,k) << " expected = " << check << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " RP collapse all: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " RP collapse all: Pass! " << std::endl; }
+ }
+ }
+
+ return dt_min;
+ }
+
+};
+
+} //end namespace Test
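The comments inside RangePolicyCollapseTwo above encode the flattening r = j + i*Nj (LayoutRight) or r = j + k*Nj (LayoutLeft), and the functor recovers the two collapsed indices by integer division. A small self-check of that round trip (dimensions are arbitrary; not part of the patch):

// Illustrative sketch: verifying the collapse-two index recovery used above.
#include <cassert>

void collapse_two_roundtrip( const long Ni , const long Nj )
{
  for ( long i = 0 ; i < Ni ; ++i ) {
    for ( long j = 0 ; j < Nj ; ++j ) {
      const long r  = j + i * Nj ;   // flat index handed to the RangePolicy
      const long ir = r / Nj ;       // recovered outer index
      const long jr = r - ir * Nj ;  // recovered inner (middle) index
      assert( ir == i && jr == j );
    }
  }
}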
diff --git a/lib/kokkos/core/src/CMakeLists.txt b/lib/kokkos/core/src/CMakeLists.txt
index 807a01ed0..492470d05 100644
--- a/lib/kokkos/core/src/CMakeLists.txt
+++ b/lib/kokkos/core/src/CMakeLists.txt
@@ -1,113 +1,111 @@
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Serial
KOKKOS_HAVE_SERIAL
"Whether to enable the Kokkos::Serial device. This device executes \"parallel\" kernels sequentially on a single CPU thread. It is enabled by default. If you disable this device, please enable at least one other CPU device, such as Kokkos::OpenMP or Kokkos::Threads."
ON
)
ASSERT_DEFINED(${PROJECT_NAME}_ENABLE_CXX11)
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_CUDA)
# Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA governs whether Kokkos allows
# use of lambdas at the outer level of parallel dispatch (that is, as
# the argument to an outer parallel_for, parallel_reduce, or
# parallel_scan). This works with non-CUDA execution spaces if C++11
# is enabled. It does not currently work with public releases of
# CUDA. If that changes, please change the default here to ON if CUDA
# and C++11 are ON.
IF (${PROJECT_NAME}_ENABLE_CXX11)
IF (${PACKAGE_NAME}_ENABLE_CUDA)
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT OFF)
ELSE ()
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT ON)
ENDIF ()
ELSE ()
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT OFF)
ENDIF ()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA
KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
"Whether Kokkos allows use of lambdas at the outer level of parallel dispatch (that is, as the argument to an outer parallel_for, parallel_reduce, or parallel_scan). This requires C++11. It also does not currently work with public releases of CUDA. As a result, even if C++11 is enabled, this will be OFF by default if CUDA is enabled. If this option is ON, the macro KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA will be defined. For compatibility with Kokkos' Makefile build system, it is also possible to define that macro on the command line."
${Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT}
)
TRIBITS_CONFIGURE_FILE(${PACKAGE_NAME}_config.h)
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
#-----------------------------------------------------------------------------
SET(TRILINOS_INCDIR ${CMAKE_INSTALL_PREFIX}/${${PROJECT_NAME}_INSTALL_INCLUDE_DIR})
#-----------------------------------------------------------------------------
SET(HEADERS_PUBLIC "")
SET(HEADERS_PRIVATE "")
SET(SOURCES "")
FILE(GLOB HEADERS_PUBLIC Kokkos*.hpp)
LIST( APPEND HEADERS_PUBLIC ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}_config.h )
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES_IMPL impl/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_IMPL} )
LIST(APPEND SOURCES ${SOURCES_IMPL} )
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_THREADS Threads/*.hpp)
FILE(GLOB SOURCES_THREADS Threads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_THREADS} )
LIST(APPEND SOURCES ${SOURCES_THREADS} )
INSTALL(FILES ${HEADERS_THREADS} DESTINATION ${TRILINOS_INCDIR}/Threads/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_OPENMP OpenMP/*.hpp)
FILE(GLOB SOURCES_OPENMP OpenMP/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_OPENMP} )
LIST(APPEND SOURCES ${SOURCES_OPENMP} )
INSTALL(FILES ${HEADERS_OPENMP} DESTINATION ${TRILINOS_INCDIR}/OpenMP/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_CUDA Cuda/*.hpp)
FILE(GLOB SOURCES_CUDA Cuda/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_CUDA} )
LIST(APPEND SOURCES ${SOURCES_CUDA} )
INSTALL(FILES ${HEADERS_CUDA} DESTINATION ${TRILINOS_INCDIR}/Cuda/)
#-----------------------------------------------------------------------------
-FILE(GLOB HEADERS_QTHREAD Qthread/*.hpp)
-FILE(GLOB SOURCES_QTHREAD Qthread/*.cpp)
+FILE(GLOB HEADERS_QTHREADS Qthreads/*.hpp)
+FILE(GLOB SOURCES_QTHREADS Qthreads/*.cpp)
-LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREAD} )
-LIST(APPEND SOURCES ${SOURCES_QTHREAD} )
+LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREADS} )
+LIST(APPEND SOURCES ${SOURCES_QTHREADS} )
-INSTALL(FILES ${HEADERS_QTHREAD} DESTINATION ${TRILINOS_INCDIR}/Qthread/)
+INSTALL(FILES ${HEADERS_QTHREADS} DESTINATION ${TRILINOS_INCDIR}/Qthreads/)
#-----------------------------------------------------------------------------
TRIBITS_ADD_LIBRARY(
kokkoscore
HEADERS ${HEADERS_PUBLIC}
NOINSTALLHEADERS ${HEADERS_PRIVATE}
SOURCES ${SOURCES}
DEPLIBS
)
-
-
diff --git a/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp b/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp
new file mode 100644
index 000000000..e0eadb25a
--- /dev/null
+++ b/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp
@@ -0,0 +1,1300 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
+#define KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
+
+#include <iostream>
+#include <algorithm>
+#include <stdio.h>
+
+#include <Kokkos_Macros.hpp>
+
+/* only compile this file if CUDA is enabled for Kokkos */
+#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
+
+#include <utility>
+
+//#include<Cuda/Kokkos_CudaExec.hpp>
+// Including the file above leads to errors like the following:
+// /home/ndellin/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp(84): error: incomplete type is not allowed
+// As a result, recreate cuda_parallel_launch and associated code
+
+#if defined(KOKKOS_ENABLE_PROFILING)
+#include <impl/Kokkos_Profiling_Interface.hpp>
+#include <typeinfo>
+#endif
+
+namespace Kokkos { namespace Experimental { namespace Impl {
+
+// ------------------------------------------------------------------ //
+
+template< class DriverType >
+__global__
+static void cuda_parallel_launch( const DriverType driver )
+{
+ driver();
+}
+
+template< class DriverType >
+struct CudaLaunch
+{
+ inline
+ CudaLaunch( const DriverType & driver
+ , const dim3 & grid
+ , const dim3 & block
+ )
+ {
+ cuda_parallel_launch< DriverType ><<< grid , block >>>(driver);
+ }
+
+};
+
+// ------------------------------------------------------------------ //
+template< int N , typename RP , typename Functor , typename Tag >
+struct apply_impl;
+
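+// Each specialization below maps CUDA blocks to tiles of the MDRange: blockIdx.{x,y,z}
+// stride over the tiles of each dimension (with pairs of dimensions flattened together for
+// rank > 3), while threadIdx selects the point within the current tile. The per-offset
+// bounds checks guard partial tiles at the upper end of each range.
+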
+//Rank 2
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<2,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ /*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ m_func(i, j);
+ } }
+*/
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ m_func(i, j);
+ } }
+*/
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+ m_func(offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<2,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ // Loop over size maxnumblocks until full range covered
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ m_func(Tag(), i, j);
+ } }
+*/
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+ else {
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ m_func(Tag(), i, j);
+ } }
+*/
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+ m_func(Tag(), offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 3
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<3,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+ m_func(offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<3,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ else {
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 4
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<4,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
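+ // 65535 is the CUDA limit on gridDim.y and gridDim.z (and on gridDim.x for older
+ // devices); the flattened tile counts are folded so that numbl0*numbl1 stays within it.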
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<4,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ else {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 5
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<5,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<5,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 6
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<6,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl4 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl5 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl4 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z % numbl4;
+ const index_type tile_id5 = blockIdx.z / numbl4;
+ const index_type thr_id4 = threadIdx.z % m_rp.m_tile[4];
+ const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl5 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl4 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl5 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z / numbl5;
+ const index_type tile_id5 = blockIdx.z % numbl5;
+ const index_type thr_id4 = threadIdx.z / m_rp.m_tile[5];
+ const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<6,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl4 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl5 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl4 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z % numbl4;
+ const index_type tile_id5 = blockIdx.z / numbl4;
+ const index_type thr_id4 = threadIdx.z % m_rp.m_tile[4];
+ const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl5 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl4 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl5 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z / numbl5;
+ const index_type tile_id5 = blockIdx.z % numbl5;
+ const index_type thr_id4 = threadIdx.z / m_rp.m_tile[5];
+ const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// ----------------------------------------------------------------------------------
+
+template < typename RP
+ , typename Functor
+ , typename Tag
+ >
+struct DeviceIterateTile
+{
+ using index_type = typename RP::index_type;
+ using array_index_type = typename RP::array_index_type;
+ using point_type = typename RP::point_type;
+
+ struct VoidDummy {};
+ typedef typename std::conditional< std::is_same<Tag, void>::value, VoidDummy, Tag>::type usable_tag;
+
+ DeviceIterateTile( const RP & rp, const Functor & func )
+ : m_rp{rp}
+ , m_func{func}
+ {}
+
+private:
+ inline __device__
+ void apply() const
+ {
+ apply_impl<RP::rank,RP,Functor,Tag>(m_rp,m_func).exec_range();
+ } //end apply
+
+public:
+
+ inline
+ __device__
+ void operator()(void) const
+ {
+ this->apply();
+ }
+
+ inline
+ void execute() const
+ {
+ const array_index_type maxblocks = 65535; // gridDim.y/z limit; gridDim.x allows more on newer archs
+ if ( RP::rank == 2 )
+ {
+ const dim3 block( m_rp.m_tile[0] , m_rp.m_tile[1] , 1);
+ const dim3 grid(
+ std::min( ( m_rp.m_upper[0] - m_rp.m_lower[0] + block.x - 1 ) / block.x , maxblocks )
+ , std::min( ( m_rp.m_upper[1] - m_rp.m_lower[1] + block.y - 1 ) / block.y , maxblocks )
+ , 1
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 3 )
+ {
+ const dim3 block( m_rp.m_tile[0] , m_rp.m_tile[1] , m_rp.m_tile[2] );
+ const dim3 grid(
+ std::min( ( m_rp.m_upper[0] - m_rp.m_lower[0] + block.x - 1 ) / block.x , maxblocks )
+ , std::min( ( m_rp.m_upper[1] - m_rp.m_lower[1] + block.y - 1 ) / block.y , maxblocks )
+ , std::min( ( m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 4 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to threadIdx.z
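+ // (apply_impl<4> recovers the pair as thr_id0 = threadIdx.x % m_tile[0], thr_id1 = threadIdx.x / m_tile[0]
+ // for LayoutLeft, with the mod/div roles swapped and m_tile[1] as the divisor for LayoutRight)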
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2] , m_rp.m_tile[3] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( ( m_rp.m_upper[2] - m_rp.m_lower[2] + block.y - 1 ) / block.y , maxblocks )
+ , std::min( ( m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 5 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to threadIdx.z
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2]*m_rp.m_tile[3] , m_rp.m_tile[4] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[2] * m_rp.m_tile_end[3] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( ( m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 6 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to threadIdx.z
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2]*m_rp.m_tile[3] , m_rp.m_tile[4]*m_rp.m_tile[5] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[2] * m_rp.m_tile_end[3] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[4] * m_rp.m_tile_end[5] )
+ , static_cast<index_type>(maxblocks) )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else
+ {
+ printf("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
+ Kokkos::abort("Aborting");
+ }
+
+ } //end execute
+
+protected:
+ const RP m_rp;
+ const Functor m_func;
+};
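+
+// Usage sketch (an assumption; the MDRangePolicy glue lives in other headers): the host-side
+// parallel_for specialization constructs DeviceIterateTile from the policy and functor and
+// calls execute(), which launches this object as the CUDA kernel via CudaLaunch/operator().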
+
+} } } //end namespace Kokkos::Experimental::Impl
+
+#endif
+#endif
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp b/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
index 0a0f41686..a273db998 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
@@ -1,318 +1,321 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDAEXEC_HPP
#define KOKKOS_CUDAEXEC_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <string>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_abort.hpp>
#include <Cuda/Kokkos_Cuda_Error.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
struct CudaTraits {
enum { WarpSize = 32 /* 0x0020 */ };
enum { WarpIndexMask = 0x001f /* Mask for warpindex */ };
enum { WarpIndexShift = 5 /* WarpSize == 1 << WarpShift */ };
enum { SharedMemoryBanks = 32 /* Compute device 2.0 */ };
enum { SharedMemoryCapacity = 0x0C000 /* 48k shared / 16k L1 Cache */ };
enum { SharedMemoryUsage = 0x04000 /* 16k shared / 48k L1 Cache */ };
enum { UpperBoundGridCount = 65535 /* Hard upper bound */ };
enum { ConstantMemoryCapacity = 0x010000 /* 64k bytes */ };
enum { ConstantMemoryUsage = 0x008000 /* 32k bytes */ };
enum { ConstantMemoryCache = 0x002000 /* 8k bytes */ };
typedef unsigned long
ConstantGlobalBufferType[ ConstantMemoryUsage / sizeof(unsigned long) ];
enum { ConstantMemoryUseThreshold = 0x000200 /* 512 bytes */ };
KOKKOS_INLINE_FUNCTION static
CudaSpace::size_type warp_count( CudaSpace::size_type i )
{ return ( i + WarpIndexMask ) >> WarpIndexShift ; }
KOKKOS_INLINE_FUNCTION static
CudaSpace::size_type warp_align( CudaSpace::size_type i )
{
enum { Mask = ~CudaSpace::size_type( WarpIndexMask ) };
return ( i + WarpIndexMask ) & Mask ;
}
};
//----------------------------------------------------------------------------
CudaSpace::size_type cuda_internal_multiprocessor_count();
CudaSpace::size_type cuda_internal_maximum_warp_count();
CudaSpace::size_type cuda_internal_maximum_grid_count();
CudaSpace::size_type cuda_internal_maximum_shared_words();
CudaSpace::size_type * cuda_internal_scratch_flags( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_space( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_unified( const CudaSpace::size_type size );
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#if defined( __CUDACC__ )
/** \brief Access to constant memory on the device */
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
__device__ __constant__
extern unsigned long kokkos_impl_cuda_constant_memory_buffer[] ;
#else
__device__ __constant__
unsigned long kokkos_impl_cuda_constant_memory_buffer[ Kokkos::Impl::CudaTraits::ConstantMemoryUsage / sizeof(unsigned long) ] ;
#endif
namespace Kokkos {
namespace Impl {
struct CudaLockArraysStruct {
int* atomic;
int* scratch;
int* threadid;
+ int n;
};
}
}
__device__ __constant__
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
extern
#endif
Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
#define CUDA_SPACE_ATOMIC_MASK 0x1FFFF
#define CUDA_SPACE_ATOMIC_XOR_MASK 0x15A39
namespace Kokkos {
namespace Impl {
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink = false);
}
}
namespace Kokkos {
namespace Impl {
__device__ inline
bool lock_address_cuda_space(void* ptr) {
size_t offset = size_t(ptr);
offset = offset >> 2;
offset = offset & CUDA_SPACE_ATOMIC_MASK;
return (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[offset],0,1));
}
__device__ inline
void unlock_address_cuda_space(void* ptr) {
size_t offset = size_t(ptr);
offset = offset >> 2;
offset = offset & CUDA_SPACE_ATOMIC_MASK;
atomicExch( &kokkos_impl_cuda_lock_arrays.atomic[ offset ], 0);
}
}
}
template< typename T >
inline
__device__
T * kokkos_impl_cuda_shared_memory()
{ extern __shared__ Kokkos::CudaSpace::size_type sh[]; return (T*) sh ; }
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
// Maximize L1 cache and minimize shared memory:
// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 );
// For 2.0 capability: 48 KB L1 and 16 KB shared
//----------------------------------------------------------------------------
template< class DriverType >
__global__
static void cuda_parallel_launch_constant_memory()
{
const DriverType & driver =
*((const DriverType *) kokkos_impl_cuda_constant_memory_buffer );
driver();
}
template< class DriverType >
__global__
static void cuda_parallel_launch_local_memory( const DriverType driver )
{
driver();
}
template < class DriverType ,
bool Large = ( CudaTraits::ConstantMemoryUseThreshold < sizeof(DriverType) ) >
struct CudaParallelLaunch ;
template < class DriverType >
struct CudaParallelLaunch< DriverType , true > {
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( sizeof( Kokkos::Impl::CudaTraits::ConstantGlobalBufferType ) <
sizeof( DriverType ) ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: Functor is too large") );
}
// Fence before changing settings and copying closure
Kokkos::Cuda::fence();
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType > , cudaFuncCachePreferL1 ) );
}
#endif
// Copy functor to constant memory on the device
cudaMemcpyToSymbol( kokkos_impl_cuda_constant_memory_buffer , & driver , sizeof(DriverType) );
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
// Invoke the driver function on the device
cuda_parallel_launch_constant_memory< DriverType ><<< grid , block , shmem , stream >>>();
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
template < class DriverType >
struct CudaParallelLaunch< DriverType , false > {
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType > , cudaFuncCachePreferL1 ) );
}
#endif
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
cuda_parallel_launch_local_memory< DriverType ><<< grid , block , shmem , stream >>>( driver );
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
//----------------------------------------------------------------------------
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* defined( __CUDACC__ ) */
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
index 91a3c9213..303b3fa4f 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
@@ -1,914 +1,915 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
#include <atomic>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
#include <Kokkos_CudaSpace.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
static std::atomic<int> num_uvm_allocations(0) ;
cudaStream_t get_deep_copy_stream() {
static cudaStream_t s = 0;
if( s == 0) {
cudaStreamCreate ( &s );
}
return s;
}
}
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n) {
cudaStream_t s = get_deep_copy_stream();
CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , s ) );
cudaStreamSynchronize(s);
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
void CudaSpace::access_error()
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
void CudaSpace::access_error( const void * const )
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
/*--------------------------------------------------------------------------*/
bool CudaUVMSpace::available()
{
#if defined( CUDA_VERSION ) && ( 6000 <= CUDA_VERSION ) && !defined(__APPLE__)
enum { UVM_available = true };
#else
enum { UVM_available = false };
#endif
return UVM_available;
}
/*--------------------------------------------------------------------------*/
int CudaUVMSpace::number_of_allocations()
{
return Kokkos::Impl::num_uvm_allocations.load();
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
CudaSpace::CudaSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaUVMSpace::CudaUVMSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaHostPinnedSpace::CudaHostPinnedSpace()
{
}
void * CudaSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaMalloc( &ptr, arg_alloc_size ) );
return ptr ;
}
void * CudaUVMSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
enum { max_uvm_allocations = 65536 };
- if ( arg_alloc_size > 0 )
+ if ( arg_alloc_size > 0 )
{
Kokkos::Impl::num_uvm_allocations++;
if ( Kokkos::Impl::num_uvm_allocations.load() > max_uvm_allocations ) {
Kokkos::Impl::throw_runtime_exception( "CudaUVM error: The maximum limit of UVM allocations exceeded (currently 65536)." ) ;
}
CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
- }
+ }
return ptr ;
}
void * CudaHostPinnedSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaHostAlloc( &ptr, arg_alloc_size , cudaHostAllocDefault ) );
return ptr ;
}
void CudaSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
} catch(...) {}
}
void CudaUVMSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
if ( arg_alloc_ptr != nullptr ) {
Kokkos::Impl::num_uvm_allocations--;
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
}
} catch(...) {}
}
void CudaHostPinnedSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFreeHost( arg_alloc_ptr ) );
} catch(...) {}
}
constexpr const char* CudaSpace::name() {
return m_name;
}
constexpr const char* CudaUVMSpace::name() {
return m_name;
}
constexpr const char* CudaHostPinnedSpace::name() {
return m_name;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record ;
::cudaTextureObject_t
SharedAllocationRecord< Kokkos::CudaSpace , void >::
attach_texture_object( const unsigned sizeof_alias
, void * const alloc_ptr
, size_t const alloc_size )
{
enum { TEXTURE_BOUND_1D = 1u << 27 };
if ( ( alloc_ptr == 0 ) || ( sizeof_alias * TEXTURE_BOUND_1D <= alloc_size ) ) {
std::ostringstream msg ;
msg << "Kokkos::CudaSpace ERROR: Cannot attach texture object to"
<< " alloc_ptr(" << alloc_ptr << ")"
<< " alloc_size(" << alloc_size << ")"
<< " max_size(" << ( sizeof_alias * TEXTURE_BOUND_1D ) << ")" ;
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
::cudaTextureObject_t tex_obj ;
struct cudaResourceDesc resDesc ;
struct cudaTextureDesc texDesc ;
memset( & resDesc , 0 , sizeof(resDesc) );
memset( & texDesc , 0 , sizeof(texDesc) );
resDesc.resType = cudaResourceTypeLinear ;
resDesc.res.linear.desc = ( sizeof_alias == 4 ? cudaCreateChannelDesc< int >() :
( sizeof_alias == 8 ? cudaCreateChannelDesc< ::int2 >() :
/* sizeof_alias == 16 */ cudaCreateChannelDesc< ::int4 >() ) );
resDesc.res.linear.sizeInBytes = alloc_size ;
resDesc.res.linear.devPtr = alloc_ptr ;
CUDA_SAFE_CALL( cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL ) );
return tex_obj ;
}
std::string
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_label() const
{
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy< Kokkos::HostSpace , Kokkos::CudaSpace >( & header , RecordBase::head() , sizeof(SharedAllocationHeader) );
return std::string( header.m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaSpace::name()),header.m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::fence(); //Make sure I can access the label ...
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaUVMSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaHostPinnedSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
SharedAllocationHeader header ;
// Fill in the Header information
header.m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( header.m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
// Copy to device memory
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( RecordBase::m_alloc_ptr , & header , sizeof(SharedAllocationHeader) );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
SharedAllocationRecord( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information, directly accessible via UVM
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
SharedAllocationRecord( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information, directly accessible from the host via pinned memory
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate_tracked( const Kokkos::CudaSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaSpace,CudaSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate_tracked( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaUVMSpace,CudaUVMSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate_tracked( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaHostPinnedSpace,CudaHostPinnedSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
//----------------------------------------------------------------------------
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordBase = SharedAllocationRecord< void , void > ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaSpace , void > ;
#if 0
// Copy the header from the allocation
Header head ;
Header const * const head_cuda = alloc_ptr ? Header::get_header( alloc_ptr ) : (Header*) 0 ;
if ( alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , head_cuda , sizeof(SharedAllocationHeader) );
}
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head_cuda ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#else
// Iterating the list to search for the record among all allocations
// requires obtaining the root of the list and then locking the list.
RecordCuda * const record = static_cast< RecordCuda * >( RecordBase::find( & s_root_record , alloc_ptr ) );
if ( record == 0 ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#endif
return record ;
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaUVMSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
// Iterate records to print orphaned memory ...
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail )
{
SharedAllocationRecord< void , void > * r = & s_root_record ;
char buffer[256] ;
SharedAllocationHeader head ;
if ( detail ) {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
}
else {
head.m_label[0] = 0 ;
}
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
- if (sizeof(uintptr_t) == sizeof(unsigned long)) {
+ if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx + %.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
}
- else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
+ else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ 0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
- snprintf( buffer , 256
+ snprintf( buffer , 256
, format_string
, reinterpret_cast<uintptr_t>( r )
, reinterpret_cast<uintptr_t>( r->m_prev )
, reinterpret_cast<uintptr_t>( r->m_next )
, reinterpret_cast<uintptr_t>( r->m_alloc_ptr )
, r->m_alloc_size
, r->m_count
, reinterpret_cast<uintptr_t>( r->m_dealloc )
, head.m_label
);
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
else {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
- if (sizeof(uintptr_t) == sizeof(unsigned long)) {
+ if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda [ 0x%.12lx + %ld ] %s\n";
}
- else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
+ else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda [ 0x%.12llx + %ld ] %s\n";
}
- snprintf( buffer , 256
+ snprintf( buffer , 256
, format_string
, reinterpret_cast< uintptr_t >( r->data() )
, r->size()
, head.m_label
);
}
else {
snprintf( buffer , 256 , "Cuda [ 0 + 0 ]\n" );
}
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
}
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaUVMSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaUVM" , & s_root_record , detail );
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaHostPinned" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
__global__ void init_lock_array_kernel_atomic() {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<CUDA_SPACE_ATOMIC_MASK+1)
kokkos_impl_cuda_lock_arrays.atomic[i] = 0;
}
__global__ void init_lock_array_kernel_scratch_threadid(int N) {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<N) {
kokkos_impl_cuda_lock_arrays.scratch[i] = 0;
kokkos_impl_cuda_lock_arrays.threadid[i] = 0;
}
}
}
namespace Impl {
int* atomic_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(CUDA_SPACE_ATOMIC_MASK+1));
return ptr;
}
int* scratch_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
int* threadid_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
void init_lock_arrays_cuda_space() {
static int is_initialized = 0;
if(! is_initialized) {
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
init_lock_array_kernel_atomic<<<(CUDA_SPACE_ATOMIC_MASK+255)/256,256>>>();
init_lock_array_kernel_scratch_threadid<<<(Kokkos::Cuda::concurrency()+255)/256,256>>>(Kokkos::Cuda::concurrency());
}
}
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
static void* ptr = NULL;
static size_t current_size = 0;
if(current_size == 0) {
current_size = bytes;
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
if(bytes > current_size) {
current_size = bytes;
ptr = Kokkos::kokkos_realloc<Kokkos::CudaSpace>(ptr,current_size);
}
if((bytes < current_size) && (force_shrink)) {
current_size = bytes;
Kokkos::kokkos_free<Kokkos::CudaSpace>(ptr);
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
return ptr;
}
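// Illustrative note (not in the original source) on the grow-only semantics above:
//   cuda_resize_scratch_space( 1<<10 );        // allocates 1 KiB
//   cuda_resize_scratch_space( 1<<20 );        // grows (realloc) to 1 MiB
//   cuda_resize_scratch_space( 1<<10 );        // keeps 1 MiB, no shrink
//   cuda_resize_scratch_space( 1<<10 , true ); // frees and reallocates 1 KiB
// i.e. the buffer only shrinks when force_shrink is set.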
}
}
#endif // KOKKOS_ENABLE_CUDA
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
index eeea97049..44d908d10 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
@@ -1,778 +1,779 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
/*--------------------------------------------------------------------------*/
/* Kokkos interfaces */
#include <Kokkos_Core.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Cuda/Kokkos_Cuda_Error.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
/*--------------------------------------------------------------------------*/
/* Standard 'C' libraries */
#include <stdlib.h>
/* Standard 'C++' libraries */
#include <vector>
#include <iostream>
#include <sstream>
#include <string>
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
__device__ __constant__
unsigned long kokkos_impl_cuda_constant_memory_buffer[ Kokkos::Impl::CudaTraits::ConstantMemoryUsage / sizeof(unsigned long) ] ;
__device__ __constant__
Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
#endif
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
__global__
void query_cuda_kernel_arch( int * d_arch )
{
#if defined( __CUDA_ARCH__ )
*d_arch = __CUDA_ARCH__ ;
#else
*d_arch = 0 ;
#endif
}
/** Query which compute capability the kernel launched on the device was actually compiled for: */
int cuda_kernel_arch()
{
int * d_arch = 0 ;
cudaMalloc( (void **) & d_arch , sizeof(int) );
query_cuda_kernel_arch<<<1,1>>>( d_arch );
int arch = 0 ;
cudaMemcpy( & arch , d_arch , sizeof(int) , cudaMemcpyDefault );
cudaFree( d_arch );
return arch ;
}
bool cuda_launch_blocking()
{
const char * env = getenv("CUDA_LAUNCH_BLOCKING");
if (env == 0) return false;
return atoi(env);
}
}
void cuda_device_synchronize()
{
// static const bool launch_blocking = cuda_launch_blocking();
// if (!launch_blocking) {
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
// }
}
void cuda_internal_error_throw( cudaError e , const char * name, const char * file, const int line )
{
std::ostringstream out ;
out << name << " error( " << cudaGetErrorName(e) << "): " << cudaGetErrorString(e);
if (file) {
out << " " << file << ":" << line;
}
throw_runtime_exception( out.str() );
}
//----------------------------------------------------------------------------
// Some significant cuda device properties:
//
// cudaDeviceProp::name : Text label for device
// cudaDeviceProp::major : Device major number
// cudaDeviceProp::minor : Device minor number
// cudaDeviceProp::warpSize : number of threads per warp
// cudaDeviceProp::multiProcessorCount : number of multiprocessors
// cudaDeviceProp::sharedMemPerBlock : capacity of shared memory per block
// cudaDeviceProp::totalConstMem : capacity of constant memory
// cudaDeviceProp::totalGlobalMem : capacity of global memory
// cudaDeviceProp::maxGridSize[3] : maximum grid size
//
// Section 4.4.2.4 of the CUDA Toolkit Reference Manual
//
// struct cudaDeviceProp {
// char name[256];
// size_t totalGlobalMem;
// size_t sharedMemPerBlock;
// int regsPerBlock;
// int warpSize;
// size_t memPitch;
// int maxThreadsPerBlock;
// int maxThreadsDim[3];
// int maxGridSize[3];
// size_t totalConstMem;
// int major;
// int minor;
// int clockRate;
// size_t textureAlignment;
// int deviceOverlap;
// int multiProcessorCount;
// int kernelExecTimeoutEnabled;
// int integrated;
// int canMapHostMemory;
// int computeMode;
// int concurrentKernels;
// int ECCEnabled;
// int pciBusID;
// int pciDeviceID;
// int tccDriver;
// int asyncEngineCount;
// int unifiedAddressing;
// int memoryClockRate;
// int memoryBusWidth;
// int l2CacheSize;
// int maxThreadsPerMultiProcessor;
// };
namespace {
class CudaInternalDevices {
public:
enum { MAXIMUM_DEVICE_COUNT = 64 };
struct cudaDeviceProp m_cudaProp[ MAXIMUM_DEVICE_COUNT ] ;
int m_cudaDevCount ;
CudaInternalDevices();
static const CudaInternalDevices & singleton();
};
CudaInternalDevices::CudaInternalDevices()
{
// See 'cudaSetDeviceFlags' for host-device thread interaction
// Section 4.4.2.6 of the CUDA Toolkit Reference Manual
CUDA_SAFE_CALL (cudaGetDeviceCount( & m_cudaDevCount ) );
if(m_cudaDevCount > MAXIMUM_DEVICE_COUNT) {
Kokkos::abort("Sorry, you have more GPUs per node than we thought anybody would ever have. Please report this to github.com/kokkos/kokkos.");
}
for ( int i = 0 ; i < m_cudaDevCount ; ++i ) {
CUDA_SAFE_CALL( cudaGetDeviceProperties( m_cudaProp + i , i ) );
}
}
const CudaInternalDevices & CudaInternalDevices::singleton()
{
static CudaInternalDevices self ; return self ;
}
}
//----------------------------------------------------------------------------
class CudaInternal {
private:
CudaInternal( const CudaInternal & );
CudaInternal & operator = ( const CudaInternal & );
public:
typedef Cuda::size_type size_type ;
int m_cudaDev ;
int m_cudaArch ;
unsigned m_multiProcCount ;
unsigned m_maxWarpCount ;
unsigned m_maxBlock ;
unsigned m_maxSharedWords ;
size_type m_scratchSpaceCount ;
size_type m_scratchFlagsCount ;
size_type m_scratchUnifiedCount ;
size_type m_scratchUnifiedSupported ;
size_type m_streamCount ;
size_type * m_scratchSpace ;
size_type * m_scratchFlags ;
size_type * m_scratchUnified ;
cudaStream_t * m_stream ;
static int was_initialized;
static int was_finalized;
static CudaInternal & singleton();
int verify_is_initialized( const char * const label ) const ;
int is_initialized() const
{ return 0 != m_scratchSpace && 0 != m_scratchFlags ; }
void initialize( int cuda_device_id , int stream_count );
void finalize();
void print_configuration( std::ostream & ) const ;
~CudaInternal();
CudaInternal()
: m_cudaDev( -1 )
, m_cudaArch( -1 )
, m_multiProcCount( 0 )
, m_maxWarpCount( 0 )
, m_maxBlock( 0 )
, m_maxSharedWords( 0 )
, m_scratchSpaceCount( 0 )
, m_scratchFlagsCount( 0 )
, m_scratchUnifiedCount( 0 )
, m_scratchUnifiedSupported( 0 )
, m_streamCount( 0 )
, m_scratchSpace( 0 )
, m_scratchFlags( 0 )
, m_scratchUnified( 0 )
, m_stream( 0 )
{}
size_type * scratch_space( const size_type size );
size_type * scratch_flags( const size_type size );
size_type * scratch_unified( const size_type size );
};
int CudaInternal::was_initialized = 0;
int CudaInternal::was_finalized = 0;
//----------------------------------------------------------------------------
void CudaInternal::print_configuration( std::ostream & s ) const
{
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
#if defined( KOKKOS_ENABLE_CUDA )
s << "macro KOKKOS_ENABLE_CUDA : defined" << std::endl ;
#endif
#if defined( CUDA_VERSION )
s << "macro CUDA_VERSION = " << CUDA_VERSION
<< " = version " << CUDA_VERSION / 1000
<< "." << ( CUDA_VERSION % 1000 ) / 10
<< std::endl ;
#endif
for ( int i = 0 ; i < dev_info.m_cudaDevCount ; ++i ) {
s << "Kokkos::Cuda[ " << i << " ] "
<< dev_info.m_cudaProp[i].name
<< " capability " << dev_info.m_cudaProp[i].major << "." << dev_info.m_cudaProp[i].minor
<< ", Total Global Memory: " << human_memory_size(dev_info.m_cudaProp[i].totalGlobalMem)
<< ", Shared Memory per Block: " << human_memory_size(dev_info.m_cudaProp[i].sharedMemPerBlock);
if ( m_cudaDev == i ) s << " : Selected" ;
s << std::endl ;
}
}
//----------------------------------------------------------------------------
CudaInternal::~CudaInternal()
{
if ( m_stream ||
m_scratchSpace ||
m_scratchFlags ||
m_scratchUnified ) {
std::cerr << "Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()"
<< std::endl ;
std::cerr.flush();
}
m_cudaDev = -1 ;
m_cudaArch = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_scratchUnifiedSupported = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
int CudaInternal::verify_is_initialized( const char * const label ) const
{
if ( m_cudaDev < 0 ) {
std::cerr << "Kokkos::Cuda::" << label << " : ERROR device not initialized" << std::endl ;
}
return 0 <= m_cudaDev ;
}
CudaInternal & CudaInternal::singleton()
{
static CudaInternal self ;
return self ;
}
void CudaInternal::initialize( int cuda_device_id , int stream_count )
{
if ( was_finalized ) Kokkos::abort("Calling Cuda::initialize after Cuda::finalize is illegal\n");
was_initialized = 1;
if ( is_initialized() ) return;
enum { WordSize = sizeof(size_type) };
if ( ! HostSpace::execution_space::is_initialized() ) {
const std::string msg("Cuda::initialize ERROR : HostSpace::execution_space is not initialized");
throw_runtime_exception( msg );
}
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
const bool ok_init = 0 == m_scratchSpace || 0 == m_scratchFlags ;
const bool ok_id = 0 <= cuda_device_id &&
cuda_device_id < dev_info.m_cudaDevCount ;
// Need device capability 3.0 or better
const bool ok_dev = ok_id &&
( 3 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
0 <= dev_info.m_cudaProp[ cuda_device_id ].minor );
if ( ok_init && ok_dev ) {
const struct cudaDeviceProp & cudaProp =
dev_info.m_cudaProp[ cuda_device_id ];
m_cudaDev = cuda_device_id ;
CUDA_SAFE_CALL( cudaSetDevice( m_cudaDev ) );
CUDA_SAFE_CALL( cudaDeviceReset() );
Kokkos::Impl::cuda_device_synchronize();
// Query what compute capability architecture a kernel executes:
m_cudaArch = cuda_kernel_arch();
if ( m_cudaArch != cudaProp.major * 100 + cudaProp.minor * 10 ) {
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
<< ( m_cudaArch / 100 ) << "." << ( ( m_cudaArch % 100 ) / 10 )
<< " on device with compute capability "
<< cudaProp.major << "." << cudaProp.minor
<< " , this will likely reduce potential performance."
<< std::endl ;
}
// number of multiprocessors
m_multiProcCount = cudaProp.multiProcessorCount ;
//----------------------------------
// Maximum number of warps per block,
// capped at one warp per lane of a warp so a single warp can reduce the per-warp partial results.
// HCE 2012-February :
// Found bug in CUDA 4.1 that sometimes a kernel launch would fail
// if the thread count == 1024 and a functor is passed to the kernel.
// Copying the kernel to constant memory and then launching with
// thread count == 1024 would work fine.
//
// HCE 2012-October :
// All compute capabilities support at least 16 warps (512 threads).
// However, we have found that 8 warps typically gives better performance.
m_maxWarpCount = 8 ;
// m_maxWarpCount = cudaProp.maxThreadsPerBlock / Impl::CudaTraits::WarpSize ;
if ( Impl::CudaTraits::WarpSize < m_maxWarpCount ) {
m_maxWarpCount = Impl::CudaTraits::WarpSize ;
}
m_maxSharedWords = cudaProp.sharedMemPerBlock / WordSize ;
//----------------------------------
// Maximum number of blocks:
m_maxBlock = cudaProp.maxGridSize[0] ;
//----------------------------------
m_scratchUnifiedSupported = cudaProp.unifiedAddressing ;
if ( ! m_scratchUnifiedSupported ) {
std::cout << "Kokkos::Cuda device "
<< cudaProp.name << " capability "
<< cudaProp.major << "." << cudaProp.minor
<< " does not support unified virtual address space"
<< std::endl ;
}
//----------------------------------
// Multiblock reduction uses scratch flags for counters
// and scratch space for partial reduction values.
// Allocate some initial space. This will grow as needed.
{
const unsigned reduce_block_count = m_maxWarpCount * Impl::CudaTraits::WarpSize ;
(void) scratch_unified( 16 * sizeof(size_type) );
(void) scratch_flags( reduce_block_count * 2 * sizeof(size_type) );
(void) scratch_space( reduce_block_count * 16 * sizeof(size_type) );
}
//----------------------------------
if ( stream_count ) {
m_stream = (cudaStream_t*) ::malloc( stream_count * sizeof(cudaStream_t) );
m_streamCount = stream_count ;
for ( size_type i = 0 ; i < m_streamCount ; ++i ) m_stream[i] = 0 ;
}
}
else {
std::ostringstream msg ;
msg << "Kokkos::Cuda::initialize(" << cuda_device_id << ") FAILED" ;
if ( ! ok_init ) {
msg << " : Already initialized" ;
}
if ( ! ok_id ) {
msg << " : Device identifier out of range "
<< "[0.." << dev_info.m_cudaDevCount << "]" ;
}
else if ( ! ok_dev ) {
msg << " : Device " ;
msg << dev_info.m_cudaProp[ cuda_device_id ].major ;
msg << "." ;
msg << dev_info.m_cudaProp[ cuda_device_id ].minor ;
msg << " has insufficient capability, required 3.0 or better" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
#ifdef KOKKOS_ENABLE_CUDA_UVM
if(!cuda_launch_blocking()) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_LAUNCH_BLOCKING=1." << std::endl;
std::cout << " The code must call Cuda::fence() after each kernel" << std::endl;
- std::cout << " or will likely crash when accessing data on the host." << std::endl;
+ std::cout << " or will likely crash when accessing data on the host." << std::endl;
}
const char * env_force_device_alloc = getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
bool force_device_alloc;
if (env_force_device_alloc == 0) force_device_alloc=false;
else force_device_alloc=atoi(env_force_device_alloc)!=0;
-
+
const char * env_visible_devices = getenv("CUDA_VISIBLE_DEVICES");
bool visible_devices_one=true;
if (env_visible_devices == 0) visible_devices_one=false;
-
+
if(!visible_devices_one && !force_device_alloc) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or " << std::endl;
std::cout << " setting CUDA_VISIBLE_DEVICES." << std::endl;
std::cout << " This could on multi GPU systems lead to severe performance" << std::endl;
std::cout << " penalties." << std::endl;
}
#endif
cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
// Init the array used for arbitrarily sized atomics
Impl::init_lock_arrays_cuda_space();
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
}
//----------------------------------------------------------------------------
typedef Cuda::size_type ScratchGrain[ Impl::CudaTraits::WarpSize ] ;
enum { sizeScratchGrain = sizeof(ScratchGrain) };
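// Illustrative note (not in the original source): assuming WarpSize == 32 and
// a 4-byte size_type, sizeScratchGrain == 128 bytes; a request of 200 bytes
// then rounds up to ( 200 + 128 - 1 ) / 128 == 2 grains, i.e. 256 bytes.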
Cuda::size_type *
CudaInternal::scratch_flags( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_flags") && m_scratchFlagsCount * sizeScratchGrain < size ) {
m_scratchFlagsCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchFlags"
, ( sizeof( ScratchGrain ) * m_scratchFlagsCount ) );
Record::increment( r );
m_scratchFlags = reinterpret_cast<size_type *>( r->data() );
CUDA_SAFE_CALL( cudaMemset( m_scratchFlags , 0 , m_scratchFlagsCount * sizeScratchGrain ) );
}
return m_scratchFlags ;
}
Cuda::size_type *
CudaInternal::scratch_space( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_space") && m_scratchSpaceCount * sizeScratchGrain < size ) {
m_scratchSpaceCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchSpace"
, ( sizeof( ScratchGrain ) * m_scratchSpaceCount ) );
Record::increment( r );
m_scratchSpace = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchSpace ;
}
Cuda::size_type *
CudaInternal::scratch_unified( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_unified") &&
m_scratchUnifiedSupported && m_scratchUnifiedCount * sizeScratchGrain < size ) {
m_scratchUnifiedCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaHostPinnedSpace()
, "InternalScratchUnified"
, ( sizeof( ScratchGrain ) * m_scratchUnifiedCount ) );
Record::increment( r );
m_scratchUnified = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchUnified ;
}
//----------------------------------------------------------------------------
void CudaInternal::finalize()
{
was_finalized = 1;
if ( 0 != m_scratchSpace || 0 != m_scratchFlags ) {
- atomic_lock_array_cuda_space_ptr(false);
- scratch_lock_array_cuda_space_ptr(false);
- threadid_lock_array_cuda_space_ptr(false);
+ atomic_lock_array_cuda_space_ptr(true);
+ scratch_lock_array_cuda_space_ptr(true);
+ threadid_lock_array_cuda_space_ptr(true);
if ( m_stream ) {
for ( size_type i = 1 ; i < m_streamCount ; ++i ) {
cudaStreamDestroy( m_stream[i] );
m_stream[i] = 0 ;
}
::free( m_stream );
}
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaSpace > RecordCuda ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaHostPinnedSpace > RecordHost ;
RecordCuda::decrement( RecordCuda::get_record( m_scratchFlags ) );
RecordCuda::decrement( RecordCuda::get_record( m_scratchSpace ) );
RecordHost::decrement( RecordHost::get_record( m_scratchUnified ) );
m_cudaDev = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
}
//----------------------------------------------------------------------------
Cuda::size_type cuda_internal_multiprocessor_count()
{ return CudaInternal::singleton().m_multiProcCount ; }
Cuda::size_type cuda_internal_maximum_warp_count()
{ return CudaInternal::singleton().m_maxWarpCount ; }
Cuda::size_type cuda_internal_maximum_grid_count()
{ return CudaInternal::singleton().m_maxBlock ; }
Cuda::size_type cuda_internal_maximum_shared_words()
{ return CudaInternal::singleton().m_maxSharedWords ; }
Cuda::size_type * cuda_internal_scratch_space( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_space( size ); }
Cuda::size_type * cuda_internal_scratch_flags( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_flags( size ); }
Cuda::size_type * cuda_internal_scratch_unified( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_unified( size ); }
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
Cuda::size_type Cuda::detect_device_count()
{ return Impl::CudaInternalDevices::singleton().m_cudaDevCount ; }
int Cuda::concurrency() {
return 131072;
}
int Cuda::is_initialized()
{ return Impl::CudaInternal::singleton().is_initialized(); }
void Cuda::initialize( const Cuda::SelectDevice config , size_t num_instances )
{
Impl::CudaInternal::singleton().initialize( config.cuda_device_id , num_instances );
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
std::vector<unsigned>
Cuda::detect_device_arch()
{
const Impl::CudaInternalDevices & s = Impl::CudaInternalDevices::singleton();
std::vector<unsigned> output( s.m_cudaDevCount );
for ( int i = 0 ; i < s.m_cudaDevCount ; ++i ) {
output[i] = s.m_cudaProp[i].major * 100 + s.m_cudaProp[i].minor ;
}
return output ;
}
Cuda::size_type Cuda::device_arch()
{
const int dev_id = Impl::CudaInternal::singleton().m_cudaDev ;
int dev_arch = 0 ;
if ( 0 <= dev_id ) {
const struct cudaDeviceProp & cudaProp =
Impl::CudaInternalDevices::singleton().m_cudaProp[ dev_id ] ;
dev_arch = cudaProp.major * 100 + cudaProp.minor ;
}
return dev_arch ;
}
void Cuda::finalize()
{
Impl::CudaInternal::singleton().finalize();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
Cuda::Cuda()
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream( 0 )
{
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" );
}
Cuda::Cuda( const int instance_id )
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream(
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" )
? Impl::CudaInternal::singleton().m_stream[ instance_id % Impl::CudaInternal::singleton().m_streamCount ]
: 0 )
{}
void Cuda::print_configuration( std::ostream & s , const bool )
{ Impl::CudaInternal::singleton().print_configuration( s ); }
bool Cuda::sleep() { return false ; }
bool Cuda::wake() { return true ; }
void Cuda::fence()
{
Kokkos::Impl::cuda_device_synchronize();
}
} // namespace Kokkos
#endif // KOKKOS_ENABLE_CUDA
//----------------------------------------------------------------------------
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
index fa29d732f..56e6a3c1e 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
@@ -1,1926 +1,1970 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_PARALLEL_HPP
#define KOKKOS_CUDA_PARALLEL_HPP
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <Kokkos_Vectorization.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Type >
struct CudaJoinFunctor {
typedef Type value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
volatile const value_type & input )
{ update += input ; }
};
class CudaTeamMember {
private:
typedef Kokkos::Cuda execution_space ;
typedef execution_space::scratch_memory_space scratch_memory_space ;
void * m_team_reduce ;
scratch_memory_space m_team_shared ;
int m_league_rank ;
int m_league_size ;
public:
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const {
#ifdef __CUDA_ARCH__
return threadIdx.y ;
#else
return 1;
#endif
}
KOKKOS_INLINE_FUNCTION int team_size() const {
#ifdef __CUDA_ARCH__
return blockDim.y ;
#else
return 1;
#endif
}
KOKKOS_INLINE_FUNCTION void team_barrier() const {
#ifdef __CUDA_ARCH__
__syncthreads();
#endif
}
template<class ValueType>
KOKKOS_INLINE_FUNCTION void team_broadcast(ValueType& value, const int& thread_id) const {
#ifdef __CUDA_ARCH__
__shared__ ValueType sh_val;
if(threadIdx.x == 0 && threadIdx.y == thread_id) {
sh_val = value;
}
team_barrier();
value = sh_val;
team_barrier();
#endif
}
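// A minimal team_broadcast usage sketch (illustrative; 'choose_pivot' is a
// hypothetical helper, not part of this file):
//
//   int pivot = 0 ;
//   if ( member.team_rank() == 0 ) pivot = choose_pivot();
//   member.team_broadcast( pivot , 0 );   // broadcast from team_rank() == 0
//   // every thread of the team now holds the same 'pivot'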
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION
typename JoinOp::value_type team_reduce( const ValueType & value
, const JoinOp & op_in ) const {
#ifdef __CUDA_ARCH__
typedef JoinLambdaAdapter<ValueType,JoinOp> JoinOpFunctor ;
const JoinOpFunctor op(op_in);
ValueType * const base_data = (ValueType *) m_team_reduce ;
__syncthreads(); // Don't write into shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y ] = value ;
Impl::cuda_intra_block_reduce_scan<false,JoinOpFunctor,void>( op , base_data );
return base_data[ blockDim.y - 1 ];
#else
return typename JoinOp::value_type();
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
 * with inter-team non-deterministic accumulation ordering.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const {
#ifdef __CUDA_ARCH__
Type * const base_data = (Type *) m_team_reduce ;
__syncthreads(); // Don't write into shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y + 1 ] = value ;
Impl::cuda_intra_block_reduce_scan<true,Impl::CudaJoinFunctor<Type>,void>( Impl::CudaJoinFunctor<Type>() , base_data + 1 );
if ( global_accum ) {
if ( blockDim.y == threadIdx.y + 1 ) {
base_data[ blockDim.y ] = atomic_fetch_add( global_accum , base_data[ blockDim.y ] );
}
__syncthreads(); // Wait for atomic
base_data[ threadIdx.y ] += base_data[ blockDim.y ] ;
}
return base_data[ threadIdx.y ];
#else
return Type();
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
return this->template team_scan<Type>( value , 0 );
}
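// A minimal team_scan usage sketch (illustrative; 'count_items' is a
// hypothetical helper, not part of this file):
//
//   // Each thread contributes a per-thread count; team_scan returns the
//   // exclusive prefix sum in team_rank() ordering, i.e. the offset at
//   // which this thread may start writing its items.
//   const int my_count = count_items( member.team_rank() );
//   const int offset   = member.team_scan( my_count );
//   // The highest-rank thread recovers the team total as offset + my_count.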
//----------------------------------------
// Private for the driver
KOKKOS_INLINE_FUNCTION
CudaTeamMember( void * shared
, const int shared_begin
, const int shared_size
, void* scratch_level_1_ptr
, const int scratch_level_1_size
, const int arg_league_rank
, const int arg_league_size )
: m_team_reduce( shared )
, m_team_shared( ((char *)shared) + shared_begin , shared_size, scratch_level_1_ptr, scratch_level_1_size)
, m_league_rank( arg_league_rank )
, m_league_size( arg_league_size )
{}
};
} // namespace Impl
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Cuda , Properties ... >: public PolicyTraits<Properties ... >
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
private:
enum { MAX_WARP = 8 };
int m_league_size ;
int m_team_size ;
int m_vector_length ;
int m_team_scratch_size[2] ;
int m_thread_scratch_size[2] ;
int m_chunk_size;
public:
//! Execution space of this execution policy
typedef Kokkos::Cuda execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_vector_length = p.m_vector_length;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
int team_size_max( const FunctorType & functor )
{
int n = MAX_WARP * Impl::CudaTraits::WarpSize ;
for ( ; n ; n >>= 1 ) {
const int shmem_size =
/* for global reduce */ Impl::cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,typename traits::work_tag>( functor , n )
/* for team reduce */ + ( n + 2 ) * sizeof(double)
/* for team shared */ + Impl::FunctorTeamShmemSize< FunctorType >::value( functor , n );
if ( shmem_size < Impl::CudaTraits::SharedMemoryCapacity ) break ;
}
return n ;
}
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor )
{ return team_size_max( functor ); }
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor , const int vector_length)
{
int max = team_size_max( functor )/vector_length;
if(max<1) max = 1;
return max;
}
inline static
int vector_length_max()
{ return Impl::CudaTraits::WarpSize; }
//----------------------------------------
inline int vector_length() const { return m_vector_length ; }
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
inline int scratch_size(int level, int team_size_ = -1) const {
if(team_size_<0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level];
}
inline size_t team_scratch_size(int level) const {
return m_team_scratch_size[level];
}
inline size_t thread_scratch_size(int level) const {
return m_thread_scratch_size[level];
}
TeamPolicyInternal()
: m_league_size( 0 )
, m_team_size( 0 )
, m_vector_length( 0 )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
TeamPolicyInternal( int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
TeamPolicyInternal( int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
typedef Kokkos::Impl::CudaTeamMember member_type ;
};
} // namespace Impl
} // namespace Kokkos
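// A minimal construction sketch for the public TeamPolicy on Cuda
// (illustrative; 'n_teams' is a placeholder). The checks above imply:
//   - vector_length must be a power of two (vector_length_max() reports the warp size),
//   - when a team size is given, team_size * vector_length may not exceed 1024,
//   - league_size must be smaller than the maximum CUDA grid count.
//
//   typedef Kokkos::TeamPolicy< Kokkos::Cuda > team_policy ;
//   const int n_teams = 1024 ;
//   team_policy policy      ( n_teams , 128 /* team_size */ , 4 /* vector_length */ );
//   team_policy policy_auto ( n_teams , Kokkos::AUTO , 4 );  // let Kokkos pick the team size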
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
const Policy m_policy ;
ParallelFor() = delete ;
ParallelFor & operator = ( const ParallelFor & ) = delete ;
template< class TagType >
inline __device__
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( i ); }
template< class TagType >
inline __device__
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( TagType() , i ); }
public:
typedef FunctorType functor_type ;
inline
__device__
void operator()(void) const
{
const Member work_stride = blockDim.y * gridDim.x ;
const Member work_end = m_policy.end();
for ( Member
iwork = m_policy.begin() + threadIdx.y + blockDim.y * blockIdx.x ;
iwork < work_end ;
iwork += work_stride ) {
this-> template exec_range< WorkTag >( iwork );
}
}
inline
void execute() const
{
const int nwork = m_policy.end() - m_policy.begin();
const dim3 block( 1 , CudaTraits::WarpSize * cuda_internal_maximum_warp_count(), 1);
const dim3 grid( std::min( ( nwork + block.y - 1 ) / block.y , cuda_internal_maximum_grid_count() ) , 1 , 1);
CudaParallelLaunch< ParallelFor >( *this , grid , block , 0 );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ team reduce space ]
// [ team shared space ]
//
const FunctorType m_functor ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
const size_type m_shmem_begin ;
const size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
const int m_scratch_size[2] ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( member ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( TagType() , member ); }
public:
__device__ inline
void operator()(void) const
{
// Iterate this block through the league
+ int threadid = 0;
+ if ( m_scratch_size[1]>0 ) {
+ __shared__ int base_thread_id;
+ if (threadIdx.x==0 && threadIdx.y==0 ) {
+ threadid = ((blockIdx.x*blockDim.z + threadIdx.z) * blockDim.x * blockDim.y) % kokkos_impl_cuda_lock_arrays.n;
+ threadid = ((threadid + blockDim.x * blockDim.y-1)/(blockDim.x * blockDim.y)) * blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid-=blockDim.x * blockDim.y;
+ int done = 0;
+ while (!done) {
+ done = (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[threadid],0,1));
+ if(!done) {
+ threadid += blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid = 0;
+ }
+ }
+ base_thread_id = threadid;
+ }
+ __syncthreads();
+ threadid = base_thread_id;
+ }
+
+
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >(
typename Policy::member_type( kokkos_impl_cuda_shared_memory<void>()
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size ) );
}
}
inline
void execute() const
{
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
const dim3 grid( int(m_league_size) , 1 , 1 );
const dim3 block( int(m_vector_size) , int(m_team_size) , 1 );
CudaParallelLaunch< ParallelFor >( *this, grid, block, shmem_size_total ); // copy to device and execute
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelFor >( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
, m_shmem_begin( sizeof(double) * ( m_team_size + 2 ) )
, m_shmem_size( arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( m_functor , m_team_size ) )
, m_scratch_ptr{NULL,NULL}
, m_scratch_size{arg_policy.scratch_size(0,m_team_size),arg_policy.scratch_size(1,m_team_size)}
{
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > insufficient shared memory"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelFor >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::value_type value_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
// Algorithmic constraints: blockDim.y (the block size) is a power of two AND blockDim.x == blockDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
// Shall we use the shfl based reduction or not (only use it for static sized types of more than 128 bit)
enum { UseShflReduction = ((sizeof(value_type)>2*sizeof(double)) && ValueTraits::StaticValueSize) };
// Some crutch to do function overloading
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
public:
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( i , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( TagType() , i , update ); }
__device__ inline
void operator() () const {
run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
}
__device__ inline
void run(const DummySHMEMReductionType& ) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
{
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final thread's location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
void run(const DummyShflReductionType&) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
int max_active_thread = range.end()-range.begin() < blockDim.y ? range.end() - range.begin():blockDim.y;
max_active_thread = (max_active_thread == 0)?blockDim.y:max_active_thread;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,max_active_thread)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
unsigned n = CudaTraits::WarpSize * 8 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
const int block_size = local_block_size( m_functor );
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_size /* block_size == max block_count */ );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
// REQUIRED ( 1 , N , 1 )
const dim3 block( 1 , block_size , 1 );
// Required grid.x <= block.y
const dim3 grid( std::min( int(block.y) , int( ( nwork + block.y - 1 ) / block.y ) ) , 1 , 1 );
const int shmem = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( m_functor , block.y );
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem ); // copy to device and execute
Cuda::fence();
if ( m_result_ptr ) {
if ( m_unified_space ) {
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
}
else {
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
DeepCopy<HostSpace,CudaSpace>( m_result_ptr , m_scratch_space , size );
}
}
}
else {
if (m_result_ptr) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType, class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda, Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef typename ValueTraits::value_type value_type ;
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
enum { UseShflReduction = (true && ValueTraits::StaticValueSize) };
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ global reduce space ]
// [ team reduce space ]
// [ team shared space ]
//
const FunctorType m_functor ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
size_type m_team_begin ;
size_type m_shmem_begin ;
size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
int m_scratch_size[2] ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( member , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( TagType() , member , update ); }
public:
__device__ inline
void operator() () const {
- run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
+ int threadid = 0;
+ if ( m_scratch_size[1]>0 ) {
+ __shared__ int base_thread_id;
+ if (threadIdx.x==0 && threadIdx.y==0 ) {
+ threadid = ((blockIdx.x*blockDim.z + threadIdx.z) * blockDim.x * blockDim.y) % kokkos_impl_cuda_lock_arrays.n;
+ threadid = ((threadid + blockDim.x * blockDim.y-1)/(blockDim.x * blockDim.y)) * blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid-=blockDim.x * blockDim.y;
+ int done = 0;
+ while (!done) {
+ done = (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[threadid],0,1));
+ if(!done) {
+ threadid += blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid = 0;
+ }
+ }
+ base_thread_id = threadid;
+ }
+ __syncthreads();
+ threadid = base_thread_id;
+ }
+
+ run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0), threadid );
}
__device__ inline
- void run(const DummySHMEMReductionType&) const
+ void run(const DummySHMEMReductionType&, const int& threadid) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,FunctorType,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final thread's location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
- void run(const DummyShflReductionType&) const
+ void run(const DummyShflReductionType&, const int& threadid) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<FunctorType,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,blockDim.y)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
inline
void execute()
{
const int nwork = m_league_size * m_team_size ;
if ( nwork ) {
const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
:std::min( m_league_size , m_team_size );
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
const dim3 block( m_vector_size , m_team_size , 1 );
const dim3 grid( block_count , 1 , 1 );
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
Cuda::fence();
if ( m_result_ptr ) {
if ( m_unified_space ) {
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
}
else {
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
}
}
}
else {
if (m_result_ptr) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
, m_scratch_size{
arg_policy.scratch_size(0,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
), arg_policy.scratch_size(1,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
)}
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , arg_result.ptr_on_device() );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory"));
}
- if ( m_team_size >
- Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
- ( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length()) {
+ if ( unsigned(m_team_size) >
+ unsigned(Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
+ ( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if ( (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) ||
CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueOps< FunctorType, WorkTag > ValueOps ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints:
// (a) blockDim.y is a power of two
// (b) blockDim.x == blockDim.z == 1
// (c) gridDim.x <= blockDim.y * blockDim.y
// (d) gridDim.y == gridDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type m_final ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( i , update , final_result ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( TagType() , i , update , final_result ); }
//----------------------------------------
__device__ inline
void initial(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
size_type * const shared_value = kokkos_impl_cuda_shared_memory<size_type>() + word_count.value * threadIdx.y ;
ValueInit::init( m_functor , shared_value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_value ) , false );
}
// Reduce and scan, writing out scan of blocks' totals and block-groups' totals.
// Blocks' scan values are written to 'blockIdx.x' location.
// Block-groups' scan values are at: i = ( j * blockDim.y - 1 ) for i < gridDim.x
cuda_single_inter_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , blockIdx.x , gridDim.x , kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags );
}
//----------------------------------------
__device__ inline
void final(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
// Use shared memory as an exclusive scan: { 0 , value[0] , value[1] , value[2] , ... }
size_type * const shared_data = kokkos_impl_cuda_shared_memory<size_type>();
size_type * const shared_prefix = shared_data + word_count.value * threadIdx.y ;
size_type * const shared_accum = shared_data + word_count.value * ( blockDim.y + 1 );
// Starting value for this thread block is the previous block's total.
if ( blockIdx.x ) {
size_type * const block_total = m_scratch_space + word_count.value * ( blockIdx.x - 1 );
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i] ; }
}
else if ( 0 == threadIdx.y ) {
ValueInit::init( m_functor , shared_accum );
}
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( typename Policy::member_type iwork_base = range.begin(); iwork_base < range.end() ; iwork_base += blockDim.y ) {
const typename Policy::member_type iwork = iwork_base + threadIdx.y ;
__syncthreads(); // Don't overwrite previous iteration values until they are used
ValueInit::init( m_functor , shared_prefix + word_count.value );
// Copy previous block's accumulation total into thread[0] prefix and inclusive scan value of this block
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) {
shared_data[i + word_count.value] = shared_data[i] = shared_accum[i] ;
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); } // Protect against large scan values.
// Call functor to accumulate inclusive scan value for this work item
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix + word_count.value ) , false );
}
// Scan block values into locations shared_data[1..blockDim.y]
cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , typename ValueTraits::pointer_type(shared_data+word_count.value) );
{
size_type * const block_total = shared_data + word_count.value * blockDim.y ;
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i]; }
}
// Call functor with exclusive scan value
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix ) , true );
}
}
}
public:
//----------------------------------------
__device__ inline
void operator()(void) const
{
if ( ! m_final ) {
initial();
}
else {
final();
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
// blockDim.y must be a power of two: 128 (4 warps), 256 (8 warps), or 512 (16 warps)
// gridDim.x <= blockDim.y * blockDim.y
//
// 4 warps was 10% faster than 8 warps and 20% faster than 16 warps in unit testing
unsigned n = CudaTraits::WarpSize * 4 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
enum { GridMaxComputeCapability_2x = 0x0ffff };
const int block_size = local_block_size( m_functor );
const int grid_max =
( block_size * block_size ) < GridMaxComputeCapability_2x ?
( block_size * block_size ) : GridMaxComputeCapability_2x ;
// At most 'max_grid' blocks:
const int max_grid = std::min( int(grid_max) , int(( nwork + block_size - 1 ) / block_size ));
// How much work per block:
const int work_per_block = ( nwork + max_grid - 1 ) / max_grid ;
// How many blocks are really needed for this much work:
const int grid_x = ( nwork + work_per_block - 1 ) / work_per_block ;
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( m_functor ) * grid_x );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) * 1 );
const dim3 grid( grid_x , 1 , 1 );
const dim3 block( 1 , block_size , 1 ); // REQUIRED DIMENSIONS ( 1 , N , 1 )
const int shmem = ValueTraits::value_size( m_functor ) * ( block_size + 2 );
m_final = false ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
m_final = true ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
}
}
ParallelScan( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_final( false )
{ }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
const CudaTeamMember& thread;
#ifdef __CUDA_ARCH__
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( threadIdx.y ),
end( count ),
increment( blockDim.y ),
thread(thread_)
{}
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_+threadIdx.y ),
end( end_ ),
increment( blockDim.y ),
thread(thread_)
{}
#else
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( 0 ),
end( count ),
increment( 1 ),
thread(thread_)
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_ ),
end( end_ ),
increment( 1 ),
thread(thread_)
{}
#endif
};
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
#ifdef __CUDA_ARCH__
__device__ inline
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( threadIdx.x ),
end( count ),
increment( blockDim.x )
{}
__device__ inline
ThreadVectorRangeBoundariesStruct (const iType& count):
start( threadIdx.x ),
end( count ),
increment( blockDim.x )
{}
#else
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( 0 ),
end( count ),
increment( 1 )
{}
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const iType& count):
start( 0 ),
end( count ),
increment( 1 )
{}
#endif
};
} // namespace Impl
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::CudaTeamMember> PerTeam(const Impl::CudaTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::CudaTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::CudaTeamMember> PerThread(const Impl::CudaTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::CudaTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
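// A minimal usage sketch of the team-level parallel_for (illustrative;
// 'member', 'row_sum', and 'N' are placeholders for a TeamPolicy<Cuda>
// functor's member handle and user data):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( member , N ) ,
//                         [&] ( const int i ) {
//     row_sum( i ) = 0 ;   // iterations 0..N-1 are spread over the team's threads
//   });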
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
{ dst+=src; });
Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
{ dst+=src; });
#endif
}
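// A minimal usage sketch of the team-level summation (illustrative;
// 'a', 'b', and 'N' are placeholder views/extents):
//
//   double team_sum = 0 ;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , N ) ,
//                            [&] ( const int i , double & val ) {
//     val += a( i ) * b( i ) ;
//   }, team_sum );   // the summed contributions of the team end up in team_sum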
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
#ifdef __CUDA_ARCH__
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
Impl::cuda_intra_warp_reduction(result, join );
Impl::cuda_inter_warp_reduction(result, join );
init_result = result;
#endif
}
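// A minimal usage sketch of the join-based overload, here a team-wide max
// (illustrative; 'a' and 'N' are placeholders). Per the comment above, the
// initial value of the result must be the identity of the join operation:
//
//   double team_max = -1.0e300 ;   // identity element for max over doubles (sketch)
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , N ) ,
//                            [&] ( const int i , double & val ) {
//     if ( a( i ) > val ) val = a( i ) ;
//   },
//   [] ( double & dst , const double & src ) {   // join
//     if ( src > dst ) dst = src ;
//   }, team_max );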
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
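// A minimal usage sketch of the vector-level parallel_for as the innermost
// loop of a team/vector nest (illustrative; 'A', 'scale', 'nrow', and 'ncol'
// are placeholders):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( member , nrow ) ,
//                         [&] ( const int row ) {
//     Kokkos::parallel_for( Kokkos::ThreadVectorRange( member , ncol ) ,
//                           [&] ( const int col ) {
//       A( row , col ) *= scale ;   // columns are spread over the vector lanes
//     });
//   });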
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
+/** \brief Intra-thread vector parallel_reduce.
*
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
+ * Calls lambda(iType i, ValueType & val) for each i=[0..N).
+ *
+ * The range [0..N) is mapped to all vector lanes of
+ * the calling thread and a reduction of val is performed using +=
+ * and output into result.
+ *
+ * The identity value for the += operator is assumed to be the default
+ * constructed value.
+ */
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
+void parallel_reduce
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
+ const & loop_boundaries
+ , Lambda const & lambda
+ , ValueType & result )
+{
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
- if (loop_boundaries.increment > 1)
- result += shfl_down(result, 1,loop_boundaries.increment);
- if (loop_boundaries.increment > 2)
- result += shfl_down(result, 2,loop_boundaries.increment);
- if (loop_boundaries.increment > 4)
- result += shfl_down(result, 4,loop_boundaries.increment);
- if (loop_boundaries.increment > 8)
- result += shfl_down(result, 8,loop_boundaries.increment);
- if (loop_boundaries.increment > 16)
- result += shfl_down(result, 16,loop_boundaries.increment);
-
- result = shfl(result,0,loop_boundaries.increment);
+ Impl::cuda_intra_warp_vector_reduce(
+ Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >( & result ) );
+
#endif
}
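// Illustrative sketch (not part of the library): a vector-lane sum producing a
// per-thread result.  'team', 'N', and 'row' are placeholders.
//
//   double row_sum = 0;
//   Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, double & partial ) { partial += row(j); }, row_sum );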
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
+/** \brief Intra-thread vector parallel_reduce.
*
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
+ * Calls lambda(iType i, ValueType & val) for each i=[0..N).
+ *
+ * The range [0..N) is mapped to all vector lanes of
+ * the calling thread and a reduction of val is performed
+ * using JoinType::operator()(ValueType& val, const ValueType& update)
+ * and output into result.
+ *
+ * The input value of result must be the identity value for the
+ * reduction operation; e.g., ( 0 , += ) or ( 1 , *= ).
+ */
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
+void parallel_reduce
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
+ const & loop_boundaries
+ , Lambda const & lambda
+ , JoinType const & join
+ , ValueType & result )
+{
#ifdef __CUDA_ARCH__
- ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
- if (loop_boundaries.increment > 1)
- join( result, shfl_down(result, 1,loop_boundaries.increment));
- if (loop_boundaries.increment > 2)
- join( result, shfl_down(result, 2,loop_boundaries.increment));
- if (loop_boundaries.increment > 4)
- join( result, shfl_down(result, 4,loop_boundaries.increment));
- if (loop_boundaries.increment > 8)
- join( result, shfl_down(result, 8,loop_boundaries.increment));
- if (loop_boundaries.increment > 16)
- join( result, shfl_down(result, 16,loop_boundaries.increment));
-
- init_result = shfl(result,0,loop_boundaries.increment);
+ Impl::cuda_intra_warp_vector_reduce(
+ Impl::Reducer< ValueType , JoinType >( join , & result ) );
+
#endif
}
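// Illustrative sketch (not part of the library): the join-based overload with a
// product reduction.  As noted above, 'prod' must start at the identity of the
// join (1 for multiplication).  'ProdJoin', 'team', 'N', and 'x' are placeholders.
//
//   struct ProdJoin {
//     KOKKOS_INLINE_FUNCTION
//     void operator()( double & dst, const double & src ) const { dst *= src; }
//   };
//   double prod = 1.0;
//   Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, double & val ) { val *= x(j); }, ProdJoin(), prod );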
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const FunctorType & lambda) {
#ifdef __CUDA_ARCH__
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
const int VectorLength = blockDim.x;
iType loop_bound = ((loop_boundaries.end+VectorLength-1)/VectorLength) * VectorLength;
for(int _i = threadIdx.x; _i < loop_bound; _i += VectorLength) {
value_type val = value_type();
if(_i<loop_boundaries.end)
lambda(_i , val , false);
value_type tmp = val;
value_type result_i;
if(threadIdx.x%VectorLength == 0)
result_i = tmp;
if (VectorLength > 1) {
const value_type tmp2 = shfl_up(tmp, 1,VectorLength);
if(threadIdx.x > 0)
tmp+=tmp2;
}
if(threadIdx.x%VectorLength == 1)
result_i = tmp;
if (VectorLength > 3) {
const value_type tmp2 = shfl_up(tmp, 2,VectorLength);
if(threadIdx.x > 1)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 2) &&
(threadIdx.x%VectorLength < 4))
result_i = tmp;
if (VectorLength > 7) {
const value_type tmp2 = shfl_up(tmp, 4,VectorLength);
if(threadIdx.x > 3)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 4) &&
(threadIdx.x%VectorLength < 8))
result_i = tmp;
if (VectorLength > 15) {
const value_type tmp2 = shfl_up(tmp, 8,VectorLength);
if(threadIdx.x > 7)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 8) &&
(threadIdx.x%VectorLength < 16))
result_i = tmp;
if (VectorLength > 31) {
const value_type tmp2 = shfl_up(tmp, 16,VectorLength);
if(threadIdx.x > 15)
tmp+=tmp2;
}
if (threadIdx.x%VectorLength >= 16)
result_i = tmp;
val = scan_val + result_i - val;
scan_val += shfl(tmp,VectorLength-1,VectorLength);
if(_i<loop_boundaries.end)
lambda(_i , val , true);
}
#endif
}
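// Illustrative sketch (not part of the library): an exclusive prefix sum over
// vector lanes, following the convention above of adding the contribution on
// every pass regardless of 'final'.  'team', 'N', 'in', and 'out' are placeholders.
//
//   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, int & update, const bool final ) {
//       if ( final ) out(j) = update;   // 'update' is the exclusive prefix sum at j
//       update += in(j);                // contribute whether or not this is the final pass
//     } );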
}
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda();
#endif
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) lambda();
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda(val);
val = shfl(val,0,blockDim.x);
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
#endif
}
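// Illustrative sketch (not part of the library): executing a lambda once per
// thread (PerThread) or once per team (PerTeam), optionally broadcasting a
// value.  'team' is a placeholder team member.
//
//   int flag = 0;
//   Kokkos::single( Kokkos::PerThread( team ), [&]( int & v ) { v = 1; }, flag );
//   // 'flag' is now 1 on every vector lane of the calling thread.
//   Kokkos::single( Kokkos::PerTeam( team ), [&]() { /* one thread of the team */ } );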
}
namespace Kokkos {
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType , class Tag = typename ExecPolicy::work_tag>
struct CudaFunctorAdapter {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (typename ExecPolicy::work_tag, const typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the third argument type of FunctorType::operator()
f(typename ExecPolicy::work_tag(), i,val);
}
};
template< class FunctorType, class ExecPolicy, class ValueType >
struct CudaFunctorAdapter<FunctorType,ExecPolicy,ValueType,void> {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (const typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the second argument type of FunctorType::operator()
f(i,val);
}
__device__ inline
void operator() (typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the second argument type of FunctorType::operator()
f(i,val);
}
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasInit {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasInit<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::init ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasJoin {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasJoin<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::join ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasFinal {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasFinal<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::final ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasShmemSize {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasShmemSize<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::team_shmem_size ) >::type > {
enum {value = true};
};
template< class FunctorType, bool Enable =
( FunctorDeclaresValueType<FunctorType,void>::value) ||
( ReduceFunctorHasInit<FunctorType>::value ) ||
( ReduceFunctorHasJoin<FunctorType>::value ) ||
( ReduceFunctorHasFinal<FunctorType>::value ) ||
( ReduceFunctorHasShmemSize<FunctorType>::value )
>
struct IsNonTrivialReduceFunctor {
enum {value = false};
};
template< class FunctorType>
struct IsNonTrivialReduceFunctor<FunctorType, true> {
enum {value = true};
};
template<class FunctorType, class ResultType, class Tag, bool Enable = IsNonTrivialReduceFunctor<FunctorType>::value >
struct FunctorReferenceType {
typedef ResultType& reference_type;
};
template<class FunctorType, class ResultType, class Tag>
struct FunctorReferenceType<FunctorType, ResultType, Tag, true> {
typedef typename Kokkos::Impl::FunctorValueTraits< FunctorType ,Tag >::reference_type reference_type;
};
template< class FunctorTypeIn, class ExecPolicy, class ValueType>
struct ParallelReduceFunctorType<FunctorTypeIn,ExecPolicy,ValueType,Cuda> {
enum {FunctorHasValueType = IsNonTrivialReduceFunctor<FunctorTypeIn>::value };
typedef typename Kokkos::Impl::if_c<FunctorHasValueType, FunctorTypeIn, Impl::CudaFunctorAdapter<FunctorTypeIn,ExecPolicy,ValueType> >::type functor_type;
static functor_type functor(const FunctorTypeIn& functor_in) {
return Impl::if_c<FunctorHasValueType,FunctorTypeIn,functor_type>::select(functor_in,functor_type(functor_in));
}
};
}
} // namespace Kokkos
#endif /* defined( __CUDACC__ ) */
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
index ad9cca26c..79b3867ba 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
@@ -1,444 +1,595 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_REDUCESCAN_HPP
#define KOKKOS_CUDA_REDUCESCAN_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_Vectorization.hpp>
+
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
+//----------------------------------------------------------------------------
+
+template< typename T >
+__device__ inline
+void cuda_shfl( T & out , T const & in , int lane ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl( *reinterpret_cast<int const *>(&in) , lane , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl( T & out , T const & in , int lane ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl( reinterpret_cast<int const *>(&in)[i] , lane , width );
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename T >
+__device__ inline
+void cuda_shfl_down( T & out , T const & in , int delta ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl_down( *reinterpret_cast<int const *>(&in) , delta , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl_down( T & out , T const & in , int delta ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl_down( reinterpret_cast<int const *>(&in)[i] , delta , width );
+ }
+}
+//----------------------------------------------------------------------------
-//Shfl based reductions
+template< typename T >
+__device__ inline
+void cuda_shfl_up( T & out , T const & in , int delta ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl_up( *reinterpret_cast<int const *>(&in) , delta , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl_up( T & out , T const & in , int delta ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl_up( reinterpret_cast<int const *>(&in)[i] , delta , width );
+ }
+}
+
+//----------------------------------------------------------------------------
+/** \brief Reduce within a warp over blockDim.x, the "vector" dimension.
+ *
+ * This will be called within a nested, intra-team parallel operation.
+ * Use shuffle operations to avoid conflicts with shared memory usage.
+ *
+ * Requires:
+ * blockDim.x is power of 2
+ * blockDim.x <= 32 (one warp)
+ *
+ * Cannot use "butterfly" pattern because floating point
+ * addition is non-associative. Therefore, must broadcast
+ * the final result.
+ */
+template< class Reducer >
+__device__ inline
+void cuda_intra_warp_vector_reduce( Reducer const & reducer )
+{
+ static_assert(
+ std::is_reference< typename Reducer::reference_type >::value , "" );
+
+ if ( 1 < blockDim.x ) {
+
+ typename Reducer::value_type tmp ;
+
+ for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
+
+ cuda_shfl_down( tmp , reducer.reference() , i , blockDim.x );
+
+ if ( threadIdx.x < i ) { reducer.join( reducer.data() , & tmp ); }
+ }
+
+ // Broadcast from root "lane" to all other "lanes"
+
+ cuda_shfl( reducer.reference() , reducer.reference() , 0 , blockDim.x );
+ }
+}
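+// Illustrative trace (not part of the original patch): with blockDim.x == 4 and
+// lane values [a, b, c, d] the loop above performs
+//   i = 2 : lane0 = a+c , lane1 = b+d
+//   i = 1 : lane0 = (a+c)+(b+d)
+// and the final shuffle broadcasts lane0's total to all four lanes.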
+
+/** \brief Inclusive scan over blockDim.x, the "vector" dimension.
+ *
+ * This will be called within a nested, intra-team parallel operation.
+ * Use shuffle operations to avoid conflicts with shared memory usage.
+ *
+ * Algorithm is concurrent bottom-up reductions in triangular pattern
+ * where each CUDA thread is the root of a reduction tree from the
+ * zeroth CUDA thread to itself.
+ *
+ * Requires:
+ * blockDim.x is power of 2
+ * blockDim.x <= 32 (one warp)
+ */
+template< typename ValueType >
+__device__ inline
+void cuda_intra_warp_vector_inclusive_scan( ValueType & local )
+{
+ ValueType tmp ;
+
+ // Bottom up:
+ // [t] += [t-1] if t >= 1
+ // [t] += [t-2] if t >= 2
+ // [t] += [t-4] if t >= 4
+ // ...
+
+ for ( int i = 1 ; i < blockDim.x ; i <<= 1 ) {
+
+ cuda_shfl_up( tmp , local , i , blockDim.x );
+
+ if ( i <= threadIdx.x ) { local += tmp ; }
+ }
+}
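+// Illustrative trace (not part of the original patch): with blockDim.x == 4
+// and lane values [a, b, c, d] the two scan steps above produce
+//   i = 1 : [a, a+b, b+c, c+d]
+//   i = 2 : [a, a+b, a+b+c, a+b+c+d]
+// i.e. each lane t ends holding the inclusive sum of lanes 0..t.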
+
+//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
 * (a) threads with the same threadIdx.y have the same value
* (b) blockDim.x == power of two
* (c) blockDim.z == 1
*/
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_warp_reduction( ValueType& result,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
unsigned int shift = 1;
//Reduce over values from threads with different threadIdx.y
while(blockDim.x * shift < 32 ) {
const ValueType tmp = shfl_down(result, blockDim.x*shift,32u);
    //Only join if the upper thread is active (this allows a non-power-of-two blockDim.y)
if(threadIdx.y + shift < max_active_thread)
join(result , tmp);
shift*=2;
}
result = shfl(result,0,32);
}
template< class ValueType , class JoinOp>
__device__
inline void cuda_inter_warp_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
#define STEP_WIDTH 4
- __shared__ char sh_result[sizeof(ValueType)*STEP_WIDTH];
+ // Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries.
+ // The reason not to use ValueType directly is that, for types with constructors, it
+ // could lead to race conditions.
+ __shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
ValueType* result = (ValueType*) & sh_result;
const unsigned step = 32 / blockDim.x;
unsigned shift = STEP_WIDTH;
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
if(id < STEP_WIDTH ) {
result[id] = value;
}
__syncthreads();
while (shift<=max_active_thread/step) {
if(shift<=id && shift+STEP_WIDTH>id && threadIdx.x==0) {
join(result[id%STEP_WIDTH],value);
}
__syncthreads();
shift+=STEP_WIDTH;
}
value = result[0];
for(int i = 1; (i*step<max_active_thread) && i<STEP_WIDTH; i++)
join(value,result[i]);
}
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_block_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
cuda_intra_warp_reduction(value,join,max_active_thread);
cuda_inter_warp_reduction(value,join,max_active_thread);
}
template< class FunctorType , class JoinOp , class ArgTag = void >
__device__
bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgTag >::reference_type value,
typename FunctorValueTraits< FunctorType , ArgTag >::reference_type neutral,
const JoinOp& join,
Cuda::size_type * const m_scratch_space,
typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type const result,
Cuda::size_type * const m_scratch_flags,
const int max_active_thread = blockDim.y) {
#ifdef __CUDA_ARCH__
typedef typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type pointer_type;
typedef typename FunctorValueTraits< FunctorType , ArgTag >::value_type value_type;
//Do the intra-block reduction with shfl operations and static shared memory
cuda_intra_block_reduction(value,join,max_active_thread);
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
//One thread in the block writes block result to global scratch_memory
if(id == 0 ) {
pointer_type global = ((pointer_type) m_scratch_space) + blockIdx.x;
*global = value;
}
  //One warp of the last block performs the inter-block reduction by loading the block values from global scratch_memory
bool last_block = false;
__syncthreads();
if ( id < 32 ) {
Cuda::size_type count;
//Figure out whether this is the last block
if(id == 0)
count = Kokkos::atomic_fetch_add(m_scratch_flags,1);
count = Kokkos::shfl(count,0,32);
//Last block does the inter block reduction
if( count == gridDim.x - 1) {
//set flag back to zero
if(id == 0)
*m_scratch_flags = 0;
last_block = true;
value = neutral;
pointer_type const volatile global = (pointer_type) m_scratch_space ;
//Reduce all global values with splitting work over threads in one warp
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
for(int i=id; i<gridDim.x; i+=step_size) {
value_type tmp = global[i];
join(value, tmp);
}
      //Perform shfl reductions within the warp; only join if the contribution is valid (allows gridDim.x to be non power of two and < 32)
if (blockDim.x*blockDim.y > 1) {
value_type tmp = Kokkos::shfl_down(value, 1,32);
if( id + 1 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < gridDim.x )
join(value, tmp);
}
}
}
//The last block has in its thread=0 the global reduction value through "value"
return last_block;
#else
return true;
#endif
}
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
// Maximize shared memory and minimize L1 cache:
// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
// For 2.0 capability: 48 KB shared and 16 KB L1
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
* (a) blockDim.y is a power of two
* (b) blockDim.y <= 512
* (c) blockDim.x == blockDim.z == 1
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
void cuda_intra_block_reduce_scan( const FunctorType & functor ,
const typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type base_data )
{
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
const unsigned value_count = ValueTraits::value_count( functor );
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_intra_block_scan requires power-of-two blockDim"); }
#define BLOCK_REDUCE_STEP( R , TD , S ) \
if ( ! ( R & ((1<<(S+1))-1) ) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S)) ); }
#define BLOCK_SCAN_STEP( TD , N , S ) \
if ( N == (1<<S) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S))); }
const unsigned rtid_intra = threadIdx.y ^ BlockSizeMask ;
const pointer_type tdata_intra = base_data + value_count * threadIdx.y ;
{ // Intra-warp reduction:
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,0)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,1)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,2)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,3)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,4)
}
__syncthreads(); // Wait for all warps to reduce
{ // Inter-warp reduce-scan by a single warp to avoid extra synchronizations
const unsigned rtid_inter = ( threadIdx.y ^ BlockSizeMask ) << CudaTraits::WarpIndexShift ;
if ( rtid_inter < blockDim.y ) {
const pointer_type tdata_inter = base_data + value_count * ( rtid_inter ^ BlockSizeMask );
if ( (1<<5) < BlockSizeMask ) { BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,5) }
if ( (1<<6) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,6) }
if ( (1<<7) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,7) }
if ( (1<<8) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,8) }
if ( DoScan ) {
int n = ( rtid_inter & 32 ) ? 32 : (
( rtid_inter & 64 ) ? 64 : (
( rtid_inter & 128 ) ? 128 : (
( rtid_inter & 256 ) ? 256 : 0 )));
if ( ! ( rtid_inter + n < blockDim.y ) ) n = 0 ;
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,8)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,7)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,6)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,5)
}
}
}
__syncthreads(); // Wait for inter-warp reduce-scan to complete
if ( DoScan ) {
int n = ( rtid_intra & 1 ) ? 1 : (
( rtid_intra & 2 ) ? 2 : (
( rtid_intra & 4 ) ? 4 : (
( rtid_intra & 8 ) ? 8 : (
( rtid_intra & 16 ) ? 16 : 0 ))));
if ( ! ( rtid_intra + n < blockDim.y ) ) n = 0 ;
#ifdef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
BLOCK_SCAN_STEP(tdata_intra,n,4) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,3) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,2) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,1) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,0) __syncthreads();
#else
BLOCK_SCAN_STEP(tdata_intra,n,4) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,3) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,2) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,1) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,0) __threadfence_block();
#endif
}
#undef BLOCK_SCAN_STEP
#undef BLOCK_REDUCE_STEP
}
//----------------------------------------------------------------------------
/**\brief Input value-per-thread starting at 'shared_data'.
* Reduction value at last thread's location.
*
* If 'DoScan' then write blocks' scan values and block-groups' scan values.
*
* Global reduce result is in the last threads' 'shared_data' location.
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
bool cuda_single_inter_block_reduce_scan( const FunctorType & functor ,
const Cuda::size_type block_id ,
const Cuda::size_type block_count ,
Cuda::size_type * const shared_data ,
Cuda::size_type * const global_data ,
Cuda::size_type * const global_flags )
{
typedef Cuda::size_type size_type ;
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef FunctorValueInit< FunctorType , ArgTag > ValueInit ;
typedef FunctorValueOps< FunctorType , ArgTag > ValueOps ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
// '__ffs' = position of the least significant bit set to 1.
// 'blockDim.y' is guaranteed to be a power of two so this
// is the integral shift value that can replace an integral divide.
const unsigned BlockSizeShift = __ffs( blockDim.y ) - 1 ;
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_single_inter_block_reduce_scan requires power-of-two blockDim"); }
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( functor ) / sizeof(size_type) );
// Reduce the accumulation for the entire block.
cuda_intra_block_reduce_scan<false,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
{
// Write accumulation total to global scratch space.
// Accumulation total is the last thread's data.
size_type * const shared = shared_data + word_count.value * BlockSizeMask ;
size_type * const global = global_data + word_count.value * block_id ;
#if (__CUDA_ARCH__ < 500)
for ( size_type i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i] ; }
#else
for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
#endif
}
// Contributing blocks note that their contribution has been completed via an atomic-increment flag
// If this block is not the last block to contribute to this group then the block is done.
const bool is_last_block =
! __syncthreads_or( threadIdx.y ? 0 : ( 1 + atomicInc( global_flags , block_count - 1 ) < block_count ) );
if ( is_last_block ) {
const size_type b = ( long(block_count) * long(threadIdx.y) ) >> BlockSizeShift ;
const size_type e = ( long(block_count) * long( threadIdx.y + 1 ) ) >> BlockSizeShift ;
{
void * const shared_ptr = shared_data + word_count.value * threadIdx.y ;
reference_type shared_value = ValueInit::init( functor , shared_ptr );
for ( size_type i = b ; i < e ; ++i ) {
ValueJoin::join( functor , shared_ptr , global_data + word_count.value * i );
}
}
cuda_intra_block_reduce_scan<DoScan,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
if ( DoScan ) {
size_type * const shared_value = shared_data + word_count.value * ( threadIdx.y ? threadIdx.y - 1 : blockDim.y );
if ( ! threadIdx.y ) { ValueInit::init( functor , shared_value ); }
// Join previous inclusive scan value to each member
for ( size_type i = b ; i < e ; ++i ) {
size_type * const global_value = global_data + word_count.value * i ;
ValueJoin::join( functor , shared_value , global_value );
ValueOps ::copy( functor , global_value , shared_value );
}
}
}
return is_last_block ;
}
// Size in bytes required for inter block reduce or scan
template< bool DoScan , class FunctorType , class ArgTag >
inline
unsigned cuda_single_inter_block_reduce_scan_shmem( const FunctorType & functor , const unsigned BlockSize )
{
return ( BlockSize + 2 ) * Impl::FunctorValueTraits< FunctorType , ArgTag >::value_size( functor );
}
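// Illustrative sketch (not part of the library): for a 256-thread block reducing
// a single double (value_size == 8 bytes) the formula above requires
// ( 256 + 2 ) * 8 = 2064 bytes of shared memory per block.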
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( __CUDACC__ ) */
#endif /* KOKKOS_CUDA_REDUCESCAN_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
index c96b8b7d4..cf3e55d50 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
@@ -1,179 +1,179 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
__device__
void TaskQueueSpecialization< Kokkos::Cuda >::driver
( TaskQueueSpecialization< Kokkos::Cuda >::queue_type * const queue )
{
using Member = TaskExec< Kokkos::Cuda > ;
using Queue = TaskQueue< Kokkos::Cuda > ;
using task_root_type = TaskBase< Kokkos::Cuda , void , void > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec( 1 );
Member team_exec( blockDim.y );
const int warp_lane = threadIdx.x + threadIdx.y * blockDim.x ;
union {
task_root_type * ptr ;
int raw[2] ;
} task ;
// Loop until all queues are empty and no tasks in flight
do {
// Each team lead attempts to acquire either a thread team task
      // or a collection of single-thread tasks for the team.
if ( 0 == warp_lane ) {
task.ptr = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
for ( int i = 0 ; i < Queue::NumQueue && end == task.ptr ; ++i ) {
for ( int j = 0 ; j < 2 && end == task.ptr ; ++j ) {
- task.ptr = Queue::pop_task( & queue->m_ready[i][j] );
+ task.ptr = Queue::pop_ready_task( & queue->m_ready[i][j] );
}
}
#if 0
printf("TaskQueue<Cuda>::driver(%d,%d) task(%lx)\n",threadIdx.z,blockIdx.x
, uintptr_t(task.ptr));
#endif
}
// shuffle broadcast
task.raw[0] = __shfl( task.raw[0] , 0 );
task.raw[1] = __shfl( task.raw[1] , 0 );
if ( 0 == task.ptr ) break ; // 0 == queue->m_ready_count
if ( end != task.ptr ) {
if ( task_root_type::TaskTeam == task.ptr->m_task_type ) {
// Thread Team Task
(*task.ptr->m_apply)( task.ptr , & team_exec );
}
else if ( 0 == threadIdx.y ) {
// Single Thread Task
(*task.ptr->m_apply)( task.ptr , & single_exec );
}
if ( 0 == warp_lane ) {
queue->complete( task.ptr );
}
}
} while(1);
}
namespace {
__global__
void cuda_task_queue_execute( TaskQueue< Kokkos::Cuda > * queue )
{ TaskQueueSpecialization< Kokkos::Cuda >::driver( queue ); }
}
void TaskQueueSpecialization< Kokkos::Cuda >::execute
( TaskQueue< Kokkos::Cuda > * const queue )
{
const int warps_per_block = 4 ;
const dim3 grid( Kokkos::Impl::cuda_internal_multiprocessor_count() , 1 , 1 );
const dim3 block( 1 , Kokkos::Impl::CudaTraits::WarpSize , warps_per_block );
const int shared = 0 ;
const cudaStream_t stream = 0 ;
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute before\n");
#endif
// Query the stack size, in bytes:
//
// size_t stack_size = 0 ;
// CUDA_SAFE_CALL( cudaDeviceGetLimit( & stack_size , cudaLimitStackSize ) );
//
// If not large enough then set the stack size, in bytes:
//
// CUDA_SAFE_CALL( cudaDeviceSetLimit( cudaLimitStackSize , stack_size ) );
cuda_task_queue_execute<<< grid , block , shared , stream >>>( queue );
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute after\n");
#endif
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
index 479294f30..a13e37837 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
@@ -1,523 +1,546 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
#define KOKKOS_IMPL_CUDA_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
template< typename TaskType >
__global__
void set_cuda_task_base_apply_function_pointer
( TaskBase<Kokkos::Cuda,void,void>::function_type * ptr )
{ *ptr = TaskType::apply ; }
}
+template< class > class TaskExec ;
+
template<>
class TaskQueueSpecialization< Kokkos::Cuda >
{
public:
using execution_space = Kokkos::Cuda ;
using memory_space = Kokkos::CudaUVMSpace ;
using queue_type = TaskQueue< execution_space > ;
+ using member_type = TaskExec< Kokkos::Cuda > ;
static
void iff_single_thread_recursive_execute( queue_type * const ) {}
__device__
static void driver( queue_type * const );
static
void execute( queue_type * const );
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( TaskBase<execution_space,void,void>::function_type * ptr )
+ typename TaskType::function_type
+ get_function_pointer()
{
- using TaskType = TaskBase< execution_space
- , typename FunctorType::value_type
- , FunctorType > ;
+ using function_type = typename TaskType::function_type ;
+
+ function_type * const ptr =
+ (function_type*) cuda_internal_scratch_unified( sizeof(function_type) );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
set_cuda_task_base_apply_function_pointer<TaskType><<<1,1>>>(ptr);
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
+
+ return *ptr ;
}
};
extern template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
/**\brief Impl::TaskExec<Cuda> is the TaskScheduler<Cuda>::member_type
* passed to tasks running in a Cuda space.
*
* Cuda thread blocks for tasking are dimensioned:
* blockDim.x == vector length
* blockDim.y == team size
* blockDim.z == number of teams
* where
* blockDim.x * blockDim.y == WarpSize
*
* Both single thread and thread team tasks are run by a full Cuda warp.
* A single thread task is called by warp lane #0 and the remaining
* lanes of the warp are idle.
*/
template<>
class TaskExec< Kokkos::Cuda >
{
private:
TaskExec( TaskExec && ) = delete ;
TaskExec( TaskExec const & ) = delete ;
TaskExec & operator = ( TaskExec && ) = delete ;
TaskExec & operator = ( TaskExec const & ) = delete ;
friend class Kokkos::Impl::TaskQueue< Kokkos::Cuda > ;
friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::Cuda > ;
const int m_team_size ;
__device__
TaskExec( int arg_team_size = blockDim.y )
: m_team_size( arg_team_size ) {}
public:
#if defined( __CUDA_ARCH__ )
__device__ void team_barrier() { /* __threadfence_block(); */ }
__device__ int team_rank() const { return threadIdx.y ; }
__device__ int team_size() const { return m_team_size ; }
#else
__host__ void team_barrier() {}
__host__ int team_rank() const { return 0 ; }
__host__ int team_size() const { return 0 ; }
#endif
};
//----------------------------------------------------------------------------
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.y )
, end(arg_count)
, increment( blockDim.y )
, thread(arg_thread)
{}
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
)
: start( arg_start + threadIdx.y )
, end( arg_end)
, increment( blockDim.y )
, thread( arg_thread )
{}
#else
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
);
#endif
};
//----------------------------------------------------------------------------
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.x )
, end(arg_count)
, increment( blockDim.x )
, thread(arg_thread)
{}
#else
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
#endif
};
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
namespace Kokkos {
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & count )
{
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >( thread, count );
}
template<typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct
< typename std::common_type<iType1,iType2>::type
, Impl::TaskExec< Kokkos::Cuda > >
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >(
thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
ThreadVectorRange( const Impl::TaskExec< Kokkos::Cuda > & thread
, const iType & count )
{
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >(thread,count);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.
*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Cuda > >& loop_boundaries
, const Lambda& lambda
)
{
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i);
}
}
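// Illustrative sketch (not part of the library): using the range factories and
// the task-level parallel_for above from inside a task's operator().  'member'
// is the TaskExec passed to the task and 'N' is a placeholder extent.
//
//   KOKKOS_INLINE_FUNCTION
//   void operator()( member_type & member ) {
//     Kokkos::parallel_for( Kokkos::TeamThreadRange( member, N ),
//       [&]( const int i ) { /* one call per i, spread over the task's team */ } );
//   }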
// reduce across corresponding lanes between team members within warp
// assume stride*team_size == warp_size
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void strided_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int team_size,
int stride)
{
for (int lane_delta=(team_size*stride)>>1; lane_delta>=stride; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, team_size*stride));
}
}
// multiple within-warp non-strided reductions
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void multi_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int vec_length)
{
for (int lane_delta=vec_length>>1; lane_delta; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, vec_length));
}
}
// broadcast within warp
template< class ValueType >
KOKKOS_INLINE_FUNCTION
ValueType shfl_warp_broadcast
(ValueType& val,
int src_lane,
int width)
{
return Kokkos::shfl(val, src_lane, width);
}
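// Illustrative trace (not part of the library): with team_size == 4 and
// stride == 8 (four team members spanning one 32-lane warp) the strided
// reduction above uses lane_delta = 16 then 8, so lane L ends up joining the
// values from lanes L, L+8, L+16, and L+24 -- i.e. the corresponding vector
// lane of every team member.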
// all-reduce across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction<ValueType, JoinType>(
join,
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
// all-reduce across corresponding vector lanes between team members within warp
// if no join() provided, use sum
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
//TODO what is the point of creating this temporary?
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
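// Illustrative sketch (not part of the library): a sum over a team-thread range
// from inside a task.  'member', 'N', and 'data' are placeholders.
//
//   double total = 0;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, N ),
//     [&]( const int i, double & partial ) { partial += data(i); }, total );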
// all-reduce within team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
multi_shfl_warp_reduction<ValueType, JoinType>(join, initialized_result, blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// all-reduce within team members within warp
// if no join() provided, use sum
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
//initialized_result = multi_shfl_warp_reduction(
multi_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// scan across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
-template< typename ValueType, typename iType, class Lambda >
+template< typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
- const Lambda & lambda) {
+ const Closure & closure )
+{
+ // Extract value_type from closure
- ValueType accum = 0 ;
- ValueType val, y, local_total;
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+ value_type val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
- lambda(i,val,false);
+ closure(i,val,false);
// intra-blockDim.y exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = blockDim.x ; offset < Impl::CudaTraits::WarpSize ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, Impl::CudaTraits::WarpSize);
if(threadIdx.y*blockDim.x >= offset) { val += y; }
}
// pass accum to all threads
- local_total = shfl_warp_broadcast<ValueType>(val,
+ local_total = shfl_warp_broadcast<value_type>(val,
threadIdx.x+Impl::CudaTraits::WarpSize-blockDim.x,
Impl::CudaTraits::WarpSize);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, blockDim.x, Impl::CudaTraits::WarpSize);
if ( threadIdx.y == 0 ) { val = 0 ; }
val += accum;
- lambda(i,val,true);
+ closure(i,val,true);
accum += local_total;
}
}
// scan within team member (vector) within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
-template< typename iType, class Lambda, typename ValueType >
+template< typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
- const Lambda & lambda)
+ const Closure & closure )
{
- ValueType accum = 0 ;
- ValueType val, y, local_total;
+ // Extract value_type from closure
+
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+ value_type val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
- lambda(i,val,false);
+ closure(i,val,false);
// intra-blockDim.x exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = 1 ; offset < blockDim.x ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, blockDim.x);
if(threadIdx.x >= offset) { val += y; }
}
// pass accum to all threads
- local_total = shfl_warp_broadcast<ValueType>(val, blockDim.x-1, blockDim.x);
+ local_total = shfl_warp_broadcast<value_type>(val, blockDim.x-1, blockDim.x);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, 1, blockDim.x);
if ( threadIdx.x == 0 ) { val = 0 ; }
val += accum;
- lambda(i,val,true);
+ closure(i,val,true);
accum += local_total;
}
}
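// Illustrative sketch (not part of the library): an exclusive prefix sum over a
// vector range inside a task, following the final-pass convention used by the
// scans above.  'member', 'N', 'in', and 'out' are placeholders.
//
//   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( member, N ),
//     [&]( const int j, int & update, const bool final ) {
//       if ( final ) out(j) = update;
//       update += in(j);
//     } );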
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_CUDA_TASK_HPP */
diff --git a/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp b/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
index 4e1ce855c..a450ca36a 100644
--- a/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
+++ b/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
@@ -1,611 +1,477 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
#define KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
+#include <initializer_list>
+
+#include<impl/KokkosExp_Host_IterateTile.hpp>
#include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
-#include <initializer_list>
-#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION) && defined(KOKKOS_ENABLE_PRAGMA_IVDEP) && !defined(__CUDA_ARCH__)
-#define KOKKOS_IMPL_MDRANGE_IVDEP
+#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
+#include<Cuda/KokkosExp_Cuda_IterateTile.hpp>
#endif
namespace Kokkos { namespace Experimental {
+// ------------------------------------------------------------------ //
+
enum class Iterate
{
Default, // Default for the device
Left, // Left indices stride fastest
Right, // Right indices stride fastest
- Flat, // Do not tile, only valid for inner direction
};
template <typename ExecSpace>
struct default_outer_direction
{
using type = Iterate;
+ #if defined( KOKKOS_ENABLE_CUDA)
+ static constexpr Iterate value = Iterate::Left;
+ #else
static constexpr Iterate value = Iterate::Right;
+ #endif
};
template <typename ExecSpace>
struct default_inner_direction
{
using type = Iterate;
+ #if defined( KOKKOS_ENABLE_CUDA)
+ static constexpr Iterate value = Iterate::Left;
+ #else
static constexpr Iterate value = Iterate::Right;
+ #endif
};
// Iteration Pattern
template < unsigned N
, Iterate OuterDir = Iterate::Default
, Iterate InnerDir = Iterate::Default
>
struct Rank
{
static_assert( N != 0u, "Kokkos Error: rank 0 undefined");
static_assert( N != 1u, "Kokkos Error: rank 1 is not a multi-dimensional range");
- static_assert( N < 4u, "Kokkos Error: Unsupported rank...");
+ static_assert( N < 7u, "Kokkos Error: Unsupported rank...");
using iteration_pattern = Rank<N, OuterDir, InnerDir>;
static constexpr int rank = N;
static constexpr Iterate outer_direction = OuterDir;
static constexpr Iterate inner_direction = InnerDir;
};
-
// multi-dimensional iteration pattern
template <typename... Properties>
struct MDRangePolicy
+ : public Kokkos::Impl::PolicyTraits<Properties ...>
{
+ using traits = Kokkos::Impl::PolicyTraits<Properties ...>;
using range_policy = RangePolicy<Properties...>;
- static_assert( !std::is_same<range_policy,void>::value
+ using impl_range_policy = RangePolicy< typename traits::execution_space
+ , typename traits::schedule_type
+ , typename traits::index_type
+ > ;
+
+ static_assert( !std::is_same<typename traits::iteration_pattern,void>::value
, "Kokkos Error: MD iteration pattern not defined" );
- using iteration_pattern = typename range_policy::iteration_pattern;
- using work_tag = typename range_policy::work_tag;
+ using iteration_pattern = typename traits::iteration_pattern;
+ using work_tag = typename traits::work_tag;
static constexpr int rank = iteration_pattern::rank;
static constexpr int outer_direction = static_cast<int> (
- (iteration_pattern::outer_direction != Iterate::Default && iteration_pattern::outer_direction != Iterate::Flat)
+ (iteration_pattern::outer_direction != Iterate::Default)
? iteration_pattern::outer_direction
- : default_outer_direction< typename range_policy::execution_space>::value );
+ : default_outer_direction< typename traits::execution_space>::value );
static constexpr int inner_direction = static_cast<int> (
iteration_pattern::inner_direction != Iterate::Default
? iteration_pattern::inner_direction
- : default_inner_direction< typename range_policy::execution_space>::value ) ;
+ : default_inner_direction< typename traits::execution_space>::value ) ;
// Ugly ugly workaround intel 14 not handling scoped enum correctly
- static constexpr int Flat = static_cast<int>( Iterate::Flat );
static constexpr int Right = static_cast<int>( Iterate::Right );
-
-
- using size_type = typename range_policy::index_type;
- using index_type = typename std::make_signed<size_type>::type;
-
-
- template <typename I>
- MDRangePolicy( std::initializer_list<I> upper_corner )
+ static constexpr int Left = static_cast<int>( Iterate::Left );
+
+ using index_type = typename traits::index_type;
+ using array_index_type = long;
+ using point_type = Kokkos::Array<array_index_type,rank>; //was index_type
+ using tile_type = Kokkos::Array<array_index_type,rank>;
+ // If point_type or tile_type is not templated on a signed integral type (if it is unsigned),
+ // then a user who passes an initializer_list of runtime-determined values of a
+ // signed integral type that are not const will receive a compiler error due
+ // to an invalid case for implicit conversion -
+ // "conversion from integer or unscoped enumeration type to integer type that cannot represent all values of the original, except where source is a constant expression whose value can be stored exactly in the target type"
+ // This would require the user to either pass a matching index_type parameter
+ // as template parameter to the MDRangePolicy or static_cast the individual values
+
+ MDRangePolicy( point_type const& lower, point_type const& upper, tile_type const& tile = tile_type{} )
+ : m_lower(lower)
+ , m_upper(upper)
+ , m_tile(tile)
+ , m_num_tiles(1)
{
- static_assert( std::is_integral<I>::value, "Kokkos Error: corner defined with non-integral type" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
-
- //static_assert( upper_corner.size() == rank, "Kokkos Error: upper_corner has incorrect rank" );
-
- const auto u = upper_corner.begin();
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(0);
- m_dim[i] = static_cast<index_type>(u[i]);
- if (inner_direction != Flat) {
- // default tile size to 4
- m_tile[i] = 4;
- } else {
- m_tile[i] = 1;
+ // Host
+ if ( true
+ #if defined(KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename traits::execution_space, Kokkos::Cuda >::value
+ #endif
+ )
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = upper[i] - lower[i];
+ if ( m_tile[i] <= 0 ) {
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = span;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
}
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
}
- }
-
- template <typename IA, typename IB>
- MDRangePolicy( std::initializer_list<IA> corner_a
- , std::initializer_list<IB> corner_b
- )
- {
- static_assert( std::is_integral<IA>::value, "Kokkos Error: corner A defined with non-integral type" );
- static_assert( std::is_integral<IB>::value, "Kokkos Error: corner B defined with non-integral type" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
- //static_assert( corner_a.size() == rank, "Kokkos Error: corner_a has incorrect rank" );
- //static_assert( corner_b.size() == rank, "Kokkos Error: corner_b has incorrect rank" );
-
-
- using A = typename std::make_signed<IA>::type;
- using B = typename std::make_signed<IB>::type;
-
- const auto a = [=](int i) { return static_cast<A>(corner_a.begin()[i]); };
- const auto b = [=](int i) { return static_cast<B>(corner_b.begin()[i]); };
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(a(i) <= b(i) ? a(i) : b(i));
- m_dim[i] = static_cast<index_type>(a(i) <= b(i) ? b(i) - a(i) : a(i) - b(i));
- if (inner_direction != Flat) {
- // default tile size to 4
- m_tile[i] = 4;
- } else {
- m_tile[i] = 1;
+ #if defined(KOKKOS_ENABLE_CUDA)
+ else // Cuda
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = upper[i] - lower[i];
+ if ( m_tile[i] <= 0 ) {
+ // TODO: determine what is a good default tile size for cuda
+ // may be rank dependent
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = 16;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
+ index_type total_tile_size_check = 1;
+ for (int i=0; i<rank; ++i) {
+ total_tile_size_check *= m_tile[i];
+ }
+ if ( total_tile_size_check >= 1024 ) { // improve this check - 1024,1024,64 max per dim (Kepler), but product num_threads < 1024; more restrictions pending register limit
+ printf(" Tile dimensions exceed Cuda limits\n");
+ Kokkos::abort(" Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ //Kokkos::Impl::throw_runtime_exception( " Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
}
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
- }
- }
-
- template <typename IA, typename IB, typename T>
- MDRangePolicy( std::initializer_list<IA> corner_a
- , std::initializer_list<IB> corner_b
- , std::initializer_list<T> tile
- )
- {
- static_assert( std::is_integral<IA>::value, "Kokkos Error: corner A defined with non-integral type" );
- static_assert( std::is_integral<IB>::value, "Kokkos Error: corner B defined with non-integral type" );
- static_assert( std::is_integral<T>::value, "Kokkos Error: tile defined with non-integral type" );
- static_assert( inner_direction != Flat, "Kokkos Error: tiling not support with flat iteration" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
- //static_assert( corner_a.size() == rank, "Kokkos Error: corner_a has incorrect rank" );
- //static_assert( corner_b.size() == rank, "Kokkos Error: corner_b has incorrect rank" );
- //static_assert( tile.size() == rank, "Kokkos Error: tile has incorrect rank" );
-
- using A = typename std::make_signed<IA>::type;
- using B = typename std::make_signed<IB>::type;
-
- const auto a = [=](int i) { return static_cast<A>(corner_a.begin()[i]); };
- const auto b = [=](int i) { return static_cast<B>(corner_b.begin()[i]); };
- const auto t = tile.begin();
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(a(i) <= b(i) ? a(i) : b(i));
- m_dim[i] = static_cast<index_type>(a(i) <= b(i) ? b(i) - a(i) : a(i) - b(i));
- m_tile[i] = static_cast<int>(t[i] > (T)0 ? t[i] : (T)1 );
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
}
+ #endif
}
- index_type m_offset[rank];
- index_type m_dim[rank];
- int m_tile[rank];
- index_type m_tile_dim[rank];
- size_type m_num_tiles; // product of tile dims
-};
-
-namespace Impl {
-// Serial, Threads, OpenMP
-// use enable_if to overload for Cuda
-template < typename MDRange, typename Functor, typename Enable = void >
-struct MDForFunctor
-{
- using work_tag = typename MDRange::work_tag;
- using index_type = typename MDRange::index_type;
- using size_type = typename MDRange::size_type;
-
- MDRange m_range;
- Functor m_func;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange const& range, Functor const& f )
- : m_range(range)
- , m_func( f )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange const& range, Functor && f )
- : m_range(range)
- , m_func( std::forward<Functor>(f) )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange && range, Functor const& f )
- : m_range( std::forward<MDRange>(range) )
- , m_func( f )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange && range, Functor && f )
- : m_range( std::forward<MDRange>(range) )
- , m_func( std::forward<Functor>(f) )
- {}
-
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDForFunctor const& ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor& operator=( MDForFunctor const& ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDForFunctor && ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor& operator=( MDForFunctor && ) = default;
-
- // Rank-2, Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
+ template < typename LT , typename UT , typename TT = array_index_type >
+ MDRangePolicy( std::initializer_list<LT> const& lower, std::initializer_list<UT> const& upper, std::initializer_list<TT> const& tile = {} )
{
- if ( MDRange::outer_direction == MDRange::Right ) {
- m_func( m_range.m_offset[0] + ( t / m_range.m_dim[1] )
- , m_range.m_offset[1] + ( t % m_range.m_dim[1] ) );
- } else {
- m_func( m_range.m_offset[0] + ( t % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( t / m_range.m_dim[0] ) );
+#if 0
+ // This should work and would reduce code duplication, but it is not yet extensively tested
+ point_type lower_tmp, upper_tmp;
+ tile_type tile_tmp;
+ for ( auto i = 0; i < rank; ++i ) {
+ lower_tmp[i] = static_cast<array_index_type>(lower.begin()[i]);
+ upper_tmp[i] = static_cast<array_index_type>(upper.begin()[i]);
+ tile_tmp[i] = static_cast<array_index_type>(tile.begin()[i]);
}
- }
- // Rank-2, Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- m_func( work_tag{}, m_range.m_offset[0] + ( t / m_range.m_dim[1] )
- , m_range.m_offset[1] + ( t % m_range.m_dim[1] ) );
- } else {
- m_func( work_tag{}, m_range.m_offset[0] + ( t % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( t / m_range.m_dim[0] ) );
- }
- }
+ MDRangePolicy( lower_tmp, upper_tmp, tile_tmp );
- // Rank-2, Not Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- index_type t0, t1;
- if ( MDRange::outer_direction == MDRange::Right ) {
- t0 = t / m_range.m_tile_dim[1];
- t1 = t % m_range.m_tile_dim[1];
- } else {
- t0 = t % m_range.m_tile_dim[0];
- t1 = t / m_range.m_tile_dim[0];
- }
+#else
+ if(m_lower.size()!=rank || m_upper.size() != rank)
+ Kokkos::abort("MDRangePolicy: Constructor initializer lists have wrong size");
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i1=b1; i1<e1; ++i1) {
- m_func( i0, i1 );
- }}
- } else {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( i0, i1 );
- }}
+ for ( auto i = 0; i < rank; ++i ) {
+ m_lower[i] = static_cast<array_index_type>(lower.begin()[i]);
+ m_upper[i] = static_cast<array_index_type>(upper.begin()[i]);
+ if(tile.size()==rank)
+ m_tile[i] = static_cast<array_index_type>(tile.begin()[i]);
+ else
+ m_tile[i] = 0;
}
- }
- // Rank-2, Not Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- work_tag tag;
-
- index_type t0, t1;
- if ( MDRange::outer_direction == MDRange::Right ) {
- t0 = t / m_range.m_tile_dim[1];
- t1 = t % m_range.m_tile_dim[1];
- } else {
- t0 = t % m_range.m_tile_dim[0];
- t1 = t / m_range.m_tile_dim[0];
- }
+ m_num_tiles = 1;
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i1=b1; i1<e1; ++i1) {
- m_func( tag, i0, i1 );
- }}
- } else {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( tag, i0, i1 );
- }}
- }
- }
- //---------------------------------------------------------------------------
-
- // Rank-3, Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- const int64_t tmp_prod = m_range.m_dim[1]*m_range.m_dim[2];
- m_func( m_range.m_offset[0] + ( t / tmp_prod )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[2] )
- , m_range.m_offset[2] + ( (t % tmp_prod) % m_range.m_dim[2] )
- );
- } else {
- const int64_t tmp_prod = m_range.m_dim[0]*m_range.m_dim[1];
- m_func( m_range.m_offset[0] + ( (t % tmp_prod) % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[0] )
- , m_range.m_offset[2] + ( t / tmp_prod )
- );
+ // Host
+ if ( true
+ #if defined(KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename traits::execution_space, Kokkos::Cuda >::value
+ #endif
+ )
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = m_upper[i] - m_lower[i];
+ if ( m_tile[i] <= 0 ) {
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = span;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
}
- }
-
- // Rank-3, Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- const int64_t tmp_prod = m_range.m_dim[1]*m_range.m_dim[2];
- m_func( work_tag{}
- , m_range.m_offset[0] + ( t / tmp_prod )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[2] )
- , m_range.m_offset[2] + ( (t % tmp_prod) % m_range.m_dim[2] )
- );
- } else {
- const int64_t tmp_prod = m_range.m_dim[0]*m_range.m_dim[1];
- m_func( work_tag{}
- , m_range.m_offset[0] + ( (t % tmp_prod) % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[0] )
- , m_range.m_offset[2] + ( t / tmp_prod )
- );
+ #if defined(KOKKOS_ENABLE_CUDA)
+ else // Cuda
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = m_upper[i] - m_lower[i];
+ if ( m_tile[i] <= 0 ) {
+ // TODO: determine what is a good default tile size for cuda
+ // may be rank dependent
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = 16;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
+ index_type total_tile_size_check = 1;
+ for (int i=0; i<rank; ++i) {
+ total_tile_size_check *= m_tile[i];
+ }
+ if ( total_tile_size_check >= 1024 ) { // improve this check - 1024,1024,64 max per dim (Kepler), but product num_threads < 1024; more restrictions pending register limit
+ printf(" Tile dimensions exceed Cuda limits\n");
+ Kokkos::abort(" Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ //Kokkos::Impl::throw_runtime_exception( " Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ }
}
+ #endif
+#endif
}
- // Rank-3, Not Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- index_type t0, t1, t2;
- if ( MDRange::outer_direction == MDRange::Right ) {
- const index_type tmp_prod = ( m_range.m_tile_dim[1]*m_range.m_tile_dim[2]);
- t0 = t / tmp_prod;
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[2];
- t2 = ( t % tmp_prod ) % m_range.m_tile_dim[2];
- } else {
- const index_type tmp_prod = ( m_range.m_tile_dim[0]*m_range.m_tile_dim[1]);
- t0 = ( t % tmp_prod ) % m_range.m_tile_dim[0];
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[0];
- t2 = t / tmp_prod;
- }
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
- const index_type b2 = t2 * m_range.m_tile[2] + m_range.m_offset[2];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
- const index_type e2 = b2 + m_range.m_tile[2] <= (m_range.m_dim[2] + m_range.m_offset[2] ) ? b2 + m_range.m_tile[2] : ( m_range.m_dim[2] + m_range.m_offset[2] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i2=b2; i2<e2; ++i2) {
- m_func( i0, i1, i2 );
- }}}
- } else {
- for (int i2=b2; i2<e2; ++i2) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( i0, i1, i2 );
- }}}
- }
- }
+ point_type m_lower;
+ point_type m_upper;
+ tile_type m_tile;
+ point_type m_tile_end;
+ index_type m_num_tiles;
+};
+// ------------------------------------------------------------------ //
- // Rank-3, Not Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- work_tag tag;
-
- index_type t0, t1, t2;
- if ( MDRange::outer_direction == MDRange::Right ) {
- const index_type tmp_prod = ( m_range.m_tile_dim[1]*m_range.m_tile_dim[2]);
- t0 = t / tmp_prod;
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[2];
- t2 = ( t % tmp_prod ) % m_range.m_tile_dim[2];
- } else {
- const index_type tmp_prod = ( m_range.m_tile_dim[0]*m_range.m_tile_dim[1]);
- t0 = ( t % tmp_prod ) % m_range.m_tile_dim[0];
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[0];
- t2 = t / tmp_prod;
- }
+// ------------------------------------------------------------------ //
+//md_parallel_for
+// ------------------------------------------------------------------ //
+template <typename MDRange, typename Functor, typename Enable = void>
+void md_parallel_for( MDRange const& range
+ , Functor const& f
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, void> g(range, f);
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
- const index_type b2 = t2 * m_range.m_tile[2] + m_range.m_offset[2];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
- const index_type e2 = b2 + m_range.m_tile[2] <= (m_range.m_dim[2] + m_range.m_offset[2] ) ? b2 + m_range.m_tile[2] : ( m_range.m_dim[2] + m_range.m_offset[2] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i2=b2; i2<e2; ++i2) {
- m_func( tag, i0, i1, i2 );
- }}}
- } else {
- for (int i2=b2; i2<e2; ++i2) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( tag, i0, i1, i2 );
- }}}
- }
- }
-};
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
+
+ Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+}
+template <typename MDRange, typename Functor>
+void md_parallel_for( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, void> g(range, f);
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
-} // namespace Impl
+ Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+}
+// Cuda specialization
+#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
+template <typename MDRange, typename Functor>
+void md_parallel_for( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
+ closure.execute();
+}
template <typename MDRange, typename Functor>
void md_parallel_for( MDRange const& range
, Functor const& f
, const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
)
{
- Impl::MDForFunctor<MDRange, Functor> g(range, f);
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
+ closure.execute();
+}
+#endif
+// ------------------------------------------------------------------ //
- using range_policy = typename MDRange::range_policy;
+// ------------------------------------------------------------------ //
+//md_parallel_reduce
+// ------------------------------------------------------------------ //
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f, v);
- Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
+ Kokkos::parallel_reduce( str, range_policy(0, range.m_num_tiles).set_chunk_size(1), g, v );
}
-template <typename MDRange, typename Functor>
-void md_parallel_for( const std::string& str
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( const std::string& str
, MDRange const& range
, Functor const& f
+ , ValueType & v
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
)
{
- Impl::MDForFunctor<MDRange, Functor> g(range, f);
+ Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f, v);
- using range_policy = typename MDRange::range_policy;
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
- Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+ Kokkos::parallel_reduce( str, range_policy(0, range.m_num_tiles).set_chunk_size(1), g, v );
}
+// Cuda - parallel_reduce not implemented yet
+/*
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f, v);
+ closure.execute();
+}
+
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f, v);
+ closure.execute();
+}
+*/
+
}} // namespace Kokkos::Experimental
#endif //KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
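The rewritten MDRangePolicy above accepts lower and upper corners plus an optional tile either as Kokkos::Array values or as initializer lists, derives per-dimension tile counts, and md_parallel_for then drives the tiled iteration for the policy's execution space. The following is a minimal usage sketch, assuming a host execution space and a build where this header is on the include path; the extents N0/N1, the view a, the tile sizes and the lambda are illustrative and not taken from the patch.

#include <Kokkos_Core.hpp>
#include <KokkosExp_MDRangePolicy.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using Kokkos::Experimental::MDRangePolicy;
    using Kokkos::Experimental::Rank;

    const int N0 = 100, N1 = 80;              // illustrative extents
    Kokkos::View<double**> a("a", N0, N1);

    // Rank-2 policy: lower corner, upper corner, and an optional tile.
    // Per the comment in the header, runtime signed indices convert cleanly;
    // unsigned index types may require explicit casts.
    MDRangePolicy< Rank<2> > policy( {0, 0}, {N0, N1}, {8, 8} );

    // md_parallel_for iterates the tiled 2-D range and calls the functor
    // with one index per rank.
    Kokkos::Experimental::md_parallel_for(policy,
      KOKKOS_LAMBDA(const int i, const int j) {
        a(i, j) = static_cast<double>(i) * j;
      });
  }
  Kokkos::finalize();
  return 0;
}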
diff --git a/lib/kokkos/core/src/Kokkos_Array.hpp b/lib/kokkos/core/src/Kokkos_Array.hpp
index 8deb5142c..abb263b7c 100644
--- a/lib/kokkos/core/src/Kokkos_Array.hpp
+++ b/lib/kokkos/core/src/Kokkos_Array.hpp
@@ -1,302 +1,315 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_ARRAY_HPP
#define KOKKOS_ARRAY_HPP
#include <type_traits>
#include <algorithm>
#include <limits>
#include <cstddef>
namespace Kokkos {
/**\brief Derived from the C++17 'std::array'.
* Dropping the iterator interface.
*/
template< class T = void
, size_t N = ~size_t(0)
, class Proxy = void
>
struct Array {
-private:
- T m_elem[N];
+public:
+ /**
+ * The elements of this C array shall not be accessed directly. The data
+ * member has to be declared public to enable aggregate initialization as for
+ * std::array. We mark it as private in the documentation.
+ * @private
+ */
+ T m_internal_implementation_private_member_data[N];
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION static constexpr size_type size() { return N ; }
KOKKOS_INLINE_FUNCTION static constexpr bool empty(){ return false ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
- return m_elem[i];
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
+ return m_internal_implementation_private_member_data[i];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
- return m_elem[i];
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
+ return m_internal_implementation_private_member_data[i];
}
- KOKKOS_INLINE_FUNCTION pointer data() { return & m_elem[0] ; }
- KOKKOS_INLINE_FUNCTION const_pointer data() const { return & m_elem[0] ; }
+ KOKKOS_INLINE_FUNCTION pointer data()
+ {
+ return & m_internal_implementation_private_member_data[0];
+ }
+ KOKKOS_INLINE_FUNCTION const_pointer data() const
+ {
+ return & m_internal_implementation_private_member_data[0];
+ }
- ~Array() = default ;
- Array() = default ;
- Array( const Array & ) = default ;
- Array & operator = ( const Array & ) = default ;
+ // Do not default unless move and move-assignment are also defined
+ // ~Array() = default ;
+ // Array() = default ;
+ // Array( const Array & ) = default ;
+ // Array & operator = ( const Array & ) = default ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && ) = default ;
// Array & operator = ( Array && ) = default ;
};
template< class T , class Proxy >
struct Array<T,0,Proxy> {
public:
typedef typename std::add_const<T>::type & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef typename std::add_const<T>::type value_type ;
typedef typename std::add_const<T>::type * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION static constexpr size_type size() { return 0 ; }
KOKKOS_INLINE_FUNCTION static constexpr bool empty() { return true ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
value_type operator[]( const iType & )
{
- static_assert( std::is_integral<iType>::value , "Must be integer argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integer argument" );
return value_type();
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
value_type operator[]( const iType & ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integer argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integer argument" );
return value_type();
}
KOKKOS_INLINE_FUNCTION pointer data() { return pointer(0) ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return const_pointer(0); }
~Array() = default ;
Array() = default ;
Array( const Array & ) = default ;
Array & operator = ( const Array & ) = default ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && ) = default ;
// Array & operator = ( Array && ) = default ;
};
template<>
struct Array<void,~size_t(0),void>
{
struct contiguous {};
struct strided {};
};
template< class T >
struct Array< T , ~size_t(0) , Array<>::contiguous >
{
private:
T * m_elem ;
size_t m_size ;
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION constexpr size_type size() const { return m_size ; }
KOKKOS_INLINE_FUNCTION constexpr bool empty() const { return 0 != m_size ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i];
}
KOKKOS_INLINE_FUNCTION pointer data() { return m_elem ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return m_elem ; }
~Array() = default ;
Array() = delete ;
Array( const Array & rhs ) = delete ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && rhs ) = default ;
// Array & operator = ( Array && rhs ) = delete ;
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
template< size_t N , class P >
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array<T,N,P> & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
KOKKOS_INLINE_FUNCTION constexpr Array( pointer arg_ptr , size_type arg_size , size_type = 0 )
: m_elem(arg_ptr), m_size(arg_size) {}
};
template< class T >
struct Array< T , ~size_t(0) , Array<>::strided >
{
private:
T * m_elem ;
size_t m_size ;
size_t m_stride ;
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION constexpr size_type size() const { return m_size ; }
KOKKOS_INLINE_FUNCTION constexpr bool empty() const { return 0 != m_size ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i*m_stride];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i*m_stride];
}
KOKKOS_INLINE_FUNCTION pointer data() { return m_elem ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return m_elem ; }
~Array() = default ;
Array() = delete ;
Array( const Array & ) = delete ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && rhs ) = default ;
// Array & operator = ( Array && rhs ) = delete ;
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
template< size_t N , class P >
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array<T,N,P> & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
KOKKOS_INLINE_FUNCTION constexpr Array( pointer arg_ptr , size_type arg_size , size_type arg_stride )
: m_elem(arg_ptr), m_size(arg_size), m_stride(arg_stride) {}
};
} // namespace Kokkos
#endif /* #ifndef KOKKOS_ARRAY_HPP */
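The Kokkos_Array.hpp change exposes the element storage as a public member so that Kokkos::Array is an aggregate (as std::array is) and widens the operator[] static_assert to accept enum as well as integral index types. A small host-only sketch of what this now permits; the Axis enum and the values are illustrative.

#include <cstdio>
#include <Kokkos_Core.hpp>   // brings in Kokkos_Array.hpp

enum Axis { X = 0, Y = 1, Z = 2 };   // illustrative enum index type

int main() {
  // Aggregate initialization, enabled by the now-public data member.
  Kokkos::Array<double, 3> p = {{ 1.0, 2.0, 0.0 }};

  // Indexing with an enum compiles because the static_assert now also
  // accepts std::is_enum index types.
  p[Z] = p[X] + p[Y];

  std::printf("p = (%g, %g, %g)\n", p[0], p[1], p[2]);  // p = (1, 2, 3)
  return 0;
}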
diff --git a/lib/kokkos/core/src/Kokkos_Concepts.hpp b/lib/kokkos/core/src/Kokkos_Concepts.hpp
index 3f9bdea40..cfcdabf95 100644
--- a/lib/kokkos/core/src/Kokkos_Concepts.hpp
+++ b/lib/kokkos/core/src/Kokkos_Concepts.hpp
@@ -1,342 +1,343 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_CONCEPTS_HPP
#define KOKKOS_CORE_CONCEPTS_HPP
#include <type_traits>
// Needed for 'is_space<S>::host_mirror_space'
#include <Kokkos_Core_fwd.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//Schedules for Execution Policies
struct Static {};
struct Dynamic {};
//Schedule Wrapper Type
template<class T>
struct Schedule
{
static_assert( std::is_same<T,Static>::value
|| std::is_same<T,Dynamic>::value
, "Kokkos: Invalid Schedule<> type."
);
using schedule_type = Schedule ;
using type = T;
};
//Specify Iteration Index Type
template<typename T>
struct IndexType
{
static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
using index_type = IndexType ;
using type = T;
};
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
#define KOKKOS_IMPL_IS_CONCEPT( CONCEPT ) \
template< typename T > struct is_ ## CONCEPT { \
private: \
template< typename , typename = std::true_type > struct have : std::false_type {}; \
template< typename U > struct have<U,typename std::is_same<U,typename U:: CONCEPT >::type> : std::true_type {}; \
public: \
enum { value = is_ ## CONCEPT::template have<T>::value }; \
};
// Public concept:
KOKKOS_IMPL_IS_CONCEPT( memory_space )
KOKKOS_IMPL_IS_CONCEPT( memory_traits )
KOKKOS_IMPL_IS_CONCEPT( execution_space )
KOKKOS_IMPL_IS_CONCEPT( execution_policy )
KOKKOS_IMPL_IS_CONCEPT( array_layout )
+KOKKOS_IMPL_IS_CONCEPT( reducer )
namespace Impl {
// For backward compatibility:
using Kokkos::is_memory_space ;
using Kokkos::is_memory_traits ;
using Kokkos::is_execution_space ;
using Kokkos::is_execution_policy ;
using Kokkos::is_array_layout ;
// Implementation concept:
KOKKOS_IMPL_IS_CONCEPT( iteration_pattern )
KOKKOS_IMPL_IS_CONCEPT( schedule_type )
KOKKOS_IMPL_IS_CONCEPT( index_type )
}
#undef KOKKOS_IMPL_IS_CONCEPT
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
template< class ExecutionSpace , class MemorySpace >
struct Device {
static_assert( Kokkos::is_execution_space<ExecutionSpace>::value
, "Execution space is not valid" );
static_assert( Kokkos::is_memory_space<MemorySpace>::value
, "Memory space is not valid" );
typedef ExecutionSpace execution_space;
typedef MemorySpace memory_space;
typedef Device<execution_space,memory_space> device_type;
};
template< typename T >
struct is_space {
private:
template< typename , typename = void >
struct exe : std::false_type { typedef void space ; };
template< typename , typename = void >
struct mem : std::false_type { typedef void space ; };
template< typename , typename = void >
struct dev : std::false_type { typedef void space ; };
template< typename U >
struct exe<U,typename std::conditional<true,void,typename U::execution_space>::type>
: std::is_same<U,typename U::execution_space>::type
{ typedef typename U::execution_space space ; };
template< typename U >
struct mem<U,typename std::conditional<true,void,typename U::memory_space>::type>
: std::is_same<U,typename U::memory_space>::type
{ typedef typename U::memory_space space ; };
template< typename U >
struct dev<U,typename std::conditional<true,void,typename U::device_type>::type>
: std::is_same<U,typename U::device_type>::type
{ typedef typename U::device_type space ; };
typedef typename is_space::template exe<T> is_exe ;
typedef typename is_space::template mem<T> is_mem ;
typedef typename is_space::template dev<T> is_dev ;
public:
enum { value = is_exe::value || is_mem::value || is_dev::value };
typedef typename is_exe::space execution_space ;
typedef typename is_mem::space memory_space ;
// For backward compatibility, deprecated in favor of
// Kokkos::Impl::HostMirror<S>::host_mirror_space
typedef typename std::conditional
< std::is_same< memory_space , Kokkos::HostSpace >::value
#if defined( KOKKOS_ENABLE_CUDA )
|| std::is_same< memory_space , Kokkos::CudaUVMSpace >::value
|| std::is_same< memory_space , Kokkos::CudaHostPinnedSpace >::value
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
, memory_space
, Kokkos::HostSpace
>::type host_memory_space ;
#if defined( KOKKOS_ENABLE_CUDA )
typedef typename std::conditional
< std::is_same< execution_space , Kokkos::Cuda >::value
, Kokkos::DefaultHostExecutionSpace , execution_space
>::type host_execution_space ;
#else
typedef execution_space host_execution_space ;
#endif
typedef typename std::conditional
< std::is_same< execution_space , host_execution_space >::value &&
std::is_same< memory_space , host_memory_space >::value
, T , Kokkos::Device< host_execution_space , host_memory_space >
>::type host_mirror_space ;
};
// For backward compatibility
namespace Impl {
using Kokkos::is_space ;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/**\brief Access relationship between DstMemorySpace and SrcMemorySpace
*
* The default case can assume accessibility for the same space.
* Specializations must be defined for different memory spaces.
*/
template< typename DstMemorySpace , typename SrcMemorySpace >
struct MemorySpaceAccess {
static_assert( Kokkos::is_memory_space< DstMemorySpace >::value &&
Kokkos::is_memory_space< SrcMemorySpace >::value
, "template arguments must be memory spaces" );
/**\brief Can a View (or pointer) to memory in SrcMemorySpace
* be assigned to a View (or pointer) to memory marked DstMemorySpace.
*
* 1. DstMemorySpace::execution_space == SrcMemorySpace::execution_space
* 2. All execution spaces that can access DstMemorySpace can also access
* SrcMemorySpace.
*/
enum { assignable = std::is_same<DstMemorySpace,SrcMemorySpace>::value };
/**\brief For all DstExecSpace::memory_space == DstMemorySpace
* DstExecSpace can access SrcMemorySpace.
*/
enum { accessible = assignable };
/**\brief Does a DeepCopy capability exist
* to DstMemorySpace from SrcMemorySpace
*/
enum { deepcopy = assignable };
};
/**\brief Can AccessSpace access MemorySpace ?
*
* Requires:
* Kokkos::is_space< AccessSpace >::value
* Kokkos::is_memory_space< MemorySpace >::value
*
* Can AccessSpace::execution_space access MemorySpace ?
* enum : bool { accessible };
*
* Is View<AccessSpace::memory_space> assignable from View<MemorySpace> ?
* enum : bool { assignable };
*
 * If ! accessible, then through which intercessory memory space
 * should memory be deep copied so that
 * AccessSpace::execution_space
 * can get access.
* When AccessSpace::memory_space == Kokkos::HostSpace
* then space is the View host mirror space.
*/
template< typename AccessSpace , typename MemorySpace >
struct SpaceAccessibility {
private:
static_assert( Kokkos::is_space< AccessSpace >::value
, "template argument #1 must be a Kokkos space" );
static_assert( Kokkos::is_memory_space< MemorySpace >::value
, "template argument #2 must be a Kokkos memory space" );
// The input AccessSpace may be a Device<ExecSpace,MemSpace>
// verify that it is a valid combination of spaces.
static_assert( Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::execution_space::memory_space
, typename AccessSpace::memory_space
>::accessible
, "template argument #1 is an invalid space" );
typedef Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::execution_space::memory_space , MemorySpace >
exe_access ;
typedef Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::memory_space , MemorySpace >
mem_access ;
public:
/**\brief Can AccessSpace::execution_space access MemorySpace ?
*
* Default based upon memory space accessibility.
* Specialization required for other relationships.
*/
enum { accessible = exe_access::accessible };
/**\brief Can assign to AccessSpace from MemorySpace ?
*
* Default based upon memory space accessibility.
* Specialization required for other relationships.
*/
enum { assignable =
is_memory_space< AccessSpace >::value && mem_access::assignable };
/**\brief Can deep copy to AccessSpace::memory_Space from MemorySpace ? */
enum { deepcopy = mem_access::deepcopy };
// What intercessory space for AccessSpace::execution_space
// to be able to access MemorySpace?
// If same memory space or not accessible use the AccessSpace
// else construct a device with execution space and memory space.
typedef typename std::conditional
< std::is_same<typename AccessSpace::memory_space,MemorySpace>::value ||
! exe_access::accessible
, AccessSpace
, Kokkos::Device< typename AccessSpace::execution_space , MemorySpace >
>::type space ;
};
}} // namespace Kokkos::Impl
//----------------------------------------------------------------------------
#endif // KOKKOS_CORE_CONCEPTS_HPP
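Kokkos_Concepts.hpp builds its is_* detection traits with the KOKKOS_IMPL_IS_CONCEPT macro, and the hunk above simply adds an is_reducer trait alongside the existing ones. A compile-time sketch of how such traits and SpaceAccessibility are typically queried, assuming a default host-only build; SpaceAccessibility lives in the Impl namespace in this version, so querying it directly here is for illustration only.

#include <Kokkos_Core.hpp>

// Compile-time queries only; nothing runs at execution time.
static_assert(Kokkos::is_execution_space<Kokkos::DefaultHostExecutionSpace>::value,
              "the default host execution space models the execution-space concept");
static_assert(Kokkos::is_memory_space<Kokkos::HostSpace>::value,
              "HostSpace models the memory-space concept");
static_assert(Kokkos::Impl::SpaceAccessibility<Kokkos::DefaultHostExecutionSpace,
                                               Kokkos::HostSpace>::accessible,
              "a host execution space can access HostSpace");

int main() { return 0; }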
diff --git a/lib/kokkos/core/src/Kokkos_Core.hpp b/lib/kokkos/core/src/Kokkos_Core.hpp
index 6d92f4bf6..16c1bce90 100644
--- a/lib/kokkos/core/src/Kokkos_Core.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core.hpp
@@ -1,162 +1,169 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_HPP
#define KOKKOS_CORE_HPP
//----------------------------------------------------------------------------
// Include the execution space header files for the enabled execution spaces.
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
#include <Kokkos_Serial.hpp>
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
#include <Kokkos_OpenMP.hpp>
#endif
+#if defined( KOKKOS_ENABLE_QTHREADS )
+#include <Kokkos_Qthreads.hpp>
+#endif
+
#if defined( KOKKOS_ENABLE_PTHREAD )
#include <Kokkos_Threads.hpp>
#endif
#if defined( KOKKOS_ENABLE_CUDA )
#include <Kokkos_Cuda.hpp>
#endif
#include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Pair.hpp>
#include <Kokkos_Array.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_Vectorization.hpp>
#include <Kokkos_Atomic.hpp>
#include <Kokkos_hwloc.hpp>
#include <Kokkos_Timer.hpp>
#include <Kokkos_Complex.hpp>
+#include <iosfwd>
//----------------------------------------------------------------------------
namespace Kokkos {
struct InitArguments {
int num_threads;
int num_numa;
int device_id;
InitArguments() {
num_threads = -1;
num_numa = -1;
device_id = -1;
}
};
void initialize(int& narg, char* arg[]);
void initialize(const InitArguments& args = InitArguments());
/** \brief Finalize the spaces that were initialized via Kokkos::initialize */
void finalize();
/** \brief Finalize all known execution spaces */
void finalize_all();
void fence();
+/** \brief Print "Bill of Materials" */
+void print_configuration( std::ostream & , const bool detail = false );
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/* Allocate memory from a memory space.
* The allocation is tracked in Kokkos memory tracking system, so
* leaked memory can be identified.
*/
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , arg_alloc_label , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , "no-label" , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void kokkos_free( void * arg_alloc )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
deallocate_tracked( arg_alloc );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
reallocate_tracked( arg_alloc , arg_alloc_size );
}
} // namespace Kokkos
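// Illustrative sketch (not part of this diff): tracked raw allocation in the
// default memory space; the label and sizes are arbitrary examples.
#include <Kokkos_Core.hpp>

void example_tracked_allocation() {
  void* p = Kokkos::kokkos_malloc( "my-buffer", 1024 );  // labeled, tracked allocation
  p = Kokkos::kokkos_realloc( p, 2048 );                 // grow the tracked allocation
  Kokkos::kokkos_free( p );                              // release it so no leak is reported
}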
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif
-
diff --git a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
index e7e6a49d3..4029bf599 100644
--- a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
@@ -1,248 +1,259 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_FWD_HPP
#define KOKKOS_CORE_FWD_HPP
//----------------------------------------------------------------------------
// Kokkos_Macros.hpp does introspection on configuration options
// and compiler environment then sets a collection of #define macros.
#include <Kokkos_Macros.hpp>
#include <impl/Kokkos_Utilities.hpp>
//----------------------------------------------------------------------------
// A 64-bit build (8-byte pointers) is assumed throughout the code base.
static_assert( sizeof(void*) == 8
, "Kokkos assumes 64-bit build; i.e., 8-byte pointers" );
//----------------------------------------------------------------------------
namespace Kokkos {
struct AUTO_t {
KOKKOS_INLINE_FUNCTION
- constexpr const AUTO_t & operator()() const { return *this ; }
+ constexpr const AUTO_t & operator()() const { return *this; }
};
namespace {
/**\brief Token to indicate that a parameter's value is to be automatically selected */
constexpr AUTO_t AUTO = Kokkos::AUTO_t();
}
struct InvalidType {};
-}
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Forward declarations for class inter-relationships
namespace Kokkos {
-class HostSpace ; ///< Memory space for main process and CPU execution spaces
+class HostSpace; ///< Memory space for main process and CPU execution spaces
#ifdef KOKKOS_ENABLE_HBWSPACE
namespace Experimental {
-class HBWSpace ; /// Memory space for hbw_malloc from memkind (e.g. for KNL processor)
+class HBWSpace; /// Memory space for hbw_malloc from memkind (e.g. for KNL processor)
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
-class Serial ; ///< Execution space main process on CPU
-#endif // defined( KOKKOS_ENABLE_SERIAL )
+class Serial; ///< Execution space main process on CPU.
+#endif
+
+#if defined( KOKKOS_ENABLE_QTHREADS )
+class Qthreads; ///< Execution space with Qthreads back-end.
+#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
-class Threads ; ///< Execution space with pthreads back-end
+class Threads; ///< Execution space with pthreads back-end.
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
-class OpenMP ; ///< OpenMP execution space
+class OpenMP; ///< OpenMP execution space.
#endif
#if defined( KOKKOS_ENABLE_CUDA )
-class CudaSpace ; ///< Memory space on Cuda GPU
-class CudaUVMSpace ; ///< Memory space on Cuda GPU with UVM
-class CudaHostPinnedSpace ; ///< Memory space on Host accessible to Cuda GPU
-class Cuda ; ///< Execution space for Cuda GPU
+class CudaSpace; ///< Memory space on Cuda GPU
+class CudaUVMSpace; ///< Memory space on Cuda GPU with UVM
+class CudaHostPinnedSpace; ///< Memory space on Host accessible to Cuda GPU
+class Cuda; ///< Execution space for Cuda GPU
#endif
template<class ExecutionSpace, class MemorySpace>
struct Device;
+
} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Set the default execution space.
/// Define Kokkos::DefaultExecutionSpace as per configuration option
/// or chosen from the enabled execution spaces in the following order:
/// Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
namespace Kokkos {
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
- typedef Cuda DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef OpenMP DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Threads DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
- typedef Serial DefaultExecutionSpace ;
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
+ typedef Cuda DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+ typedef OpenMP DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+ typedef Threads DefaultExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Qthreads DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+ typedef Serial DefaultExecutionSpace;
#else
-# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
+# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
#endif
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef OpenMP DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Threads DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
- typedef Serial DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_OPENMP )
- typedef OpenMP DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_PTHREAD )
- typedef Threads DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_SERIAL )
- typedef Serial DefaultHostExecutionSpace ;
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+ typedef OpenMP DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+ typedef Threads DefaultHostExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Qthreads DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+ typedef Serial DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_OPENMP )
+ typedef OpenMP DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_PTHREAD )
+ typedef Threads DefaultHostExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Qthreads DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_SERIAL )
+ typedef Serial DefaultHostExecutionSpace;
#else
-# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
+# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
#endif
} // namespace Kokkos
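// Illustrative sketch (not part of this diff): the selected defaults can be
// inspected, e.g. by printing their type names at run time.
#include <Kokkos_Core.hpp>
#include <iostream>
#include <typeinfo>

void report_default_spaces() {
  std::cout << "DefaultExecutionSpace:     " << typeid( Kokkos::DefaultExecutionSpace ).name()     << '\n';
  std::cout << "DefaultHostExecutionSpace: " << typeid( Kokkos::DefaultHostExecutionSpace ).name() << '\n';
}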
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Detect the active execution space and define its memory space.
// This is used to verify whether a running kernel can access
// a given memory space.
namespace Kokkos {
+
namespace Impl {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) && defined (KOKKOS_ENABLE_CUDA)
-typedef Kokkos::CudaSpace ActiveExecutionMemorySpace ;
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) && defined( KOKKOS_ENABLE_CUDA )
+typedef Kokkos::CudaSpace ActiveExecutionMemorySpace;
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-typedef Kokkos::HostSpace ActiveExecutionMemorySpace ;
+typedef Kokkos::HostSpace ActiveExecutionMemorySpace;
#else
-typedef void ActiveExecutionMemorySpace ;
+typedef void ActiveExecutionMemorySpace;
#endif
-template< class ActiveSpace , class MemorySpace >
+template< class ActiveSpace, class MemorySpace >
struct VerifyExecutionCanAccessMemorySpace {
enum {value = 0};
};
template< class Space >
-struct VerifyExecutionCanAccessMemorySpace< Space , Space >
+struct VerifyExecutionCanAccessMemorySpace< Space, Space >
{
enum {value = 1};
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void *) {}
};
} // namespace Impl
+
} // namespace Kokkos
-#define KOKKOS_RESTRICT_EXECUTION_TO_DATA( DATA_SPACE , DATA_PTR ) \
+#define KOKKOS_RESTRICT_EXECUTION_TO_DATA( DATA_SPACE, DATA_PTR ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
- Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify( DATA_PTR )
+ Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE >::verify( DATA_PTR )
#define KOKKOS_RESTRICT_EXECUTION_TO_( DATA_SPACE ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
- Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify()
+ Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE >::verify()
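// Illustrative sketch (not part of this diff): a function can assert that the
// data it touches lives in a memory space reachable from wherever it executes.
// The function name is hypothetical.
#include <Kokkos_Core.hpp>

void scale_on_host( double* x, const int n ) {
  KOKKOS_RESTRICT_EXECUTION_TO_DATA( Kokkos::HostSpace, x );  // verify HostSpace is accessible here
  for ( int i = 0; i < n; ++i ) x[i] *= 2.0;
}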
//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
namespace Kokkos {
void fence();
}
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template< class Functor
, class Policy
, class EnableFunctor = void
- , class EnablePolicy = void
+ , class EnablePolicy = void
>
struct FunctorPolicyExecutionSpace;
//----------------------------------------------------------------------------
/// \class ParallelFor
/// \brief Implementation of the ParallelFor operator that has a
/// partial specialization for the device.
///
/// This is an implementation detail of parallel_for. Users should
/// skip this and go directly to the nonmember function parallel_for.
-template< class FunctorType , class ExecPolicy , class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelFor ;
+template< class FunctorType, class ExecPolicy, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelFor;
/// \class ParallelReduce
/// \brief Implementation detail of parallel_reduce.
///
/// This is an implementation detail of parallel_reduce. Users should
/// skip this and go directly to the nonmember function parallel_reduce.
-template< class FunctorType , class ExecPolicy , class ReducerType = InvalidType, class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelReduce ;
+template< class FunctorType, class ExecPolicy, class ReducerType = InvalidType, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelReduce;
/// \class ParallelScan
/// \brief Implementation detail of parallel_scan.
///
/// This is an implementation detail of parallel_scan. Users should
/// skip this and go directly to the documentation of the nonmember
/// template function Kokkos::parallel_scan.
-template< class FunctorType , class ExecPolicy , class ExecutionSapce =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelScan ;
+template< class FunctorType, class ExecPolicy, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelScan;
-}}
-#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
+} // namespace Impl
+
+} // namespace Kokkos
+#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
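// Illustrative sketch (not part of this diff): users dispatch work through the
// nonmember functions rather than the ParallelFor/Reduce/Scan classes above.
// Assumes Kokkos::initialize() has been called.
#include <Kokkos_Core.hpp>

void example_parallel_for( const int n ) {
  Kokkos::View<double*> x( "x", n );
  Kokkos::parallel_for( n, KOKKOS_LAMBDA( const int i ) {
    x( i ) = 2.0 * i;   // View is captured by value; each index is touched by one iteration
  } );
}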
diff --git a/lib/kokkos/core/src/Kokkos_Cuda.hpp b/lib/kokkos/core/src/Kokkos_Cuda.hpp
index afccdb6c5..433cac5e5 100644
--- a/lib/kokkos/core/src/Kokkos_Cuda.hpp
+++ b/lib/kokkos/core/src/Kokkos_Cuda.hpp
@@ -1,304 +1,304 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_HPP
#define KOKKOS_CUDA_HPP
#include <Kokkos_Core_fwd.hpp>
// If CUDA execution space is enabled then use this header file.
#if defined( KOKKOS_ENABLE_CUDA )
#include <iosfwd>
#include <vector>
#include <Kokkos_CudaSpace.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
-#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class CudaExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class Cuda
/// \brief Kokkos Execution Space that uses CUDA to run on GPUs.
///
/// An "execution space" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads execution space uses Pthreads or
/// C++11 threads on a CPU, the OpenMP execution space uses the OpenMP language
/// extensions, and the Serial execution space executes "parallel" kernels
/// sequentially. The Cuda execution space uses NVIDIA's CUDA programming
/// model to execute kernels in parallel on GPUs.
class Cuda {
public:
//! \name Type declarations that all Kokkos execution spaces must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Cuda execution_space ;
#if defined( KOKKOS_ENABLE_CUDA_UVM )
//! This execution space's preferred memory space.
typedef CudaUVMSpace memory_space ;
#else
//! This execution space's preferred memory space.
typedef CudaSpace memory_space ;
#endif
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! The size_type best suited for this execution space.
typedef memory_space::size_type size_type ;
//! This execution space's preferred array layout.
typedef LayoutLeft array_layout ;
//!
typedef ScratchMemorySpace< Cuda > scratch_memory_space ;
//@}
//--------------------------------------------------
//! \name Functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
KOKKOS_INLINE_FUNCTION static int in_parallel() {
#if defined( __CUDA_ARCH__ )
return true;
#else
return false;
#endif
}
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume less resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
//! Free any resources being consumed by the device.
static void finalize();
//! Has been initialized
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
//--------------------------------------------------
//! \name Cuda space instances
~Cuda() {}
Cuda();
explicit Cuda( const int instance_id );
Cuda( Cuda && ) = default ;
Cuda( const Cuda & ) = default ;
Cuda & operator = ( Cuda && ) = default ;
Cuda & operator = ( const Cuda & ) = default ;
//--------------------------------------------------------------------------
//! \name Device-specific functions
//@{
struct SelectDevice {
int cuda_device_id ;
SelectDevice() : cuda_device_id(0) {}
explicit SelectDevice( int id ) : cuda_device_id( id ) {}
};
//! Initialize, telling the CUDA run-time library which device to use.
static void initialize( const SelectDevice = SelectDevice()
, const size_t num_instances = 1 );
/// \brief Cuda device architecture of the selected device.
///
/// This matches the __CUDA_ARCH__ specification.
static size_type device_arch();
//! Query device count.
static size_type detect_device_count();
/** \brief Detect the available devices and their architecture
* as defined by the __CUDA_ARCH__ specification.
*/
static std::vector<unsigned> detect_device_arch();
cudaStream_t cuda_stream() const { return m_stream ; }
int cuda_device() const { return m_device ; }
//@}
//--------------------------------------------------------------------------
private:
cudaStream_t m_stream ;
int m_device ;
};
} // namespace Kokkos
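// Illustrative sketch (not part of this diff), meaningful only when
// KOKKOS_ENABLE_CUDA is defined: explicit device selection and fencing through
// the static interface declared above. Device id 0 is a placeholder.
#include <Kokkos_Core.hpp>

void example_cuda_lifecycle() {
#if defined( KOKKOS_ENABLE_CUDA )
  Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( 0 ) );
  // ... dispatch kernels on Kokkos::Cuda ...
  Kokkos::Cuda::fence();      // block until all dispatched functors complete
  Kokkos::Cuda::finalize();   // release device resources
#endif
}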
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::CudaSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
#if defined( KOKKOS_ENABLE_CUDA_UVM )
// If forcing use of UVM everywhere
// then must assume that CudaUVMSpace
// can be a stand-in for CudaSpace.
// This will fail when a strange host-side execution space
// defines CudaUVMSpace as its preferred memory space.
template<>
struct MemorySpaceAccess
< Kokkos::CudaUVMSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
#endif
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::CudaSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) { }
KOKKOS_INLINE_FUNCTION static void verify( const void * ) { }
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::HostSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = false };
inline static void verify( void ) { CudaSpace::access_error(); }
inline static void verify( const void * p ) { CudaSpace::access_error(p); }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_View.hpp>
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
#include <Cuda/Kokkos_Cuda_Task.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
index d6bf8dcdf..fc39ce0e5 100644
--- a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
@@ -1,337 +1,352 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HBWSPACE_HPP
#define KOKKOS_HBWSPACE_HPP
-
#include <Kokkos_HostSpace.hpp>
/*--------------------------------------------------------------------------*/
+
#ifdef KOKKOS_ENABLE_HBWSPACE
namespace Kokkos {
+
namespace Experimental {
+
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_hbw_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
-bool lock_address_hbw_space(void* ptr);
+bool lock_address_hbw_space( void* ptr );
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
-void unlock_address_hbw_space(void* ptr);
+void unlock_address_hbw_space( void* ptr );
} // namespace Impl
-} // neamspace Experimental
+
+} // namespace Experimental
+
} // namespace Kokkos
namespace Kokkos {
+
namespace Experimental {
/// \class HBWSpace
/// \brief Memory management for host memory.
///
/// HBWSpace is a memory space that governs host memory. "Host"
/// memory means the usual CPU-accessible memory.
class HBWSpace {
public:
-
//! Tag this class as a kokkos memory space
- typedef HBWSpace memory_space ;
- typedef size_t size_type ;
+ typedef HBWSpace memory_space;
+ typedef size_t size_type;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_PTHREAD )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_SERIAL )
- typedef Kokkos::Serial execution_space ;
+ typedef Kokkos::Serial execution_space;
#else
-# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
+# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qhreads, or Kokkos::Serial. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
/*--------------------------------*/
/* Functions unique to the HBWSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HBWSpace();
- HBWSpace( const HBWSpace & rhs ) = default ;
- HBWSpace & operator = ( const HBWSpace & ) = default ;
- ~HBWSpace() = default ;
+ HBWSpace( const HBWSpace & rhs ) = default;
+ HBWSpace & operator = ( const HBWSpace & ) = default;
+ ~HBWSpace() = default;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
- enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
+ enum AllocationMechanism { STD_MALLOC, POSIX_MEMALIGN, POSIX_MMAP, INTEL_MM_ALLOC };
explicit
HBWSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
- void * allocate( const size_t arg_alloc_size ) const ;
+ void * allocate( const size_t arg_alloc_size ) const;
/**\brief Deallocate untracked memory in the space */
- void deallocate( void * const arg_alloc_ptr
- , const size_t arg_alloc_size ) const ;
+ void deallocate( void * const arg_alloc_ptr
+ , const size_t arg_alloc_size ) const;
/**\brief Return Name of the MemorySpace */
static constexpr const char* name();
private:
- AllocationMechanism m_alloc_mech ;
+ AllocationMechanism m_alloc_mech;
static constexpr const char* m_name = "HBW";
- friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace, void >;
};
} // namespace Experimental
+
} // namespace Kokkos
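// Illustrative sketch (not part of this diff), assuming a memkind-enabled
// build (KOKKOS_ENABLE_HBWSPACE): untracked allocation straight from the
// high-bandwidth memory space.
#include <Kokkos_HBWSpace.hpp>
#include <cstddef>

void example_hbw_untracked( const size_t nbytes ) {
#if defined( KOKKOS_ENABLE_HBWSPACE )
  Kokkos::Experimental::HBWSpace space;
  void* p = space.allocate( nbytes );   // untracked: the caller owns the pointer
  space.deallocate( p, nbytes );        // the size must match the allocation
#endif
}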
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template<>
-class SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >
- : public SharedAllocationRecord< void , void >
+class SharedAllocationRecord< Kokkos::Experimental::HBWSpace, void >
+ : public SharedAllocationRecord< void, void >
{
private:
- friend Kokkos::Experimental::HBWSpace ;
+ friend Kokkos::Experimental::HBWSpace;
- typedef SharedAllocationRecord< void , void > RecordBase ;
+ typedef SharedAllocationRecord< void, void > RecordBase;
- SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
- SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
+ SharedAllocationRecord( const SharedAllocationRecord & ) = delete;
+ SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HBWSpace instance */
- static RecordBase s_root_record ;
+ static RecordBase s_root_record;
- const Kokkos::Experimental::HBWSpace m_space ;
+ const Kokkos::Experimental::HBWSpace m_space;
protected:
~SharedAllocationRecord();
- SharedAllocationRecord() = default ;
+ SharedAllocationRecord() = default;
- SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size
- , const RecordBase::function_type arg_dealloc = & deallocate
+ SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
+ , const std::string & arg_label
+ , const size_t arg_alloc_size
+ , const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
{
return std::string( RecordBase::head()->m_label );
}
KOKKOS_INLINE_FUNCTION static
- SharedAllocationRecord * allocate( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size
+ SharedAllocationRecord * allocate( const Kokkos::Experimental::HBWSpace & arg_space
+ , const std::string & arg_label
+ , const size_t arg_alloc_size
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
+ return new SharedAllocationRecord( arg_space, arg_label, arg_alloc_size );
#else
- return (SharedAllocationRecord *) 0 ;
+ return (SharedAllocationRecord *) 0;
#endif
}
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size );
+ , const std::string & arg_label
+ , const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
-
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
- static void print_records( std::ostream & , const Kokkos::Experimental::HBWSpace & , bool detail = false );
+ static void print_records( std::ostream &, const Kokkos::Experimental::HBWSpace &, bool detail = false );
};
} // namespace Impl
-} // namespace Kokkos
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
-static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::Experimental::HBWSpace >::assignable , "" );
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace, Kokkos::Experimental::HBWSpace >::assignable, "" );
template<>
-struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace > {
+struct MemorySpaceAccess< Kokkos::HostSpace, Kokkos::Experimental::HBWSpace > {
enum { assignable = true };
enum { accessible = true };
enum { deepcopy = true };
};
template<>
-struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace> {
+struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace, Kokkos::HostSpace > {
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = true };
};
-}}
+} // namespace Impl
+
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Impl {
+namespace Impl {
-template<class ExecutionSpace>
-struct DeepCopy<Experimental::HBWSpace,Experimental::HBWSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< Experimental::HBWSpace, Experimental::HBWSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
-template<class ExecutionSpace>
-struct DeepCopy<HostSpace,Experimental::HBWSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< HostSpace, Experimental::HBWSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
-template<class ExecutionSpace>
-struct DeepCopy<Experimental::HBWSpace,HostSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< Experimental::HBWSpace, HostSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
} // namespace Impl
+
} // namespace Kokkos
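// Illustrative sketch (not part of this diff): at the user level these
// specializations are exercised through Kokkos::deep_copy between Views
// living in HostSpace and HBWSpace.
#include <Kokkos_Core.hpp>
#include <Kokkos_HBWSpace.hpp>

void example_hbw_deep_copy( const int n ) {
#if defined( KOKKOS_ENABLE_HBWSPACE )
  Kokkos::View<double*, Kokkos::HostSpace>              host( "host", n );
  Kokkos::View<double*, Kokkos::Experimental::HBWSpace> hbw ( "hbw",  n );
  Kokkos::deep_copy( hbw, host );   // HostSpace -> HBWSpace (plain memcpy underneath)
  Kokkos::deep_copy( host, hbw );   // HBWSpace -> HostSpace
#endif
}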
namespace Kokkos {
+
namespace Impl {
template<>
-struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace >
+struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace, Kokkos::Experimental::HBWSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
template<>
-struct VerifyExecutionCanAccessMemorySpace< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace >
+struct VerifyExecutionCanAccessMemorySpace< Kokkos::Experimental::HBWSpace, Kokkos::HostSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
+
} // namespace Kokkos
#endif
-#endif /* #define KOKKOS_HBWSPACE_HPP */
+#endif // #define KOKKOS_HBWSPACE_HPP
diff --git a/lib/kokkos/core/src/Kokkos_HostSpace.hpp b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
index e79de462b..82006665c 100644
--- a/lib/kokkos/core/src/Kokkos_HostSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
@@ -1,317 +1,318 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HOSTSPACE_HPP
#define KOKKOS_HOSTSPACE_HPP
#include <cstring>
#include <string>
#include <iosfwd>
#include <typeinfo>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Concepts.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_host_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
bool lock_address_host_space(void* ptr);
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
-void unlock_address_host_space(void* ptr);
+void unlock_address_host_space( void* ptr );
} // namespace Impl
+
} // namespace Kokkos
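// Illustrative sketch (not part of this diff): these lock functions are an
// internal detail backing atomics on arbitrary types; a correct caller always
// pairs a successful lock with an unlock of the same address.
#include <Kokkos_Core.hpp>

void example_host_lock( void* ptr ) {
  while ( !Kokkos::Impl::lock_address_host_space( ptr ) ) { /* spin until acquired */ }
  // ... perform the non-atomic read-modify-write protected by the lock ...
  Kokkos::Impl::unlock_address_host_space( ptr );
}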
namespace Kokkos {
/// \class HostSpace
/// \brief Memory management for host memory.
///
/// HostSpace is a memory space that governs host memory. "Host"
/// memory means the usual CPU-accessible memory.
class HostSpace {
public:
-
//! Tag this class as a kokkos memory space
- typedef HostSpace memory_space ;
- typedef size_t size_type ;
+ typedef HostSpace memory_space;
+ typedef size_t size_type;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_PTHREAD )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_SERIAL )
- typedef Kokkos::Serial execution_space ;
+ typedef Kokkos::Serial execution_space;
#else
-# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
+# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
/*--------------------------------*/
/* Functions unique to the HostSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HostSpace();
- HostSpace( HostSpace && rhs ) = default ;
- HostSpace( const HostSpace & rhs ) = default ;
- HostSpace & operator = ( HostSpace && ) = default ;
- HostSpace & operator = ( const HostSpace & ) = default ;
- ~HostSpace() = default ;
+ HostSpace( HostSpace && rhs ) = default;
+ HostSpace( const HostSpace & rhs ) = default;
+ HostSpace & operator = ( HostSpace && ) = default;
+ HostSpace & operator = ( const HostSpace & ) = default;
+ ~HostSpace() = default;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
- enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
+ enum AllocationMechanism { STD_MALLOC, POSIX_MEMALIGN, POSIX_MMAP, INTEL_MM_ALLOC };
explicit
HostSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
- void * allocate( const size_t arg_alloc_size ) const ;
+ void * allocate( const size_t arg_alloc_size ) const;
/**\brief Deallocate untracked memory in the space */
- void deallocate( void * const arg_alloc_ptr
- , const size_t arg_alloc_size ) const ;
+ void deallocate( void * const arg_alloc_ptr
+ , const size_t arg_alloc_size ) const;
/**\brief Return Name of the MemorySpace */
static constexpr const char* name();
private:
-
- AllocationMechanism m_alloc_mech ;
+ AllocationMechanism m_alloc_mech;
static constexpr const char* m_name = "Host";
- friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace, void >;
};
} // namespace Kokkos
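// Illustrative sketch (not part of this diff): a View placed explicitly in
// HostSpace is filled in parallel using HostSpace::execution_space.
// Assumes Kokkos::initialize() has been called.
#include <Kokkos_Core.hpp>

void example_host_view( const int n ) {
  Kokkos::View<double*, Kokkos::HostSpace> v( "v", n );
  Kokkos::parallel_for(
    Kokkos::RangePolicy< Kokkos::HostSpace::execution_space >( 0, n ),
    KOKKOS_LAMBDA( const int i ) { v( i ) = 1.0; } );
}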
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Impl {
-static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+namespace Impl {
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::HostSpace >::assignable, "" );
template< typename S >
struct HostMirror {
private:
-
// If input execution space can access HostSpace then keep it.
// Example: Kokkos::OpenMP can access, Kokkos::Cuda cannot
enum { keep_exe = Kokkos::Impl::MemorySpaceAccess
- < typename S::execution_space::memory_space , Kokkos::HostSpace >
- ::accessible };
+ < typename S::execution_space::memory_space, Kokkos::HostSpace >::accessible };
// If HostSpace can access memory space then keep it.
// Example: Cannot access Kokkos::CudaSpace, can access Kokkos::CudaUVMSpace
enum { keep_mem = Kokkos::Impl::MemorySpaceAccess
- < Kokkos::HostSpace , typename S::memory_space >::accessible };
+ < Kokkos::HostSpace, typename S::memory_space >::accessible };
public:
typedef typename std::conditional
< keep_exe && keep_mem /* Can keep whole space */
, S
, typename std::conditional
< keep_mem /* Can keep memory space, use default Host execution space */
, Kokkos::Device< Kokkos::HostSpace::execution_space
, typename S::memory_space >
, Kokkos::HostSpace
>::type
- >::type Space ;
+ >::type Space;
};
} // namespace Impl
+
} // namespace Kokkos
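// Illustrative sketch (not part of this diff): the HostMirror selection above
// is the kind of logic behind Kokkos::create_mirror_view, which yields a
// host-accessible staging View for a possibly device-resident one.
#include <Kokkos_Core.hpp>

void example_mirror( const int n ) {
  Kokkos::View<double*> d( "d", n );          // default (possibly device) memory space
  auto h = Kokkos::create_mirror_view( d );   // host-accessible mirror of d
  Kokkos::deep_copy( h, d );                  // device -> host
}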
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template<>
-class SharedAllocationRecord< Kokkos::HostSpace , void >
- : public SharedAllocationRecord< void , void >
+class SharedAllocationRecord< Kokkos::HostSpace, void >
+ : public SharedAllocationRecord< void, void >
{
private:
+ friend Kokkos::HostSpace;
- friend Kokkos::HostSpace ;
-
- typedef SharedAllocationRecord< void , void > RecordBase ;
+ typedef SharedAllocationRecord< void, void > RecordBase;
- SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
- SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
+ SharedAllocationRecord( const SharedAllocationRecord & ) = delete;
+ SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HostSpace instance */
- static RecordBase s_root_record ;
+ static RecordBase s_root_record;
- const Kokkos::HostSpace m_space ;
+ const Kokkos::HostSpace m_space;
protected:
-
~SharedAllocationRecord();
- SharedAllocationRecord() = default ;
+ SharedAllocationRecord() = default;
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
- {
- return std::string( RecordBase::head()->m_label );
- }
+ {
+ return std::string( RecordBase::head()->m_label );
+ }
KOKKOS_INLINE_FUNCTION static
SharedAllocationRecord * allocate( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
- {
+ {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
+ return new SharedAllocationRecord( arg_space, arg_label, arg_alloc_size );
#else
- return (SharedAllocationRecord *) 0 ;
+ return (SharedAllocationRecord *) 0;
#endif
- }
+ }
+
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
-
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
- static void print_records( std::ostream & , const Kokkos::HostSpace & , bool detail = false );
+ static void print_records( std::ostream &, const Kokkos::HostSpace &, bool detail = false );
};
} // namespace Impl
+
} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
-template< class DstSpace, class SrcSpace, class ExecutionSpace = typename DstSpace::execution_space> struct DeepCopy ;
+template< class DstSpace, class SrcSpace, class ExecutionSpace = typename DstSpace::execution_space > struct DeepCopy;
-template<class ExecutionSpace>
-struct DeepCopy<HostSpace,HostSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< HostSpace, HostSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
} // namespace Impl
-} // namespace Kokkos
-
-#endif /* #define KOKKOS_HOSTSPACE_HPP */
+} // namespace Kokkos
+#endif // #define KOKKOS_HOSTSPACE_HPP
diff --git a/lib/kokkos/core/src/Kokkos_Macros.hpp b/lib/kokkos/core/src/Kokkos_Macros.hpp
index 52845b9e0..c138b08c9 100644
--- a/lib/kokkos/core/src/Kokkos_Macros.hpp
+++ b/lib/kokkos/core/src/Kokkos_Macros.hpp
@@ -1,493 +1,468 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MACROS_HPP
#define KOKKOS_MACROS_HPP
//----------------------------------------------------------------------------
-/** Pick up configure/build options via #define macros:
+/** Pick up configure / build options via #define macros:
*
* KOKKOS_ENABLE_CUDA Kokkos::Cuda execution and memory spaces
* KOKKOS_ENABLE_PTHREAD Kokkos::Threads execution space
- * KOKKOS_ENABLE_QTHREAD Kokkos::Qthread execution space
- * KOKKOS_ENABLE_OPENMP Kokkos::OpenMP execution space
- * KOKKOS_ENABLE_HWLOC HWLOC library is available
- * KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK insert array bounds checks, is expensive!
- *
- * KOKKOS_ENABLE_MPI negotiate MPI/execution space interactions
- *
- * KOKKOS_ENABLE_CUDA_UVM Use CUDA UVM for Cuda memory space
+ * KOKKOS_ENABLE_QTHREADS Kokkos::Qthreads execution space
+ * KOKKOS_ENABLE_OPENMP Kokkos::OpenMP execution space
+ * KOKKOS_ENABLE_HWLOC HWLOC library is available.
+ * KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK Insert array bounds checks, is expensive!
+ * KOKKOS_ENABLE_MPI Negotiate MPI/execution space interactions.
+ * KOKKOS_ENABLE_CUDA_UVM Use CUDA UVM for Cuda memory space.
*/
#ifndef KOKKOS_DONT_INCLUDE_CORE_CONFIG_H
-#include <KokkosCore_config.h>
+ #include <KokkosCore_config.h>
#endif
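// Illustrative sketch (not part of this diff), assuming <Kokkos_Macros.hpp> or
// <Kokkos_Core.hpp> has been included first: application code can branch on the
// configuration macros picked up above, e.g. to report the active backend.
#include <cstdio>

inline void report_backend() {
#if defined( KOKKOS_ENABLE_CUDA )
  std::printf( "Kokkos was built with the CUDA backend\n" );
#elif defined( KOKKOS_ENABLE_OPENMP )
  std::printf( "Kokkos was built with the OpenMP backend\n" );
#elif defined( KOKKOS_ENABLE_PTHREAD )
  std::printf( "Kokkos was built with the Pthreads backend\n" );
#else
  std::printf( "Kokkos was built with a serial host backend only\n" );
#endif
}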
#include <impl/Kokkos_OldMacros.hpp>
//----------------------------------------------------------------------------
/** Pick up compiler specific #define macros:
*
* Macros for known compilers evaluate to an integral version value
*
* KOKKOS_COMPILER_NVCC
* KOKKOS_COMPILER_GNU
* KOKKOS_COMPILER_INTEL
* KOKKOS_COMPILER_IBM
* KOKKOS_COMPILER_CRAYC
* KOKKOS_COMPILER_APPLECC
* KOKKOS_COMPILER_CLANG
* KOKKOS_COMPILER_PGI
*
 * Macros for which compiler extension to use for atomics on intrinsic types
*
* KOKKOS_ENABLE_CUDA_ATOMICS
* KOKKOS_ENABLE_GNU_ATOMICS
* KOKKOS_ENABLE_INTEL_ATOMICS
* KOKKOS_ENABLE_OPENMP_ATOMICS
*
- * A suite of 'KOKKOS_HAVE_PRAGMA_...' are defined for internal use.
+ * A suite of 'KOKKOS_ENABLE_PRAGMA_...' macros is defined for internal use.
*
* Macros for marking functions to run in an execution space:
*
* KOKKOS_FUNCTION
* KOKKOS_INLINE_FUNCTION request compiler to inline
* KOKKOS_FORCEINLINE_FUNCTION force compiler to inline, use with care!
*/
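// Illustrative sketch (not part of this diff): a functor whose call operator
// is marked so it can be compiled for, and dispatched to, any enabled space.
#include <Kokkos_Core.hpp>

struct FillIota {
  Kokkos::View<int*> v;   // captured by value into the parallel dispatch

  KOKKOS_INLINE_FUNCTION
  void operator()( const int i ) const { v( i ) = i; }
};
// e.g.  Kokkos::parallel_for( n, FillIota{ v } );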
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ )
+ // Compiling with a CUDA compiler.
+ //
+ // Include <cuda.h> to pick up the CUDA_VERSION macro defined as:
+ // CUDA_VERSION = ( MAJOR_VERSION * 1000 ) + ( MINOR_VERSION * 10 )
+ //
+ // When generating device code the __CUDA_ARCH__ macro is defined as:
+ // __CUDA_ARCH__ = ( MAJOR_CAPABILITY * 100 ) + ( MINOR_CAPABILITY * 10 )
+
+ #include <cuda_runtime.h>
+ #include <cuda.h>
+
+ #if !defined( CUDA_VERSION )
+ #error "#include <cuda.h> did not define CUDA_VERSION."
+ #endif
-/* Compiling with a CUDA compiler.
- *
- * Include <cuda.h> to pick up the CUDA_VERSION macro defined as:
- * CUDA_VERSION = ( MAJOR_VERSION * 1000 ) + ( MINOR_VERSION * 10 )
- *
- * When generating device code the __CUDA_ARCH__ macro is defined as:
- * __CUDA_ARCH__ = ( MAJOR_CAPABILITY * 100 ) + ( MINOR_CAPABILITY * 10 )
- */
+ #if ( CUDA_VERSION < 7000 )
+ // CUDA supports C++11 in device code starting with version 7.0.
+ // This includes auto type and device code internal lambdas.
+ #error "Cuda version 7.0 or greater required."
+ #endif
-#include <cuda_runtime.h>
-#include <cuda.h>
+ #if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
+ // Compiling with CUDA compiler for device code.
+ #error "Cuda device capability >= 3.0 is required."
+ #endif
-#if ! defined( CUDA_VERSION )
-#error "#include <cuda.h> did not define CUDA_VERSION"
-#endif
+ #ifdef KOKKOS_ENABLE_CUDA_LAMBDA
+ #if ( CUDA_VERSION < 7050 )
+ // CUDA supports C++11 lambdas generated in host code to be given
+ // to the device starting with version 7.5. But the release candidate (7.5.6)
+ // still identifies as 7.0.
+ #error "Cuda version 7.5 or greater required for host-to-device Lambda support."
+ #endif
-#if ( CUDA_VERSION < 7000 )
-// CUDA supports C++11 in device code starting with
-// version 7.0. This includes auto type and device code internal
-// lambdas.
-#error "Cuda version 7.0 or greater required"
-#endif
+ #if ( CUDA_VERSION < 8000 ) && defined( __NVCC__ )
+ #define KOKKOS_LAMBDA [=]__device__
+ #else
+ #define KOKKOS_LAMBDA [=]__host__ __device__
-#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
-/* Compiling with CUDA compiler for device code. */
-#error "Cuda device capability >= 3.0 is required"
-#endif
+ #if defined( KOKKOS_ENABLE_CXX1Z )
+ #define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
+ #endif
+ #endif
-#ifdef KOKKOS_ENABLE_CUDA_LAMBDA
-#if ( CUDA_VERSION < 7050 )
- // CUDA supports C++11 lambdas generated in host code to be given
- // to the device starting with version 7.5. But the release candidate (7.5.6)
- // still identifies as 7.0
- #error "Cuda version 7.5 or greater required for host-to-device Lambda support"
-#endif
-#if ( CUDA_VERSION < 8000 ) && defined(__NVCC__)
- #define KOKKOS_LAMBDA [=]__device__
-#else
- #define KOKKOS_LAMBDA [=]__host__ __device__
- #if defined( KOKKOS_ENABLE_CXX1Z )
- #define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
+ #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
#endif
-#endif
-#define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
-#endif
-#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ ) */
+#endif // #if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ )
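For reference on the encodings above: CUDA 7.5 reports CUDA_VERSION == 7050 and CUDA 8.0 reports 8000, while a compute-capability 3.0 device reports __CUDA_ARCH__ == 300, which is what the minimum-version checks compare against. A minimal sketch of KOKKOS_LAMBDA as defined here (illustrative; assumes Kokkos has been initialized):

    #include <Kokkos_Core.hpp>

    void example_fill( Kokkos::View<double*> v ) {
      // Under KOKKOS_ENABLE_CUDA_LAMBDA this expands to [=]__device__ (nvcc older
      // than 8.0) or [=]__host__ __device__; host-only builds fall back to a plain [=].
      Kokkos::parallel_for( v.size(), KOKKOS_LAMBDA( const int i ) {
        v(i) = 1.0;
      });
    }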
-
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
// Cuda version 8.0 still needs the functor wrapper
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ ) && defined(__NVCC__)
+ #if /* ( CUDA_VERSION < 8000 ) && */ defined( __NVCC__ )
#define KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
#endif
#endif
-/*--------------------------------------------------------------------------*/
-/* Language info: C++, CUDA, OPENMP */
+//----------------------------------------------------------------------------
+// Language info: C++, CUDA, OPENMP
#if defined( KOKKOS_ENABLE_CUDA )
// Compiling Cuda code to 'ptx'
#define KOKKOS_FORCEINLINE_FUNCTION __device__ __host__ __forceinline__
#define KOKKOS_INLINE_FUNCTION __device__ __host__ inline
#define KOKKOS_FUNCTION __device__ __host__
-#endif /* #if defined( __CUDA_ARCH__ ) */
+#endif // #if defined( __CUDA_ARCH__ )
#if defined( _OPENMP )
+ // Compiling with OpenMP.
+ // The value of _OPENMP is an integer value YYYYMM
+ // where YYYY and MM are the year and month designation
+ // of the supported OpenMP API version.
+#endif // #if defined( _OPENMP )
- /* Compiling with OpenMP.
- * The value of _OPENMP is an integer value YYYYMM
- * where YYYY and MM are the year and month designation
- * of the supported OpenMP API version.
- */
-
-#endif /* #if defined( _OPENMP ) */
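For example, a compiler supporting OpenMP 4.0 (released July 2013) reports _OPENMP == 201307; a sketch of a feature check built on that encoding:

    #if defined( _OPENMP ) && ( _OPENMP >= 201307 )
      // OpenMP 4.0 or newer is available in this translation unit.
    #endif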
-
-/*--------------------------------------------------------------------------*/
-/* Mapping compiler built-ins to KOKKOS_COMPILER_*** macros */
+//----------------------------------------------------------------------------
+// Mapping compiler built-ins to KOKKOS_COMPILER_*** macros
#if defined( __NVCC__ )
// NVIDIA compiler is being used.
// Code is parsed and separated into host and device code.
// Host code is compiled again with another compiler.
// Device code is compile to 'ptx'.
#define KOKKOS_COMPILER_NVCC __NVCC__
-
#else
-#if ! defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
- #if !defined (KOKKOS_ENABLE_CUDA) // Compiling with clang for Cuda does not work with LAMBDAs either
- // CUDA (including version 6.5) does not support giving lambdas as
- // arguments to global functions. Thus its not currently possible
- // to dispatch lambdas from the host.
- #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
+ #if !defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+ #if !defined( KOKKOS_ENABLE_CUDA ) // Compiling with clang for Cuda does not work with LAMBDAs either
+ // CUDA (including version 6.5) does not support giving lambdas as
+ // arguments to global functions. Thus it's not currently possible
+ // to dispatch lambdas from the host.
+ #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
#endif
#endif
-#endif /* #if defined( __NVCC__ ) */
+#endif // #if defined( __NVCC__ )
-#if !defined (KOKKOS_LAMBDA)
+#if !defined( KOKKOS_LAMBDA )
#define KOKKOS_LAMBDA [=]
#endif
-#if defined( KOKKOS_ENABLE_CXX1Z ) && !defined (KOKKOS_CLASS_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX1Z ) && !defined( KOKKOS_CLASS_LAMBDA )
#define KOKKOS_CLASS_LAMBDA [=,*this]
#endif
-//#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
+//#if !defined( __CUDA_ARCH__ ) // Not compiling Cuda code to 'ptx'.
-/* Intel compiler for host code */
+// Intel compiler for host code.
#if defined( __INTEL_COMPILER )
#define KOKKOS_COMPILER_INTEL __INTEL_COMPILER
#elif defined( __ICC )
// Old define
#define KOKKOS_COMPILER_INTEL __ICC
#elif defined( __ECC )
// Very old define
#define KOKKOS_COMPILER_INTEL __ECC
#endif
-/* CRAY compiler for host code */
+// CRAY compiler for host code
#if defined( _CRAYC )
#define KOKKOS_COMPILER_CRAYC _CRAYC
#endif
#if defined( __IBMCPP__ )
// IBM C++
#define KOKKOS_COMPILER_IBM __IBMCPP__
#elif defined( __IBMC__ )
#define KOKKOS_COMPILER_IBM __IBMC__
#endif
#if defined( __APPLE_CC__ )
#define KOKKOS_COMPILER_APPLECC __APPLE_CC__
#endif
-#if defined (__clang__) && !defined (KOKKOS_COMPILER_INTEL)
+#if defined( __clang__ ) && !defined( KOKKOS_COMPILER_INTEL )
#define KOKKOS_COMPILER_CLANG __clang_major__*100+__clang_minor__*10+__clang_patchlevel__
#endif
-#if ! defined( __clang__ ) && ! defined( KOKKOS_COMPILER_INTEL ) &&defined( __GNUC__ )
+#if !defined( __clang__ ) && !defined( KOKKOS_COMPILER_INTEL ) && defined( __GNUC__ )
#define KOKKOS_COMPILER_GNU __GNUC__*100+__GNUC_MINOR__*10+__GNUC_PATCHLEVEL__
+
#if ( 472 > KOKKOS_COMPILER_GNU )
#error "Compiling with GCC version earlier than 4.7.2 is not supported."
#endif
#endif
-#if defined( __PGIC__ ) && ! defined( __GNUC__ )
+#if defined( __PGIC__ ) && !defined( __GNUC__ )
#define KOKKOS_COMPILER_PGI __PGIC__*100+__PGIC_MINOR__*10+__PGIC_PATCHLEVEL__
+
#if ( 1540 > KOKKOS_COMPILER_PGI )
#error "Compiling with PGI version earlier than 15.4 is not supported."
#endif
#endif
-//#endif /* #if ! defined( __CUDA_ARCH__ ) */
+//#endif // #if !defined( __CUDA_ARCH__ )
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-/* Intel compiler macros */
+//----------------------------------------------------------------------------
+// Intel compiler macros
#if defined( KOKKOS_COMPILER_INTEL )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
- #define KOKKOS_ENABLE_PRAGMA_IVDEP 1
#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
#define KOKKOS_ENABLE_PRAGMA_SIMD 1
+ #if ( __INTEL_COMPILER > 1400 )
+ #define KOKKOS_ENABLE_PRAGMA_IVDEP 1
+ #endif
+
#define KOKKOS_RESTRICT __restrict__
#ifndef KOKKOS_ALIGN
- #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+ #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#endif
#ifndef KOKKOS_ALIGN_PTR
- #define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
+ #define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
#endif
#ifndef KOKKOS_ALIGN_SIZE
- #define KOKKOS_ALIGN_SIZE 64
+ #define KOKKOS_ALIGN_SIZE 64
#endif
#if ( 1400 > KOKKOS_COMPILER_INTEL )
#if ( 1300 > KOKKOS_COMPILER_INTEL )
#error "Compiling with Intel version earlier than 13.0 is not supported. Official minimal version is 14.0."
#else
#warning "Compiling with Intel version 13.x probably works but is not officially supported. Official minimal version is 14.0."
#endif
#endif
- #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
+
+ #if !defined( KOKKOS_ENABLE_ASM ) && !defined( _WIN32 )
#define KOKKOS_ENABLE_ASM 1
#endif
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
- #if !defined (_WIN32)
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( _WIN32 )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#else
#define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
#endif
#if defined( __MIC__ )
// Compiling for Xeon Phi
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
-/* Cray compiler macros */
+//----------------------------------------------------------------------------
+// Cray compiler macros
#if defined( KOKKOS_COMPILER_CRAYC )
-
-
#endif
-/*--------------------------------------------------------------------------*/
-/* IBM Compiler macros */
+//----------------------------------------------------------------------------
+// IBM Compiler macros
#if defined( KOKKOS_COMPILER_IBM )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
-
#endif
-/*--------------------------------------------------------------------------*/
-/* CLANG compiler macros */
+//----------------------------------------------------------------------------
+// CLANG compiler macros
#if defined( KOKKOS_COMPILER_CLANG )
-
//#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
-/* GNU Compiler macros */
+//----------------------------------------------------------------------------
+// GNU Compiler macros
#if defined( KOKKOS_COMPILER_GNU )
-
//#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
- #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( __PGIC__ ) && \
- ( defined( __amd64 ) || \
- defined( __amd64__ ) || \
- defined( __x86_64 ) || \
- defined( __x86_64__ ) )
+ #if !defined( KOKKOS_ENABLE_ASM ) && !defined( __PGIC__ ) && \
+ ( defined( __amd64 ) || defined( __amd64__ ) || \
+ defined( __x86_64 ) || defined( __x86_64__ ) )
#define KOKKOS_ENABLE_ASM 1
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
+//----------------------------------------------------------------------------
#if defined( KOKKOS_COMPILER_PGI )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
-
#endif
-/*--------------------------------------------------------------------------*/
+//----------------------------------------------------------------------------
#if defined( KOKKOS_COMPILER_NVCC )
-
- #if defined(__CUDA_ARCH__ )
+ #if defined( __CUDA_ARCH__ )
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
#endif
-
#endif
//----------------------------------------------------------------------------
-/** Define function marking macros if compiler specific macros are undefined: */
+// Define function marking macros if compiler specific macros are undefined:
-#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
-#define KOKKOS_FORCEINLINE_FUNCTION inline
+#if !defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
-#if ! defined( KOKKOS_INLINE_FUNCTION )
-#define KOKKOS_INLINE_FUNCTION inline
+#if !defined( KOKKOS_INLINE_FUNCTION )
+ #define KOKKOS_INLINE_FUNCTION inline
#endif
-#if ! defined( KOKKOS_FUNCTION )
-#define KOKKOS_FUNCTION /**/
+#if !defined( KOKKOS_FUNCTION )
+ #define KOKKOS_FUNCTION /**/
#endif
-
//----------------------------------------------------------------------------
-///** Define empty macro for restrict if necessary: */
+// Define empty macro for restrict if necessary:
-#if ! defined(KOKKOS_RESTRICT)
-#define KOKKOS_RESTRICT
+#if !defined( KOKKOS_RESTRICT )
+ #define KOKKOS_RESTRICT
#endif
//----------------------------------------------------------------------------
-/** Define Macro for alignment: */
-#if ! defined KOKKOS_ALIGN_SIZE
-#define KOKKOS_ALIGN_SIZE 16
-#endif
+// Define Macro for alignment:
-#if ! defined(KOKKOS_ALIGN)
-#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+#if !defined KOKKOS_ALIGN_SIZE
+ #define KOKKOS_ALIGN_SIZE 16
#endif
-#if ! defined(KOKKOS_ALIGN_PTR)
-#define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
+#if !defined( KOKKOS_ALIGN )
+ #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#endif
-//----------------------------------------------------------------------------
-/** Determine the default execution space for parallel dispatch.
- * There is zero or one default execution space specified.
- */
-
-#if 1 < ( ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL ) ? 1 : 0 ) )
-
-#error "More than one KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_* specified" ;
-
+#if !defined( KOKKOS_ALIGN_PTR )
+ #define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
#endif
-/** If default is not specified then chose from enabled execution spaces.
- * Priority: CUDA, OPENMP, THREADS, SERIAL
- */
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
-#elif defined ( KOKKOS_ENABLE_CUDA )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
-#elif defined ( KOKKOS_ENABLE_OPENMP )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
-#elif defined ( KOKKOS_ENABLE_PTHREAD )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+//----------------------------------------------------------------------------
+// Determine the default execution space for parallel dispatch.
+// There is zero or one default execution space specified.
+
+#if 1 < ( ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL ) ? 1 : 0 ) )
+ #error "More than one KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_* specified."
+#endif
+
+// If default is not specified then chose from enabled execution spaces.
+// Priority: CUDA, OPENMP, THREADS, QTHREADS, SERIAL
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+#elif defined( KOKKOS_ENABLE_CUDA )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
+#elif defined( KOKKOS_ENABLE_OPENMP )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
+#elif defined( KOKKOS_ENABLE_PTHREAD )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS
#else
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
#endif
//----------------------------------------------------------------------------
-/** Determine for what space the code is being compiled: */
+// Determine for what space the code is being compiled:
-#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined (KOKKOS_ENABLE_CUDA)
-#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
+#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined( KOKKOS_ENABLE_CUDA )
+ #define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
#else
-#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ #define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
#endif
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#if ( defined( _POSIX_C_SOURCE ) && _POSIX_C_SOURCE >= 200112L ) || \
( defined( _XOPEN_SOURCE ) && _XOPEN_SOURCE >= 600 )
-#if defined(KOKKOS_ENABLE_PERFORMANCE_POSIX_MEMALIGN)
-#define KOKKOS_ENABLE_POSIX_MEMALIGN 1
-#endif
+ #if defined( KOKKOS_ENABLE_PERFORMANCE_POSIX_MEMALIGN )
+ #define KOKKOS_ENABLE_POSIX_MEMALIGN 1
+ #endif
#endif
//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-/**Enable Profiling by default**/
+// Enable Profiling by default
#ifndef KOKKOS_ENABLE_PROFILING
-#define KOKKOS_ENABLE_PROFILING 1
+ #define KOKKOS_ENABLE_PROFILING 1
#endif
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #ifndef KOKKOS_MACROS_HPP */
-
+#endif // #ifndef KOKKOS_MACROS_HPP
diff --git a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
index 2d45926e7..eadad10b4 100644
--- a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
+++ b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
@@ -1,1558 +1,1559 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MEMORYPOOL_HPP
#define KOKKOS_MEMORYPOOL_HPP
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_Atomic.hpp>
#include <impl/Kokkos_BitOps.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
#include <limits>
#include <algorithm>
#include <chrono>
// How should errors be handled? In general, production code should return a
// value indicating failure so the user can decide how the error is handled.
// While the code is experimental, it can abort instead. If KOKKOS_ENABLE_MEMPOOL_PRINTERR is
// defined, the code will abort with an error message. Otherwise, the code will
// return with a value indicating failure when possible, or do nothing instead.
//#define KOKKOS_ENABLE_MEMPOOL_PRINTERR
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
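A sketch of how one of these debugging switches would be enabled (illustrative; the macro must be defined before this header is processed, e.g. on the compile line or in the including translation unit):

    // compile line:  -DKOKKOS_ENABLE_MEMPOOL_PRINTERR
    // or, equivalently, before the include:
    #define KOKKOS_ENABLE_MEMPOOL_PRINTERR
    #include <Kokkos_MemoryPool.hpp>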
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace MempoolImpl {
template < typename T, typename ExecutionSpace >
struct initialize_array {
typedef ExecutionSpace execution_space;
typedef typename ExecutionSpace::size_type size_type;
T * m_data;
T m_value;
initialize_array( T * d, size_t size, T v ) : m_data( d ), m_value( v )
{
Kokkos::parallel_for( size, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const { m_data[i] = m_value; }
};
template <typename Bitset>
struct bitset_count
{
typedef typename Bitset::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef typename Bitset::size_type value_type;
typedef typename Bitset::word_type word_type;
word_type * m_words;
value_type & m_result;
bitset_count( word_type * w, value_type num_words, value_type & r )
: m_words( w ), m_result( r )
{
parallel_reduce( num_words, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & count ) const
{
count += Kokkos::Impl::bit_count( m_words[i] );
}
};
template < typename Device >
class Bitset {
public:
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef unsigned word_type;
typedef unsigned size_type;
typedef Kokkos::Impl::DeepCopy< memory_space, Kokkos::HostSpace > raw_deep_copy;
// Define some constants.
enum {
// Size of bitset word. Should be 32.
WORD_SIZE = sizeof(word_type) * CHAR_BIT,
LG_WORD_SIZE = Kokkos::Impl::integral_power_of_two( WORD_SIZE ),
WORD_MASK = WORD_SIZE - 1
};
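With 32-bit words these constants evaluate to WORD_SIZE == 32, LG_WORD_SIZE == 5 and WORD_MASK == 31, so a global bit index splits into a word index and an in-word position exactly as the member functions below compute them. A small standalone sketch of that arithmetic (values chosen for illustration):

    #include <cstdio>

    int main() {
      const unsigned LG_WORD_SIZE = 5, WORD_MASK = 31;
      unsigned i        = 70;                        // some bit index
      unsigned word_pos = i >> LG_WORD_SIZE;         // 70 / 32  == 2
      unsigned mask     = 1u << ( i & WORD_MASK );   // 70 % 32 == 6  ->  mask 0x40
      std::printf( "bit %u -> word %u, mask 0x%x\n", i, word_pos, mask );
      return 0;
    }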
private:
word_type * m_words;
size_type m_size;
size_type m_num_words;
word_type m_last_word_mask;
public:
~Bitset() = default;
Bitset() = default;
Bitset( Bitset && ) = default;
Bitset( const Bitset & ) = default;
Bitset & operator = ( Bitset && ) = default;
Bitset & operator = ( const Bitset & ) = default;
void init( void * w, size_type s )
{
// Assumption: The size of the memory pointed to by w is a multiple of
// sizeof(word_type).
m_words = reinterpret_cast<word_type*>( w );
m_size = s;
m_num_words = ( s + WORD_SIZE - 1 ) >> LG_WORD_SIZE;
m_last_word_mask = m_size & WORD_MASK ? ( word_type(1) << ( m_size & WORD_MASK ) ) - 1 : 0;
reset();
}
size_type size() const { return m_size; }
size_type count() const
{
size_type val = 0;
bitset_count< Bitset > bc( m_words, m_num_words, val );
return val;
}
void set()
{
// Set all the bits.
initialize_array< word_type, execution_space > ia( m_words, m_num_words, ~word_type(0) );
if ( m_last_word_mask ) {
// Clear the unused bits in the last block.
raw_deep_copy( m_words + ( m_num_words - 1 ), &m_last_word_mask, sizeof(word_type) );
}
}
void reset()
{
initialize_array< word_type, execution_space > ia( m_words, m_num_words, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
bool test( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word_type mask = word_type(1) << ( i & WORD_MASK );
return word & mask;
}
KOKKOS_FORCEINLINE_FUNCTION
bool set( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return !( atomic_fetch_or( &m_words[ word_pos ], mask ) & mask );
}
KOKKOS_FORCEINLINE_FUNCTION
bool reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return atomic_fetch_and( &m_words[ word_pos ], ~mask ) & mask;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
fetch_word_set( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
Kokkos::pair<bool, word_type> result;
result.second = atomic_fetch_or( &m_words[ word_pos ], mask );
result.first = !( result.second & mask );
return result;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
fetch_word_reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
Kokkos::pair<bool, word_type> result;
result.second = atomic_fetch_and( &m_words[ word_pos ], ~mask );
result.first = result.second & mask;
return result;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
set_any_in_word( size_type & pos ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more unset bits in the word.
while ( ~word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( ~word );
// Try to set the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a free bit in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
set_any_in_word( size_type & pos, word_type word_mask ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = ( ~word ) & word_mask;
// Loop until there are no more unset bits in the word.
while ( word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to set the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
word = ( ~word ) & word_mask;
}
// Didn't find a free bit in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
reset_any_in_word( size_type & pos ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first set bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a set bit to reset in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
reset_any_in_word( size_type & pos, word_type word_mask ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = word & word_mask;
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first set bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
word = word & word_mask;
}
// Didn't find a set bit to reset in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
};
template < typename UInt32View, typename BSHeaderView, typename SBHeaderView,
typename MempoolBitset >
struct create_histogram {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef Kokkos::pair< double, uint32_t > value_type;
size_t m_start;
UInt32View m_page_histogram;
BSHeaderView m_blocksize_info;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_lg_max_sb_blocks;
uint32_t m_lg_min_block_size;
uint32_t m_blocks_per_page;
value_type & m_result;
create_histogram( size_t start, size_t end, UInt32View ph, BSHeaderView bsi,
SBHeaderView sbh, MempoolBitset sbb, size_t lmsb,
uint32_t lmbs, uint32_t bpp, value_type & r )
: m_start( start ), m_page_histogram( ph ), m_blocksize_info( bsi ),
m_sb_header( sbh ), m_sb_blocks( sbb ), m_lg_max_sb_blocks( lmsb ),
m_lg_min_block_size( lmbs ), m_blocks_per_page( bpp ), m_result( r )
{
Kokkos::parallel_reduce( end - start, *this, m_result );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{
v.first = 0.0;
v.second = 0;
}
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{
dst.first += src.first;
dst.second += src.second;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
size_type i2 = i + m_start;
uint32_t lg_block_size = m_sb_header(i2).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
uint32_t block_size_id = lg_block_size - m_lg_min_block_size;
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
uint32_t total_allocated_blocks = 0;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( i2 << m_lg_max_sb_blocks ) + j * m_blocks_per_page;
unsigned end_pos = start_pos + m_blocks_per_page;
uint32_t page_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
page_allocated_blocks += m_sb_blocks.test( k );
}
total_allocated_blocks += page_allocated_blocks;
atomic_increment( &m_page_histogram(page_allocated_blocks) );
}
r.first += double(total_allocated_blocks) / blocks_per_sb;
r.second += blocks_per_sb;
}
}
};
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
template < typename UInt32View, typename SBHeaderView, typename MempoolBitset >
struct count_allocated_blocks {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
UInt32View m_num_allocated_blocks;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_sb_size;
size_t m_lg_max_sb_blocks;
count_allocated_blocks( size_t num_sb, UInt32View nab, SBHeaderView sbh,
MempoolBitset sbb, size_t sbs, size_t lmsb )
: m_num_allocated_blocks( nab ), m_sb_header( sbh ),
m_sb_blocks( sbb ), m_sb_size( sbs ), m_lg_max_sb_blocks( lmsb )
{
Kokkos::parallel_for( num_sb, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
// Count the allocated blocks in the superblock.
uint32_t blocks_per_sb = lg_block_size > 0 ? m_sb_size >> lg_block_size : 0;
unsigned start_pos = i << m_lg_max_sb_blocks;
unsigned end_pos = start_pos + blocks_per_sb;
uint32_t count = 0;
for ( unsigned j = start_pos; j < end_pos; ++j ) {
count += m_sb_blocks.test( j );
}
m_num_allocated_blocks(i) = count;
}
}
};
#endif
}
/// \class MemoryPool
/// \brief Bitset based memory manager for pools of same-sized chunks of memory.
/// \tparam Device Kokkos device that gives the execution and memory space the
/// allocator will be used in.
///
/// MemoryPool is a memory space that can be on host or device. It provides a
/// pool memory allocator for fast allocation of same-sized chunks of memory.
/// The memory is only accessible on the host / device this allocator is
/// associated with.
///
/// This allocator is based on ideas from the following GPU allocators:
/// Halloc (https://github.com/canonizer/halloc).
/// ScatterAlloc (https://github.com/ComputationalRadiationPhysics/scatteralloc)
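A minimal usage sketch against the interface documented in this class (illustrative only; sizes and the kernel are made up and error handling is omitted): the pool is constructed on the host from a memory-space instance and a total size, and allocate() / deallocate() are then callable from device code through a by-value copy of the pool.

    #include <Kokkos_Core.hpp>
    #include <Kokkos_MemoryPool.hpp>

    void example_pool_use() {
      using device_type = Kokkos::DefaultExecutionSpace::device_type;
      using pool_type   = Kokkos::Experimental::MemoryPool< device_type >;

      // 64 MB pool with the default 2^20-byte superblocks.
      pool_type pool( device_type::memory_space(), 64ul << 20 );

      Kokkos::parallel_for( 128, KOKKOS_LAMBDA( const int /*i*/ ) {
        void * p = pool.allocate( 256 );       // grab one 256-byte block
        if ( p ) pool.deallocate( p, 256 );    // and return it to the pool
      });
    }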
template < typename Device >
class MemoryPool {
private:
// The allocator uses superblocks. A superblock is divided into pages, and a
// page is divided into blocks. A block is the chunk of memory that is given
// out by the allocator. A page always has a number of blocks equal to the
// size of the word used by the bitset. Thus, the pagesize can vary between
// superblocks as it is based on the block size of the superblock. The
// allocator supports all powers of 2 from MIN_BLOCK_SIZE to the size of a
// superblock as block sizes.
// Superblocks are divided into 4 categories:
// 1. empty - is completely empty; there are no active allocations
// 2. partfull - partially full; there are some active allocations
// 3. full - full enough with active allocations that new allocations
// will likely fail
// 4. active - is currently the active superblock for a block size
//
// An inactive superblock is one that is empty, partfull, or full.
//
// New allocations occur only from an active superblock. If a superblock is
// made inactive after an allocation request is made to it but before the
// allocation request is fulfilled, the allocation will still be attempted
// from that superblock. Deallocations can occur to partfull, full, or
// active superblocks. Superblocks move between categories as allocations
// and deallocations happen. Superblocks all start empty.
//
// Here are the possible moves between categories:
// empty -> active During allocation, there is no active superblock
// or the active superblock is full.
// active -> full During allocation, the full threshold of the
// superblock is reached when increasing the fill
// level.
// full -> partfull During deallocation, the full threshold of the
// superblock is crossed when decreasing the fill
// level.
// partfull -> empty Deallocation of the last allocated block of an
// inactive superblock.
// partfull -> active During allocation, the active superblock is full.
//
// When a new active superblock is needed, partfull superblocks of the same
// block size are chosen over empty superblocks.
//
// The empty and partfull superblocks are tracked using bitsets that represent
// the superblocks in those respective categories. Empty superblocks use a
// single bitset, while partfull superblocks use a bitset per block size
// (contained sequentially in a single bitset). Active superblocks are
// tracked by the active superblocks array. Full superblocks aren't tracked
// at all.
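To make the geometry concrete, a worked sketch with the default sizes used further below (numbers are illustrative, not additional behavior): a 2^20-byte superblock serving 256-byte blocks holds 2^20 / 2^8 = 4096 blocks, and since a page always covers one 32-bit bitset word it is divided into 4096 / 32 = 128 pages; the same superblock serving 2^15-byte blocks holds only 32 blocks, i.e. a single page.

    #include <cstddef>

    constexpr std::size_t sb_size       = std::size_t(1) << 20;          // 1 MiB superblock
    constexpr std::size_t lg_block_size = 8;                             // 256-byte blocks
    constexpr std::size_t blocks_per_sb = sb_size >> lg_block_size;      // 4096
    constexpr std::size_t pages_per_sb  = ( blocks_per_sb + 31 ) >> 5;   // 128
    static_assert( blocks_per_sb == 4096 && pages_per_sb == 128, "worked example" );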
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space backend_memory_space;
typedef Device device_type;
typedef MempoolImpl::Bitset< device_type > MempoolBitset;
// Define some constants.
enum {
MIN_BLOCK_SIZE = 64,
LG_MIN_BLOCK_SIZE = Kokkos::Impl::integral_power_of_two( MIN_BLOCK_SIZE ),
MAX_BLOCK_SIZES = 31 - LG_MIN_BLOCK_SIZE + 1,
// Size of bitset word.
BLOCKS_PER_PAGE = MempoolBitset::WORD_SIZE,
LG_BLOCKS_PER_PAGE = MempoolBitset::LG_WORD_SIZE,
INVALID_SUPERBLOCK = ~uint32_t(0),
SUPERBLOCK_LOCK = ~uint32_t(0) - 1,
MAX_TRIES = 32 // Cap on the number of pages searched
// before an allocation returns empty.
};
public:
// Stores information about each superblock.
struct SuperblockHeader {
uint32_t m_full_pages;
uint32_t m_empty_pages;
uint32_t m_lg_block_size;
uint32_t m_is_active;
KOKKOS_FUNCTION
SuperblockHeader() :
m_full_pages(0), m_empty_pages(0), m_lg_block_size(0), m_is_active(false) {}
};
// Stores information about each block size.
struct BlockSizeHeader {
uint32_t m_blocks_per_sb;
uint32_t m_pages_per_sb;
uint32_t m_sb_full_level;
uint32_t m_page_full_level;
KOKKOS_FUNCTION
BlockSizeHeader() :
m_blocks_per_sb(0), m_pages_per_sb(0), m_sb_full_level(0), m_page_full_level(0) {}
};
private:
typedef Kokkos::Impl::SharedAllocationTracker Tracker;
typedef View< uint32_t *, device_type > UInt32View;
typedef View< SuperblockHeader *, device_type > SBHeaderView;
// The letters 'sb' used in any variable name mean superblock.
size_t m_lg_sb_size; // Log2 of superblock size.
size_t m_sb_size; // Superblock size.
size_t m_lg_max_sb_blocks; // Log2 of the number of blocks of the
// minimum block size in a superblock.
size_t m_num_sb; // Number of superblocks.
size_t m_ceil_num_sb; // Number of superblocks rounded up to the smallest
// multiple of the bitset word size. Used by
// bitsets representing superblock categories to
// ensure different block sizes never share a word
// in the bitset.
size_t m_num_block_size; // Number of block sizes supported.
size_t m_data_size; // Amount of memory available to the allocator.
size_t m_sb_blocks_size; // Amount of memory for free / empty blocks bitset.
size_t m_empty_sb_size; // Amount of memory for empty superblocks bitset.
size_t m_partfull_sb_size; // Amount of memory for partfull superblocks bitset.
size_t m_total_size; // Total amount of memory allocated.
char * m_data; // Beginning device memory location used for
// superblocks.
UInt32View m_active; // Active superblocks IDs.
SBHeaderView m_sb_header; // Header info for superblocks.
MempoolBitset m_sb_blocks; // Bitsets representing free / allocated status
// of blocks in superblocks.
MempoolBitset m_empty_sb; // Bitset representing empty superblocks.
MempoolBitset m_partfull_sb; // Bitsets representing partially full superblocks.
Tracker m_track; // Tracker for superblock memory.
BlockSizeHeader m_blocksize_info[MAX_BLOCK_SIZES]; // Header info for block sizes.
// There were several methods tried for storing the block size header info: in a View,
// in a View of const data, and in a RandomAccess View. All of these were slower than
// storing it in a static array that is a member variable of the class. In the latter
// case, the block size info gets copied into the constant memory on the GPU along with
// the class when it is copied there for executing a parallel loop. Instead of storing
// the values, computing the values every time they were needed was also tried. This
// method was slightly slower than storing them in the static array.
public:
//! Tag this class as a kokkos memory space
typedef MemoryPool memory_space;
~MemoryPool() = default;
MemoryPool() = default;
MemoryPool( MemoryPool && ) = default;
MemoryPool( const MemoryPool & ) = default;
MemoryPool & operator = ( MemoryPool && ) = default;
MemoryPool & operator = ( const MemoryPool & ) = default;
/// \brief Initializes the memory pool.
/// \param memspace The memory space from which the memory pool will allocate memory.
/// \param total_size The requested memory amount controlled by the allocator. The
/// actual amount is rounded up to the smallest multiple of the
/// superblock size >= the requested size.
/// \param log2_superblock_size Log2 of the size of superblocks used by the allocator.
/// In most use cases, the default value should work.
inline
MemoryPool( const backend_memory_space & memspace,
size_t total_size, size_t log2_superblock_size = 20 )
: m_lg_sb_size( log2_superblock_size ),
m_sb_size( size_t(1) << m_lg_sb_size ),
m_lg_max_sb_blocks( m_lg_sb_size - LG_MIN_BLOCK_SIZE ),
m_num_sb( ( total_size + m_sb_size - 1 ) >> m_lg_sb_size ),
m_ceil_num_sb( ( ( m_num_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE ) <<
LG_BLOCKS_PER_PAGE ),
m_num_block_size( m_lg_sb_size - LG_MIN_BLOCK_SIZE + 1 ),
m_data_size( m_num_sb * m_sb_size ),
m_sb_blocks_size( ( m_num_sb << m_lg_max_sb_blocks ) / CHAR_BIT ),
m_empty_sb_size( m_ceil_num_sb / CHAR_BIT ),
m_partfull_sb_size( m_ceil_num_sb * m_num_block_size / CHAR_BIT ),
m_total_size( m_data_size + m_sb_blocks_size + m_empty_sb_size + m_partfull_sb_size ),
m_data(0),
m_active( "Active superblocks" ),
m_sb_header( "Superblock headers" ),
m_track()
{
// Assumption. The minimum block size must be a power of 2.
static_assert( Kokkos::Impl::is_integral_power_of_two( MIN_BLOCK_SIZE ), "" );
// Assumption. Require a superblock be large enough so it takes at least 1
// whole bitset word to represent it using the minimum blocksize.
if ( m_sb_size < MIN_BLOCK_SIZE * BLOCKS_PER_PAGE ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be >= %u **\n",
MIN_BLOCK_SIZE * BLOCKS_PER_PAGE );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. A superblock's size can be at most 2^31. Verify this.
if ( m_lg_sb_size > 31 ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be < %u **\n",
( uint32_t(1) << 31 ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. The Bitset only uses unsigned for size types which limits
// the amount of memory the allocator can manage. Verify the memory size
// is below this limit.
if ( m_data_size > size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max() ) {
printf( "\n** MemoryPool::MemoryPool() Allocator can only manage %lu bytes of memory; requested %lu **\n",
size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max(), total_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Allocate memory for Views. This is done here instead of at construction
// so that the runtime checks can be performed before allocating memory.
resize( m_active, m_num_block_size );
resize( m_sb_header, m_num_sb );
// Allocate superblock memory.
typedef Kokkos::Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
SharedRecord * rec =
SharedRecord::allocate( memspace, "mempool", m_total_size );
m_track.assign_allocated_record_to_uninitialized( rec );
m_data = reinterpret_cast<char *>( rec->data() );
// Set and initialize the free / empty block bitset memory.
m_sb_blocks.init( m_data + m_data_size, m_num_sb << m_lg_max_sb_blocks );
// Set and initialize the empty superblock block bitset memory.
m_empty_sb.init( m_data + m_data_size + m_sb_blocks_size, m_num_sb );
// Start with all superblocks in the empty category.
m_empty_sb.set();
// Set and initialize the partfull superblock block bitset memory.
m_partfull_sb.init( m_data + m_data_size + m_sb_blocks_size + m_empty_sb_size,
m_ceil_num_sb * m_num_block_size );
// Initialize all active superblocks to be invalid.
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
for ( size_t i = 0; i < m_num_block_size; ++i ) host_active(i) = INVALID_SUPERBLOCK;
deep_copy( m_active, host_active );
// A superblock is considered full when this percentage of its pages are full.
const double superblock_full_fraction = .8;
// A page is considered full when this percentage of its blocks are full.
const double page_full_fraction = .875;
// Initialize the blocksize info.
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t lg_block_size = i + LG_MIN_BLOCK_SIZE;
uint32_t blocks_per_sb = m_sb_size >> lg_block_size;
uint32_t pages_per_sb = ( blocks_per_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE;
m_blocksize_info[i].m_blocks_per_sb = blocks_per_sb;
m_blocksize_info[i].m_pages_per_sb = pages_per_sb;
// Set the full level for the superblock.
m_blocksize_info[i].m_sb_full_level =
static_cast<uint32_t>( pages_per_sb * superblock_full_fraction );
if ( m_blocksize_info[i].m_sb_full_level == 0 ) {
m_blocksize_info[i].m_sb_full_level = 1;
}
// Set the full level for the page.
uint32_t blocks_per_page =
blocks_per_sb < BLOCKS_PER_PAGE ? blocks_per_sb : BLOCKS_PER_PAGE;
m_blocksize_info[i].m_page_full_level =
static_cast<uint32_t>( blocks_per_page * page_full_fraction );
if ( m_blocksize_info[i].m_page_full_level == 0 ) {
m_blocksize_info[i].m_page_full_level = 1;
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
printf( "\n" );
printf( " m_lg_sb_size: %12lu\n", m_lg_sb_size );
printf( " m_sb_size: %12lu\n", m_sb_size );
printf( " m_max_sb_blocks: %12lu\n", size_t(1) << m_lg_max_sb_blocks );
printf( "m_lg_max_sb_blocks: %12lu\n", m_lg_max_sb_blocks );
printf( " m_num_sb: %12lu\n", m_num_sb );
printf( " m_ceil_num_sb: %12lu\n", m_ceil_num_sb );
printf( " m_num_block_size: %12lu\n", m_num_block_size );
printf( " data bytes: %12lu\n", m_data_size );
printf( " sb_blocks bytes: %12lu\n", m_sb_blocks_size );
printf( " empty_sb bytes: %12lu\n", m_empty_sb_size );
printf( " partfull_sb bytes: %12lu\n", m_partfull_sb_size );
printf( " total bytes: %12lu\n", m_total_size );
printf( " m_empty_sb size: %12u\n", m_empty_sb.size() );
printf( "m_partfull_sb size: %12u\n", m_partfull_sb.size() );
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
// Print the blocksize info for all the block sizes.
printf( "SIZE BLOCKS_PER_SB PAGES_PER_SB SB_FULL_LEVEL PAGE_FULL_LEVEL\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
printf( "%4zu %13u %12u %13u %15u\n", i + LG_MIN_BLOCK_SIZE,
m_blocksize_info[i].m_blocks_per_sb, m_blocksize_info[i].m_pages_per_sb,
m_blocksize_info[i].m_sb_full_level, m_blocksize_info[i].m_page_full_level );
}
printf( "\n" );
#endif
}
/// \brief The actual block size allocated given alloc_size.
KOKKOS_INLINE_FUNCTION
size_t allocate_block_size( const size_t alloc_size ) const
{ return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE ); }
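For instance, with block sizes restricted to powers of two no smaller than MIN_BLOCK_SIZE (64 bytes), a 100-byte request is served from 128-byte blocks and a 20-byte request still consumes a full 64-byte block. A hypothetical stand-in for that rounding (get_block_size_index() itself is defined elsewhere in this class):

    #include <cstddef>

    // Illustrative helper, not the pool's implementation: round the request up
    // to the next power of two, never below the 64-byte minimum block size.
    inline std::size_t example_block_size( std::size_t alloc_size ) {
      std::size_t s = 64;                  // MIN_BLOCK_SIZE
      while ( s < alloc_size ) s <<= 1;
      return s;                            // example_block_size(100) == 128
    }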
/// \brief Allocate a chunk of memory.
/// \param alloc_size Size of the requested allocation in number of bytes.
///
/// The function returns a void pointer to a memory location on success and
/// NULL on failure.
KOKKOS_FUNCTION
void * allocate( size_t alloc_size ) const
{
void * p = 0;
// Only support allocations up to the superblock size. Just return 0
// (failed allocation) for any size above this.
if ( alloc_size <= m_sb_size )
{
int block_size_id = get_block_size_index( alloc_size );
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
#ifdef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
// Without this test it looks like pages_per_sb might come back wrong.
if ( pages_per_sb == 0 ) return NULL;
#endif
unsigned word_size = blocks_per_sb > 32 ? 32 : blocks_per_sb;
unsigned word_mask = ( uint64_t(1) << word_size ) - 1;
// Instead of forcing an atomic read to guarantee the updated value,
// reading the old value is actually beneficial because more threads will
// attempt allocations on the old active superblock instead of waiting on
// the new active superblock. This will help hide the latency of
// switching the active superblock.
uint32_t sb_id = volatile_load( &m_active(block_size_id) );
// If the active is locked, keep reading it atomically until the lock is
// released.
while ( sb_id == SUPERBLOCK_LOCK ) {
sb_id = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
}
load_fence();
bool allocation_done = false;
while ( !allocation_done ) {
bool need_new_sb = false;
if ( sb_id != INVALID_SUPERBLOCK ) {
// Use the value from the clock register as the hash value.
uint64_t hash_val = get_clock_register();
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Mod the hash value to choose a page in the superblock. The
// initial block searched is the first block of that page.
uint32_t pos_rel = uint32_t( hash_val & ( pages_per_sb - 1 ) ) << LG_BLOCKS_PER_PAGE;
// Get the absolute starting position for this superblock's bits in the bitset.
uint32_t pos = pos_base + pos_rel;
// Keep track of the number of pages searched. Pages in the superblock are
// searched linearly from the starting page. All pages in the superblock are
// searched until either a location is found, or it is proven empty.
uint32_t pages_searched = 0;
bool search_done = false;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_sb_blocks.set_any_in_word( pos, word_mask );
if ( !success ) {
if ( ++pages_searched >= pages_per_sb ) {
// Searched all the pages in this superblock. Look for a new superblock.
//
// The previous method tried limiting the number of pages searched, but
// that caused a huge performance issue in CUDA where the outer loop
// executed massive numbers of times. Threads weren't able to find a
// free location when the superblock wasn't full and were able to execute
// the outer loop many times before the superblock was switched for a new
// one. Switching to an exhaustive search eliminated this possibility and
// didn't slow anything down for the tests.
need_new_sb = true;
search_done = true;
}
else {
// Move to the next page making sure the new search position
// doesn't go past this superblock's bits.
pos += BLOCKS_PER_PAGE;
pos = ( pos < pos_base + blocks_per_sb ) ? pos : pos_base;
}
}
else {
// Reserved a memory location to allocate.
memory_fence();
search_done = true;
allocation_done = true;
uint32_t lg_block_size = block_size_id + LG_MIN_BLOCK_SIZE;
p = m_data + ( size_t(sb_id) << m_lg_sb_size ) +
( ( pos - pos_base ) << lg_block_size );
uint32_t used_bits = Kokkos::Impl::bit_count( prev_val );
if ( used_bits == 0 ) {
// This page was empty. Decrement the number of empty pages for
// the superblock.
atomic_decrement( &m_sb_header(sb_id).m_empty_pages );
}
else if ( used_bits == m_blocksize_info[block_size_id].m_page_full_level - 1 )
{
// This page is full. Increment the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_add( &m_sb_header(sb_id).m_full_pages, 1 );
// This allocation made the superblock full, so a new one needs to be found.
if ( full_pages == m_blocksize_info[block_size_id].m_sb_full_level - 1 ) {
need_new_sb = true;
}
}
}
}
}
else {
// This is the first allocation for this block size. A superblock needs
// to be set as the active one. If this point is reached any other time,
// it is an error.
need_new_sb = true;
}
if ( need_new_sb ) {
uint32_t new_sb_id = find_superblock( block_size_id, sb_id );
if ( new_sb_id == sb_id ) {
allocation_done = true;
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
printf( "** No superblocks available. **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
#endif
}
else {
sb_id = new_sb_id;
}
}
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
else {
printf( "** Requested allocation size (%zu) larger than superblock size (%lu). **\n",
alloc_size, m_sb_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
return p;
}
/// \brief Release allocated memory back to the pool.
/// \param alloc_ptr Pointer to chunk of memory previously allocated by
/// the allocator.
/// \param alloc_size Size of the allocated memory in number of bytes.
KOKKOS_FUNCTION
void deallocate( void * alloc_ptr, size_t alloc_size ) const
{
char * ap = static_cast<char *>( alloc_ptr );
// Only deallocate memory controlled by this pool.
if ( ap >= m_data && ap + alloc_size <= m_data + m_data_size ) {
// Get the superblock for the address. This can be calculated by math on
// the address since the superblocks are stored contiguously in one memory
// chunk.
uint32_t sb_id = ( ap - m_data ) >> m_lg_sb_size;
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Get the relative position for this memory location's bit in the bitset.
uint32_t offset = ( ap - m_data ) - ( size_t(sb_id) << m_lg_sb_size );
uint32_t lg_block_size = m_sb_header(sb_id).m_lg_block_size;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pos_rel = offset >> lg_block_size;
bool success = false;
unsigned prev_val = 0;
memory_fence();
Kokkos::tie( success, prev_val ) = m_sb_blocks.fetch_word_reset( pos_base + pos_rel );
// If the memory location was previously deallocated, do nothing.
if ( success ) {
uint32_t page_fill_level = Kokkos::Impl::bit_count( prev_val );
if ( page_fill_level == 1 ) {
// This page is now empty. Increment the number of empty pages for the
// superblock.
uint32_t empty_pages = atomic_fetch_add( &m_sb_header(sb_id).m_empty_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
empty_pages == m_blocksize_info[block_size_id].m_pages_per_sb - 1 )
{
// This deallocation caused the superblock to be empty. Change the
// superblock category from partially full to empty.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
if ( m_partfull_sb.reset( pos ) ) {
// Reset the empty pages and block size for the superblock.
volatile_store( &m_sb_header(sb_id).m_empty_pages, uint32_t(0) );
volatile_store( &m_sb_header(sb_id).m_lg_block_size, uint32_t(0) );
store_fence();
m_empty_sb.set( sb_id );
}
}
}
else if ( page_fill_level == m_blocksize_info[block_size_id].m_page_full_level ) {
// This page is no longer full. Decrement the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_sub( &m_sb_header(sb_id).m_full_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
full_pages == m_blocksize_info[block_size_id].m_sb_full_level )
{
// This deallocation caused the number of full pages to decrease below
// the full threshold. Change the superblock category from full to
// partially full.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
m_partfull_sb.set( pos );
}
}
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
else {
printf( "\n** MemoryPool::deallocate() ADDRESS_OUT_OF_RANGE(0x%llx) **\n",
reinterpret_cast<uint64_t>( alloc_ptr ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
}
/// \brief Tests if the memory pool has no more memory available to allocate.
KOKKOS_INLINE_FUNCTION
bool is_empty() const
{
// The allocator is empty if all superblocks are full. A superblock is
// full if it has >= 80% of its pages allocated.
// Look at all the superblocks. If one is not full, then the allocator
// isn't empty.
for ( size_t i = 0; i < m_num_sb; ++i ) {
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size == 0 ) return false;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t full_pages = volatile_load( &m_sb_header(i).m_full_pages );
if ( full_pages < m_blocksize_info[block_size_id].m_sb_full_level ) return false;
}
// All the superblocks were full. The allocator is empty.
return true;
}
// The following functions are used for debugging.
void print_status() const
{
printf( "\n" );
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
typename SBHeaderView::HostMirror host_sb_header = create_mirror_view( m_sb_header );
deep_copy( host_sb_header, m_sb_header );
UInt32View num_allocated_blocks( "Allocated Blocks", m_num_sb );
// Count the number of allocated blocks per superblock.
{
MempoolImpl::count_allocated_blocks< UInt32View, SBHeaderView, MempoolBitset >
mch( m_num_sb, num_allocated_blocks, m_sb_header,
m_sb_blocks, m_sb_size, m_lg_max_sb_blocks );
}
typename UInt32View::HostMirror host_num_allocated_blocks =
create_mirror_view( num_allocated_blocks );
deep_copy( host_num_allocated_blocks, num_allocated_blocks );
// Print header info of all superblocks.
printf( "SB_ID SIZE ACTIVE EMPTY_PAGES FULL_PAGES USED_BLOCKS\n" );
for ( size_t i = 0; i < m_num_sb; ++i ) {
printf( "%5zu %4u %6d %11u %10u %10u\n", i,
host_sb_header(i).m_lg_block_size, host_sb_header(i).m_is_active,
host_sb_header(i).m_empty_pages, host_sb_header(i).m_full_pages,
host_num_allocated_blocks(i) );
}
printf( "\n" );
#endif
UInt32View page_histogram( "Page Histogram", 33 );
// Get a View version of the blocksize info.
typedef View< BlockSizeHeader *, device_type > BSHeaderView;
BSHeaderView blocksize_info( "BlockSize Headers", MAX_BLOCK_SIZES );
Kokkos::Impl::DeepCopy< backend_memory_space, Kokkos::HostSpace >
dc( blocksize_info.ptr_on_device(), m_blocksize_info,
sizeof(BlockSizeHeader) * m_num_block_size );
Kokkos::pair< double, uint32_t > result = Kokkos::pair< double, uint32_t >( 0.0, 0 );
// Create the page histogram.
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( 0, m_num_sb, page_histogram, blocksize_info, m_sb_header, m_sb_blocks,
m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
typename UInt32View::HostMirror host_page_histogram = create_mirror_view( page_histogram );
deep_copy( host_page_histogram, page_histogram );
// Find the used and total pages and blocks.
uint32_t used_pages = 0;
uint32_t used_blocks = 0;
for ( uint32_t i = 1; i < 33; ++i ) {
used_pages += host_page_histogram(i);
used_blocks += i * host_page_histogram(i);
}
uint32_t total_pages = used_pages + host_page_histogram(0);
unsigned num_empty_sb = m_empty_sb.count();
unsigned num_non_empty_sb = m_num_sb - num_empty_sb;
unsigned num_partfull_sb = m_partfull_sb.count();
uint32_t total_blocks = result.second;
double ave_sb_full = num_non_empty_sb == 0 ? 0.0 : result.first / num_non_empty_sb;
double percent_used_sb = double( m_num_sb - num_empty_sb ) / m_num_sb;
double percent_used_pages = total_pages == 0 ? 0.0 : double(used_pages) / total_pages;
double percent_used_blocks = total_blocks == 0 ? 0.0 : double(used_blocks) / total_blocks;
// Count active superblocks.
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
deep_copy( host_active, m_active );
unsigned num_active_sb = 0;
for ( size_t i = 0; i < m_num_block_size; ++i ) {
num_active_sb += host_active(i) != INVALID_SUPERBLOCK;
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
// Print active superblocks.
printf( "BS_ID SB_ID\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t sb_id = host_active(i);
if ( sb_id == INVALID_SUPERBLOCK ) {
printf( "%5zu I\n", i );
}
else if ( sb_id == SUPERBLOCK_LOCK ) {
printf( "%5zu L\n", i );
}
else {
printf( "%5zu %7u\n", i, sb_id );
}
}
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
// Print the summary page histogram.
printf( "USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t i = 0; i < 33; ++i ) {
printf( "%10u %10u\n", i, host_page_histogram[i] );
}
printf( "\n" );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
// Print the page histogram for a few individual superblocks.
// const uint32_t num_sb_id = 2;
// uint32_t sb_id[num_sb_id] = { 0, 10 };
const uint32_t num_sb_id = 1;
uint32_t sb_id[num_sb_id] = { 0 };
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
deep_copy( page_histogram, 0 );
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( sb_id[i], sb_id[i] + 1, page_histogram, blocksize_info, m_sb_header,
m_sb_blocks, m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
deep_copy( host_page_histogram, page_histogram );
printf( "SB_ID USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t j = 0; j < 33; ++j ) {
printf( "%5u %10u %10u\n", sb_id[i], j, host_page_histogram[j] );
}
printf( "\n" );
}
/*
// Print the blocks used for each page of a few individual superblocks.
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
uint32_t lg_block_size = host_sb_header(sb_id[i]).m_lg_block_size;
if ( lg_block_size != 0 ) {
printf( "SB_ID BLOCK ID USED_BLOCKS\n" );
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( sb_id[i] << m_lg_max_sb_blocks ) + j * BLOCKS_PER_PAGE;
unsigned end_pos = start_pos + BLOCKS_PER_PAGE;
uint32_t num_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
num_allocated_blocks += m_sb_blocks.test( k );
}
printf( "%5u %8u %11u\n", sb_id[i], j, num_allocated_blocks );
}
printf( "\n" );
}
}
*/
#endif
printf( " Used blocks: %10u / %10u = %10.6lf\n", used_blocks, total_blocks,
percent_used_blocks );
printf( " Used pages: %10u / %10u = %10.6lf\n", used_pages, total_pages,
percent_used_pages );
printf( " Used SB: %10zu / %10zu = %10.6lf\n", m_num_sb - num_empty_sb, m_num_sb,
percent_used_sb );
printf( " Active SB: %10u\n", num_active_sb );
printf( " Empty SB: %10u\n", num_empty_sb );
printf( " Partfull SB: %10u\n", num_partfull_sb );
printf( " Full SB: %10lu\n",
m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
printf( "Ave. SB Full %%: %10.6lf\n", ave_sb_full );
printf( "\n" );
fflush( stdout );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
KOKKOS_INLINE_FUNCTION
size_t get_min_block_size() const { return MIN_BLOCK_SIZE; }
+ KOKKOS_INLINE_FUNCTION
size_t get_mem_size() const { return m_data_size; }
private:
/// \brief Returns the index into the active array for the given size.
///
/// Computes log2 of the smallest power of two >= the given size
/// ( i.e. ceil( log2(size) ) ), minus LG_MIN_BLOCK_SIZE so that
/// MIN_BLOCK_SIZE maps to index 0.
KOKKOS_FORCEINLINE_FUNCTION
int get_block_size_index( const size_t size ) const
{
// We know the size fits in a 32 bit unsigned because the size of a
// superblock is limited to 2^31, so casting to an unsigned is safe.
// Find the most significant nonzero bit.
uint32_t first_nonzero_bit =
Kokkos::Impl::bit_scan_reverse( static_cast<unsigned>( size ) );
// If size is an integral power of 2, ceil( log2(size) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1. Since the
// minimum block size is MIN_BLOCK_SIZE, make sure ceil( log2(size) ) is at
// least LG_MIN_BLOCK_SIZE.
uint32_t lg2_size = first_nonzero_bit + !Kokkos::Impl::is_integral_power_of_two( size );
lg2_size = lg2_size > LG_MIN_BLOCK_SIZE ? lg2_size : LG_MIN_BLOCK_SIZE;
// Return ceil( log2(size) ) shifted so that the value for MIN_BLOCK_SIZE
// is 0.
return lg2_size - LG_MIN_BLOCK_SIZE;
}
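// Worked example (illustrative, assuming LG_MIN_BLOCK_SIZE == 6, i.e. a
// 64-byte minimum block): size = 100 gives bit_scan_reverse(100) == 6;
// 100 is not a power of two, so lg2_size becomes 7 and the returned
// index is 7 - 6 = 1, selecting the 128-byte block size.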
/// \brief Finds a superblock with free space to become a new active superblock.
///
/// If this function is called, the current active superblock needs to be replaced
/// because it is full. Initially, only the thread that sets the active superblock
/// to full calls this function. Other threads can still allocate from the "full"
/// active superblock because a full superblock still has locations available. If
/// a thread tries to allocate from the active superblock when it has no free
/// locations, then that thread will call this function, too, and spin on a lock
/// waiting until the active superblock has been replaced.
KOKKOS_FUNCTION
uint32_t find_superblock( int block_size_id, uint32_t old_sb ) const
{
// Try to grab the lock on the head.
uint32_t lock_sb =
Kokkos::atomic_compare_exchange( &m_active(block_size_id), old_sb, SUPERBLOCK_LOCK );
load_fence();
// Initialize the new superblock to be the previous one so the previous
// superblock is returned if a new superblock can't be found.
uint32_t new_sb = lock_sb;
if ( lock_sb == old_sb ) {
// This thread has the lock.
// 1. Look for a partially filled superblock that is of the right block
// size.
size_t max_tries = m_ceil_num_sb >> LG_BLOCKS_PER_PAGE;
size_t tries = 0;
bool search_done = false;
// Set the starting search position to the beginning of this block
// size's bitset.
unsigned pos = block_size_id * m_ceil_num_sb;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_partfull_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
// It is possible that the newly found superblock is the same as the
// old superblock. In this case putting the old value back in yields
// correct behavior. This could happen as follows. This thread
// grabs the lock and transitions the superblock to the full state.
// Before it searches for a new superblock, other threads perform
// enough deallocations to transition the superblock to the partially
// full state. This thread then searches for a partially full
// superblock and finds the one it removed. There's potential for
// this to cause a performance issue if the same superblock keeps
// being removed and added due to the right mix and ordering of
// allocations and deallocations.
search_done = true;
new_sb = pos - block_size_id * m_ceil_num_sb;
// Set the head status for the superblock.
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
store_fence();
}
}
// 2. Look for an empty superblock.
if ( new_sb == lock_sb ) {
tries = 0;
search_done = false;
// Set the starting search position to the beginning of the empty
// superblock bitset.
pos = 0;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_empty_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
// It is possible that the newly found superblock is the same as
// the old superblock. In this case putting the old value back in
// yields correct behavior. This could happen as follows. This
// thread grabs the lock and transitions the superblock to the full
// state. Before it searches for a new superblock, other threads
// perform enough deallocations to transition the superblock to the
// partially full state and then the empty state. This thread then
// searches for a partially full superblock and none exist. This
// thread then searches for an empty superblock and finds the one
// it removed. The likelihood of this happening is so remote that
// the potential for this to cause a performance issue is
// infinitesimal.
search_done = true;
new_sb = pos;
// Set the empty pages, block size, and head status for the
// superblock.
volatile_store( &m_sb_header(new_sb).m_empty_pages,
m_blocksize_info[block_size_id].m_pages_per_sb );
volatile_store( &m_sb_header(new_sb).m_lg_block_size,
block_size_id + LG_MIN_BLOCK_SIZE );
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
store_fence();
}
}
}
// Write the new active superblock to release the lock.
atomic_exchange( &m_active(block_size_id), new_sb );
}
else {
// Either another thread has the lock and is switching the active
// superblock for this block size or another thread has already changed
// the active superblock since this thread read its value. Keep
// atomically reading the active superblock until it isn't locked to get
// the new active superblock.
do {
new_sb = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
} while ( new_sb == SUPERBLOCK_LOCK );
load_fence();
// Assertions:
// 1. An invalid superblock should never be found here.
// 2. If the new superblock is the same as the previous superblock, the
// allocator is empty.
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
if ( new_sb == INVALID_SUPERBLOCK ) {
printf( "\n** MemoryPool::find_superblock() FOUND_INACTIVE_SUPERBLOCK **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
#endif
}
return new_sb;
}
/// Returns 64 bits from a clock register.
KOKKOS_FORCEINLINE_FUNCTION
uint64_t get_clock_register(void) const
{
#if defined( __CUDA_ARCH__ )
// Return value of 64-bit hi-res clock register.
return clock64();
#elif defined( __i386__ ) || defined( __x86_64 )
// Return value of 64-bit hi-res clock register.
unsigned a = 0, d = 0;
__asm__ volatile( "rdtsc" : "=a" (a), "=d" (d) );
return ( (uint64_t) a ) | ( ( (uint64_t) d ) << 32 );
#elif defined( __powerpc ) || defined( __powerpc__ ) || defined( __powerpc64__ ) || \
defined( __POWERPC__ ) || defined( __ppc__ ) || defined( __ppc64__ )
unsigned int cycles = 0;
asm volatile( "mftb %0" : "=r" (cycles) );
return (uint64_t) cycles;
#else
const uint64_t ticks =
std::chrono::high_resolution_clock::now().time_since_epoch().count();
return ticks;
#endif
}
};
} // namespace Experimental
} // namespace Kokkos
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#undef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#endif
#endif // KOKKOS_MEMORYPOOL_HPP
diff --git a/lib/kokkos/core/src/Kokkos_OpenMP.hpp b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
index a337d1a9d..c0c43b92f 100644
--- a/lib/kokkos/core/src/Kokkos_OpenMP.hpp
+++ b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
@@ -1,204 +1,204 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMP_HPP
#define KOKKOS_OPENMP_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_OPENMP) && !defined(_OPENMP)
#error "You enabled Kokkos OpenMP support without enabling OpenMP in the compiler!"
#endif
#if defined( KOKKOS_ENABLE_OPENMP ) && defined( _OPENMP )
#include <omp.h>
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#ifdef KOKKOS_ENABLE_HBWSPACE
#include <Kokkos_HBWSpace.hpp>
#endif
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Tags.hpp>
-#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class OpenMP
/// \brief Kokkos device for multicore processors in the host memory space.
class OpenMP {
public:
//------------------------------------
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef OpenMP execution_space ;
#ifdef KOKKOS_ENABLE_HBWSPACE
typedef Experimental::HBWSpace memory_space ;
#else
typedef HostSpace memory_space ;
#endif
//! This execution space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< OpenMP > scratch_memory_space ;
//@}
//------------------------------------
//! \name Functions that all Kokkos execution spaces must implement.
//@{
inline static bool in_parallel() { return omp_in_parallel(); }
/** \brief Set the device in a "sleep" state. A noop for OpenMP. */
static bool sleep();
/** \brief Wake the device from the 'sleep' state. A noop for OpenMP. */
static bool wake();
/** \brief Wait until all dispatched functors complete. A noop for OpenMP. */
static void fence() {}
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
/// \brief Free any resources being consumed by the device.
static void finalize();
/** \brief Initialize the device.
*
* 1) If the hardware locality library is enabled and OpenMP has not
* already bound threads then bind OpenMP threads to maximize
* core utilization and group for memory hierarchy locality.
*
* 2) Allocate a HostThread for each OpenMP thread to hold its
* topology and fan in/out data.
*/
static void initialize( unsigned thread_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//@}
//------------------------------------
/** \brief This execution space has a topological thread pool which can be queried.
*
* All threads within a pool have a common memory space for which they are cache coherent.
* depth = 0 gives the number of threads in the whole pool.
* depth = 1 gives the number of threads in a NUMA region, typically sharing L3 cache.
* depth = 2 gives the number of threads at the finest granularity, typically sharing L1 cache.
*/
inline static int thread_pool_size( int depth = 0 );
/** \brief The rank of the executing thread in this thread pool */
KOKKOS_INLINE_FUNCTION static int thread_pool_rank();
//------------------------------------
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static
unsigned hardware_thread_id() { return thread_pool_rank(); }
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::OpenMP::memory_space
, Kokkos::OpenMP::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::OpenMP::memory_space
, Kokkos::OpenMP::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <OpenMP/Kokkos_OpenMPexec.hpp>
#include <OpenMP/Kokkos_OpenMP_Parallel.hpp>
#include <OpenMP/Kokkos_OpenMP_Task.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( _OPENMP ) */
#endif /* #ifndef KOKKOS_OPENMP_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Pair.hpp b/lib/kokkos/core/src/Kokkos_Pair.hpp
index 83436826f..067767f2f 100644
--- a/lib/kokkos/core/src/Kokkos_Pair.hpp
+++ b/lib/kokkos/core/src/Kokkos_Pair.hpp
@@ -1,530 +1,527 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
/// \file Kokkos_Pair.hpp
/// \brief Declaration and definition of Kokkos::pair.
///
/// This header file declares and defines Kokkos::pair and its related
/// nonmember functions.
#ifndef KOKKOS_PAIR_HPP
#define KOKKOS_PAIR_HPP
#include <Kokkos_Macros.hpp>
#include <utility>
namespace Kokkos {
/// \struct pair
/// \brief Replacement for std::pair that works on CUDA devices.
///
/// The instance methods of std::pair, including its constructors, are
/// not marked as <tt>__device__</tt> functions. Thus, they cannot be
/// called on a CUDA device, such as an NVIDIA GPU. This struct
/// implements the same interface as std::pair, but can be used on a
/// CUDA device as well as on the host.
template <class T1, class T2>
struct pair
{
//! The first template parameter of this class.
typedef T1 first_type;
//! The second template parameter of this class.
typedef T2 second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Default constructor.
///
/// This calls the default constructors of T1 and T2. It won't
/// compile if those default constructors are not defined and
/// public.
- KOKKOS_FORCEINLINE_FUNCTION
- pair()
- : first(), second()
- {}
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
+ pair() = default ;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type const& f, second_type const& s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const volatile pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1, T2> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Assignment operator, for volatile <tt>*this</tt>.
///
/// \param p [in] Input; right-hand side of the assignment.
///
/// This calls the assignment operators of T1 and T2. It will not
/// compile if the assignment operators are not defined and public.
///
/// This operator returns \c void instead of <tt>volatile pair<T1,
/// T2>& </tt>. See Kokkos Issue #177 for the explanation. In
/// practice, this means that you should not chain assignments with
/// volatile lvalues.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
void operator=(const volatile pair<U,V> &p) volatile
{
first = p.first;
second = p.second;
// We deliberately do not return anything here. See explanation
// in public documentation above.
}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
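// Illustrative usage sketch:
//
//   Kokkos::pair<int,double> p( 1, 2.5 );         // usable in device code
//   std::pair<int,double>    q = p.to_std_pair(); // host only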
template <class T1, class T2>
struct pair<T1&, T2&>
{
//! The first template parameter of this class.
typedef T1& first_type;
//! The second template parameter of this class.
typedef T2& second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type f, second_type s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
template <class T1, class T2>
struct pair<T1, T2&>
{
//! The first template parameter of this class.
typedef T1 first_type;
//! The second template parameter of this class.
typedef T2& second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type const& f, second_type s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
template <class T1, class T2>
struct pair<T1&, T2>
{
//! The first template parameter of this class.
typedef T1& first_type;
//! The second template parameter of this class.
typedef T2 second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type f, second_type const& s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
//! Equality operator for Kokkos::pair.
template <class T1, class T2>
KOKKOS_FORCEINLINE_FUNCTION
bool operator== (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return lhs.first==rhs.first && lhs.second==rhs.second; }
//! Inequality operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator!= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(lhs==rhs); }
//! Less-than operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator< (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return lhs.first<rhs.first || (!(rhs.first<lhs.first) && lhs.second<rhs.second); }
//! Less-than-or-equal-to operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator<= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(rhs<lhs); }
//! Greater-than operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator> (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return rhs<lhs; }
//! Greater-than-or-equal-to operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator>= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(lhs<rhs); }
/// \brief Return a new pair.
///
/// This is a "nonmember constructor" for Kokkos::pair. It works just
/// like std::make_pair.
template <class T1,class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
pair<T1,T2> make_pair (T1 x, T2 y)
{ return ( pair<T1,T2>(x,y) ); }
/// \brief Return a pair of references to the input arguments.
///
/// This is analogous to std::tie (new in C++11). You can use it to
/// assign to two variables at once, from the result of a function
/// that returns a pair. For example (<tt>__device__</tt> and
/// <tt>__host__</tt> attributes omitted for brevity):
/// \code
/// // Declaration of the function to call.
/// // First return value: operation count.
/// // Second return value: whether all operations succeeded.
/// Kokkos::pair<int, bool> someFunction ();
///
/// // Code that uses Kokkos::tie.
/// int myFunction () {
/// int count = 0;
/// bool success = false;
///
/// // This assigns to both count and success.
/// Kokkos::tie (count, success) = someFunction ();
///
/// if (! success) {
/// // ... Some operation failed;
/// // take corrective action ...
/// }
/// return count;
/// }
/// \endcode
///
/// The line that uses tie() could have been written like this:
/// \code
/// Kokkos::pair<int, bool> result = someFunction ();
/// count = result.first;
/// success = result.second;
/// \endcode
///
/// Using tie() saves two lines of code and avoids a copy of each
/// element of the pair. The latter could be significant if one or
/// both elements of the pair are more substantial objects than \c int
/// or \c bool.
template <class T1,class T2>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1 &,T2 &> tie (T1 & x, T2 & y)
{ return ( pair<T1 &,T2 &>(x,y) ); }
//
// Specialization of Kokkos::pair for a \c void second argument. This
// is not actually a "pair"; it only contains one element, the first.
//
template <class T1>
struct pair<T1,void>
{
typedef T1 first_type;
typedef void second_type;
first_type first;
enum { second = 0 };
- KOKKOS_FORCEINLINE_FUNCTION
- pair()
- : first()
- {}
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
+ pair() = default ;
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(const first_type & f)
: first(f)
{}
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(const first_type & f, int)
: first(f)
{}
template <class U>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,void> &p)
: first(p.first)
{}
template <class U>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1, void> & operator=(const pair<U,void> &p)
{
first = p.first;
return *this;
}
};
//
// Specialization of relational operators for Kokkos::pair<T1,void>.
//
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator== (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return lhs.first==rhs.first; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator!= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(lhs==rhs); }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator< (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return lhs.first<rhs.first; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator<= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(rhs<lhs); }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator> (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return rhs<lhs; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator>= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(lhs<rhs); }
} // namespace Kokkos
#endif //KOKKOS_PAIR_HPP
+
diff --git a/lib/kokkos/core/src/Kokkos_Parallel.hpp b/lib/kokkos/core/src/Kokkos_Parallel.hpp
index 64b1502bc..e412e608b 100644
--- a/lib/kokkos/core/src/Kokkos_Parallel.hpp
+++ b/lib/kokkos/core/src/Kokkos_Parallel.hpp
@@ -1,527 +1,528 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Parallel.hpp
/// \brief Declaration of parallel operators
#ifndef KOKKOS_PARALLEL_HPP
#define KOKKOS_PARALLEL_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_ExecPolicy.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
#include <impl/Kokkos_Tags.hpp>
#include <impl/Kokkos_Traits.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#ifdef KOKKOS_DEBUG
#include<iostream>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
/** \brief Given a Functor and an Execution Policy, query an execution space.
*
* if the Policy has an execution space use that
* else if the Functor has an execution_space use that
* else if the Functor has a device_type use that for backward compatibility
* else use the default
*/
template< class Functor
, class Policy
, class EnableFunctor
, class EnablePolicy
>
struct FunctorPolicyExecutionSpace {
typedef Kokkos::DefaultExecutionSpace execution_space ;
};
template< class Functor , class Policy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::device_type >::type
, typename enable_if_type< typename Policy ::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::execution_space >::type
, typename enable_if_type< typename Policy ::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy , class EnableFunctor >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, EnableFunctor
, typename enable_if_type< typename Policy::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy , class EnablePolicy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::device_type >::type
, EnablePolicy
>
{
typedef typename Functor::device_type execution_space ;
};
template< class Functor , class Policy , class EnablePolicy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::execution_space >::type
, EnablePolicy
>
{
typedef typename Functor::execution_space execution_space ;
};
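// Illustrative deduction sketch: a functor declaring
//   typedef Kokkos::Serial execution_space ;
// dispatched with Kokkos::RangePolicy< Kokkos::OpenMP > resolves to OpenMP
// (the policy's execution space wins); the same functor dispatched with a
// plain work count resolves to Kokkos::Serial; a functor providing neither
// execution_space nor device_type falls back to Kokkos::DefaultExecutionSpace.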
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/** \brief Execute \c functor in parallel according to the execution \c policy.
*
* A "functor" is a class containing the function to execute in parallel,
* data needed for that execution, and an optional \c execution_space
* typedef. Here is an example functor for parallel_for:
*
* \code
* class FunctorType {
* public:
* typedef ... execution_space ;
* void operator() ( WorkType iwork ) const ;
* };
* \endcode
*
* In the above example, \c WorkType is any integer type for which a
* valid conversion from \c size_t to \c WorkType exists. Its
* <tt>operator()</tt> method defines the operation to parallelize,
* over the range of integer indices <tt>iwork=[0,work_count-1]</tt>.
* This corresponds to a single iteration \c iwork of a \c for loop.
* If \c execution_space is not defined DefaultExecutionSpace will be used.
*/
template< class ExecPolicy , class FunctorType >
inline
void parallel_for( const ExecPolicy & policy
, const FunctorType & functor
, const std::string& str = ""
, typename Impl::enable_if< ! Impl::is_integral< ExecPolicy >::value >::type * = 0
)
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelFor< FunctorType , ExecPolicy > closure( functor , policy );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
-
+
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
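// Illustrative call sketch ('x' and 'y' are assumed rank-1 Views of length N):
//
//   Kokkos::parallel_for( Kokkos::RangePolicy<>( 0, N ),
//     KOKKOS_LAMBDA( const int i ) { y(i) = 2.0 * x(i); }, "scale_x" );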
template< class FunctorType >
inline
void parallel_for( const size_t work_count
, const FunctorType & functor
, const std::string& str = ""
)
{
typedef typename
Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef RangePolicy< execution_space > policy ;
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
-
+
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelFor< FunctorType , policy > closure( functor , policy(0,work_count) );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
template< class ExecPolicy , class FunctorType >
inline
void parallel_for( const std::string & str
, const ExecPolicy & policy
, const FunctorType & functor )
{
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG Start parallel_for kernel: " << str << std::endl;
#endif
parallel_for(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG End parallel_for kernel: " << str << std::endl;
#endif
(void) str;
}
}
#include <Kokkos_Parallel_Reduce.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/// \fn parallel_scan
/// \tparam ExecutionPolicy The execution policy type.
/// \tparam FunctorType The scan functor type.
///
/// \param policy [in] The execution policy.
/// \param functor [in] The scan functor.
///
/// This function implements a parallel scan pattern. The scan can
/// be either inclusive or exclusive, depending on how you implement
/// the scan functor.
///
/// A scan functor looks almost exactly like a reduce functor, except
/// that its operator() takes a third \c bool argument, \c final_pass,
/// which indicates whether this is the last pass of the scan
/// operation. We will show below how to use the \c final_pass
/// argument to control whether the scan is inclusive or exclusive.
///
/// Here is the minimum required interface of a scan functor for a POD
/// (plain old data) value type \c PodType. That is, the result is a
/// View of zero or more PodType. It is also possible for the result
/// to be an array of (same-sized) arrays of PodType, but we do not
/// show the required interface for that here.
/// \code
/// template< class ExecPolicy , class FunctorType >
/// class ScanFunctor {
/// public:
/// // The Kokkos device type
/// typedef ... execution_space;
/// // Type of an entry of the array containing the result;
/// // also the type of each of the entries combined using
/// // operator() or join().
/// typedef PodType value_type;
///
/// void operator () (const ExecPolicy::member_type & i, value_type& update, const bool final_pass) const;
/// void init (value_type& update) const;
/// void join (volatile value_type& update, volatile const value_type& input) const;
/// };
/// \endcode
///
/// Here is an example of a functor which computes an inclusive plus-scan
/// of an array of \c int, in place. If given an array [1, 2, 3, 4], this
/// scan will overwrite that array with [1, 3, 6, 10].
///
/// \code
/// template<class SpaceType>
/// class InclScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// InclScanFunctor( Kokkos::View<value_type*, execution_space> x
/// , Kokkos::View<value_type*, execution_space> y ) : m_x(x), m_y(y) {}
///
/// void operator () (const size_type i, value_type& update, const bool final_pass) const {
/// update += m_x(i);
/// if (final_pass) {
/// m_y(i) = update;
/// }
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> m_x;
/// Kokkos::View<value_type*, execution_space> m_y;
/// };
/// \endcode
///
/// Here is an example of a functor which computes an <i>exclusive</i>
/// scan of an array of \c int, in place. In operator(), note that the
/// final_pass test and the update have switched places, and that a
/// temporary is used. If given an array [1, 2, 3, 4], this scan
/// will overwrite that array with [0, 1, 3, 6].
///
/// \code
/// template<class SpaceType>
/// class ExclScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// ExclScanFunctor (Kokkos::View<value_type*, execution_space> x) : x_ (x) {}
///
/// void operator () (const size_type i, value_type& update, const bool final_pass) const {
/// const value_type x_i = x_(i);
/// if (final_pass) {
/// x_(i) = update;
/// }
/// update += x_i;
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> x_;
/// };
/// \endcode
///
/// Here is an example of a functor which builds on the above
/// exclusive scan example, to compute an offsets array from a
/// population count array, in place. We assume that the pop count
/// array has an extra entry at the end to store the final count. If
/// given an array [1, 2, 3, 4, 0], this scan will overwrite that
/// array with [0, 1, 3, 6, 10].
///
/// \code
/// template<class SpaceType>
/// class OffsetScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// // last_index_ is the last valid index (zero-based) of x.
/// // If x has length zero, then last_index_ won't be used anyway.
/// OffsetScanFunctor( Kokkos::View<value_type*, execution_space> x
/// , Kokkos::View<value_type*, execution_space> y )
/// : m_x(x), m_y(y), last_index_ (x.dimension_0 () == 0 ? 0 : x.dimension_0 () - 1)
/// {}
///
/// void operator () (const size_type i, int& update, const bool final_pass) const {
/// if (final_pass) {
/// m_y(i) = update;
/// }
/// update += m_x(i);
/// // The last entry of m_y gets the final sum.
/// if (final_pass && i == last_index_) {
/// m_y(i+1) = update;
/// }
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> m_x;
/// Kokkos::View<value_type*, execution_space> m_y;
/// const size_type last_index_;
/// };
/// \endcode
///
template< class ExecutionPolicy , class FunctorType >
inline
void parallel_scan( const ExecutionPolicy & policy
, const FunctorType & functor
, const std::string& str = ""
, typename Impl::enable_if< ! Impl::is_integral< ExecutionPolicy >::value >::type * = 0
)
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelScan("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelScan< FunctorType , ExecutionPolicy > closure( functor , policy );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelScan(kpID);
}
#endif
}
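// Illustrative call sketch, reusing the InclScanFunctor from the
// documentation above ('x' and 'y' are assumed Views of equal length):
//
//   Kokkos::parallel_scan( x.dimension_0(), InclScanFunctor<Space>( x, y ) );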
template< class FunctorType >
inline
void parallel_scan( const size_t work_count
, const FunctorType & functor
, const std::string& str = "" )
{
typedef typename
Kokkos::Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef Kokkos::RangePolicy< execution_space > policy ;
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelScan("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
-
+
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelScan< FunctorType , policy > closure( functor , policy(0,work_count) );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelScan(kpID);
}
#endif
}
template< class ExecutionPolicy , class FunctorType >
inline
void parallel_scan( const std::string& str
, const ExecutionPolicy & policy
, const FunctorType & functor)
{
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG Start parallel_scan kernel: " << str << std::endl;
#endif
parallel_scan(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG End parallel_scan kernel: " << str << std::endl;
#endif
(void) str;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class Enable = void >
struct FunctorTeamShmemSize
{
KOKKOS_INLINE_FUNCTION static size_t value( const FunctorType & , int ) { return 0 ; }
};
template< class FunctorType >
struct FunctorTeamShmemSize< FunctorType , typename Impl::enable_if< 0 < sizeof( & FunctorType::team_shmem_size ) >::type >
{
static inline size_t value( const FunctorType & f , int team_size ) { return f.team_shmem_size( team_size ) ; }
};
template< class FunctorType >
struct FunctorTeamShmemSize< FunctorType , typename Impl::enable_if< 0 < sizeof( & FunctorType::shmem_size ) >::type >
{
static inline size_t value( const FunctorType & f , int team_size ) { return f.shmem_size( team_size ) ; }
};
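// Illustrative functor sketch: a functor that declares team_shmem_size()
// (or shmem_size()) is picked up by the matching specialization above;
// otherwise the primary template reports 0 bytes of per-team scratch.
//
//   struct MyTeamFunctor {
//     size_t team_shmem_size( int team_size ) const
//       { return team_size * sizeof(double); }
//     KOKKOS_INLINE_FUNCTION
//     void operator()( const Kokkos::TeamPolicy<>::member_type & ) const {}
//   };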
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* KOKKOS_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
index a3649b442..900dce19f 100644
--- a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
+++ b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
@@ -1,1356 +1,1356 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
namespace Kokkos {
template<class T, class Enable = void>
struct is_reducer_type {
enum { value = 0 };
};
template<class T>
struct is_reducer_type<T,typename std::enable_if<
std::is_same<typename std::remove_cv<T>::type,
typename std::remove_cv<typename T::reducer_type>::type>::value
>::type> {
enum { value = 1 };
};
namespace Experimental {
template<class Scalar,class Space = HostSpace>
struct Sum {
public:
//Required
typedef Sum reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(0);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Sum(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Sum(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Sum(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Sum(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest += src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest += src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
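/* A minimal usage sketch for this reducer (the view "x" and extent "n" are
 * illustrative assumptions, not part of this header):
 *
 * \code
 * double total = 0.0;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, double & lsum ) { lsum += x(i); },
 *   Kokkos::Experimental::Sum<double>( total ) );
 * // After the call, "total" holds the sum of x(0..n-1).
 * \endcode
 */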
template<class Scalar,class Space = HostSpace>
struct Prod {
public:
//Required
typedef Prod reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(1);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Prod(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Prod(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Prod(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Prod(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest *= src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest *= src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Min {
public:
//Required
typedef Min reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Min(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Min(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Min(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Min(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src < dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src < dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Max {
public:
//Required
typedef Max reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Max(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Max(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Max(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Max(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src > dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src > dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LAnd {
public:
//Required
typedef LAnd reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LAnd(value_type& result_):result(&result_) {}
LAnd(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest && src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest && src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 1;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LOr {
public:
//Required
typedef LOr reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LOr(value_type& result_):result(&result_) {}
LOr(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest || src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest || src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LXor {
public:
//Required
typedef LXor reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LXor(value_type& result_):result(&result_) {}
LXor(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest? (!src) : src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest? (!src) : src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BAnd {
public:
//Required
typedef BAnd reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BAnd(value_type& result_):
init_value(value_type() | (~value_type())),result(&result_) {}
BAnd(const result_view_type& result_):
init_value(value_type() | (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest & src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest & src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BOr {
public:
//Required
typedef BOr reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BOr(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BOr(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest | src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest | src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BXor {
public:
//Required
typedef BXor reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BXor(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BXor(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest ^ src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest ^ src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index>
struct ValLocScalar {
Scalar val;
Index loc;
KOKKOS_INLINE_FUNCTION
void operator = (const ValLocScalar& rhs) {
val = rhs.val;
loc = rhs.loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile ValLocScalar& rhs) volatile {
val = rhs.val;
loc = rhs.loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MinLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MinLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MinLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
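/* A minimal usage sketch for MinLoc, whose value_type is ValLocScalar<Scalar,Index>
 * (the view "x" and extent "n" are illustrative assumptions):
 *
 * \code
 * typedef Kokkos::Experimental::MinLoc<double,int> reducer_t;
 * reducer_t::value_type result;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, reducer_t::value_type & lmin ) {
 *     if ( x(i) < lmin.val ) { lmin.val = x(i); lmin.loc = i; }
 *   },
 *   reducer_t( result ) );
 * // result.val is the minimum value, result.loc the index where it occurs.
 * \endcode
 */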
template<class Scalar, class Index, class Space = HostSpace>
struct MaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MaxLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MaxLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MaxLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MaxLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MaxLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar>
struct MinMaxScalar {
Scalar min_val,max_val;
KOKKOS_INLINE_FUNCTION
void operator = (const MinMaxScalar& rhs) {
min_val = rhs.min_val;
max_val = rhs.max_val;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile MinMaxScalar& rhs) volatile {
min_val = rhs.min_val;
max_val = rhs.max_val;
}
};
template<class Scalar, class Space = HostSpace>
struct MinMax {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
public:
//Required
typedef MinMax reducer_type;
typedef MinMaxScalar<scalar_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type min_init_value;
scalar_type max_init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MinInitWrapper;
template<class ValueType >
struct MinInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct MinInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MaxInitWrapper;
template<class ValueType >
struct MaxInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct MaxInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinMax(value_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
MinMax(const result_view_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
MinMax(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
MinMax(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
}
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
}
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.min_val = min_init_value;
val.max_val = max_init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index>
struct MinMaxLocScalar {
Scalar min_val,max_val;
Index min_loc,max_loc;
KOKKOS_INLINE_FUNCTION
void operator = (const MinMaxLocScalar& rhs) {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile MinMaxLocScalar& rhs) volatile {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinMaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinMaxLoc reducer_type;
typedef MinMaxLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type min_init_value;
scalar_type max_init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MinInitWrapper;
template<class ValueType >
struct MinInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct MinInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MaxInitWrapper;
template<class ValueType >
struct MaxInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct MaxInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinMaxLoc(value_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
MinMaxLoc(const result_view_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
MinMaxLoc(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
MinMaxLoc(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.min_val = min_init_value;
val.max_val = max_init_value;
}
result_view_type result_view() const {
return result;
}
};
}
}
namespace Kokkos {
namespace Impl {
template< class T, class ReturnType , class ValueTraits>
struct ParallelReduceReturnValue;
template< class ReturnType , class FunctorType >
struct ParallelReduceReturnValue<typename std::enable_if<Kokkos::is_view<ReturnType>::value>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type_scalar;
typedef typename return_type::value_type* const value_type_array;
typedef typename if_c<return_type::rank==0,value_type_scalar,value_type_array>::type value_type;
static return_type& return_value(ReturnType& return_val, const FunctorType&) {
return return_val;
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
!Kokkos::is_view<ReturnType>::value &&
(!std::is_array<ReturnType>::value && !std::is_pointer<ReturnType>::value) &&
!Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< ReturnType
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val, const FunctorType&) {
return return_type(&return_val);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
(is_array<ReturnType>::value || std::is_pointer<ReturnType>::value)
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< typename std::remove_const<ReturnType>::type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type[];
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_type(return_val,functor.value_count);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef ReturnType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_val;
}
};
}
namespace Impl {
template< class T, class ReturnType , class FunctorType>
struct ParallelReducePolicyType;
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<Kokkos::Impl::is_execution_policy<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef PolicyType policy_type;
static PolicyType policy(const PolicyType& policy_) {
return policy_;
}
};
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<std::is_integral<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef typename
Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef Kokkos::RangePolicy<execution_space> policy_type;
static policy_type policy(const PolicyType& policy_) {
return policy_type(0,policy_);
}
};
}
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType, class ExecutionSpace>
struct ParallelReduceFunctorType {
typedef FunctorType functor_type;
static const functor_type& functor(const functor_type& functor) {
return functor;
}
};
}
namespace Impl {
template< class PolicyType, class FunctorType, class ReturnType >
struct ParallelReduceAdaptor {
typedef Impl::ParallelReduceReturnValue<void,ReturnType,FunctorType> return_value_adapter;
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
typedef Impl::ParallelReduceFunctorType<FunctorType,PolicyType,
typename return_value_adapter::value_type,
typename PolicyType::execution_space> functor_adaptor;
#endif
static inline
void execute(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value) {
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelReduce("" == label ? typeid(FunctorType).name() : label, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
Impl::ParallelReduce<typename functor_adaptor::functor_type, PolicyType, typename return_value_adapter::reducer_type >
closure(functor_adaptor::functor(functor),
policy,
return_value_adapter::return_value(return_value,functor));
#else
Impl::ParallelReduce<FunctorType, PolicyType, typename return_value_adapter::reducer_type >
closure(functor,
policy,
return_value_adapter::return_value(return_value,functor));
#endif
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelReduce(kpID);
}
#endif
}
};
}
/*! \fn void parallel_reduce(label,policy,functor,return_argument)
\brief Perform a parallel reduction.
 \param label An optional label naming the call. The argument must be usable to construct a std::string.
\param policy A Kokkos Execution Policy, such as an integer, a RangePolicy or a TeamPolicy.
\param functor A functor with a reduction operator, and optional init, join and final functions.
\param return_argument A return argument which can be a scalar, a View, or a ReducerStruct. This argument can be left out if the functor has a final function.
*/
/** \brief Parallel reduction
*
 * parallel_reduce performs parallel reductions with arbitrary functions, i.e.
 * it is not solely data-based. The call expects up to 4 arguments: an optional
 * label, an execution policy, the functor, and an optional return argument
 * (see the \fn documentation above).
 *
* Example of a parallel_reduce functor for a POD (plain old data) value type:
* \code
* class FunctorType { // For POD value type
* public:
* typedef ... execution_space ;
* typedef <podType> value_type ;
* void operator()( <intType> iwork , <podType> & update ) const ;
* void init( <podType> & update ) const ;
* void join( volatile <podType> & update ,
* volatile const <podType> & input ) const ;
*
* typedef true_type has_final ;
* void final( <podType> & update ) const ;
* };
* \endcode
*
* Example of a parallel_reduce functor for an array of POD (plain old data) values:
* \code
* class FunctorType { // For array of POD value
* public:
* typedef ... execution_space ;
* typedef <podType> value_type[] ;
* void operator()( <intType> , <podType> update[] ) const ;
* void init( <podType> update[] ) const ;
* void join( volatile <podType> update[] ,
* volatile const <podType> input[] ) const ;
*
* typedef true_type has_final ;
* void final( <podType> update[] ) const ;
* };
* \endcode
*/
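/* A call-site sketch complementing the functor examples above (the label,
 * the extent "n", and the view "x" are illustrative assumptions):
 *
 * \code
 * double sum = 0.0;
 * Kokkos::parallel_reduce( "MySum",
 *   Kokkos::RangePolicy<>( 0, n ),
 *   KOKKOS_LAMBDA( const int i, double & lsum ) { lsum += x(i); },
 *   sum );  // scalar result, taken by reference by the overloads below
 * \endcode
 */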
// ReturnValue is scalar or array: take by reference
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value);
}
// ReturnValue as View or Reducer: take by copy to allow for inline construction
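/* A sketch of the inline construction these by-copy overloads enable: a
 * temporary reducer (or result View) cannot bind to the non-const reference
 * overloads above. The names "n", "x", and "max_val" are illustrative
 * assumptions:
 *
 * \code
 * double max_val = 0.0;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, double & lmax ) { if ( x(i) > lmax ) lmax = x(i); },
 *   Kokkos::Experimental::Max<double>( max_val ) );  // reducer constructed inline
 * \endcode
 */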
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,const ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value_impl);
}
// No Return Argument
template< class PolicyType, class FunctorType>
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute(label,policy,functor,result_view);
}
template< class PolicyType, class FunctorType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute("",policy,functor,result_view);
}
template< class FunctorType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute("",policy_type(0,policy),functor,result_view);
}
template< class FunctorType>
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute(label,policy_type(0,policy),functor,result_view);
}
} //namespace Kokkos
diff --git a/lib/kokkos/core/src/Kokkos_Qthread.hpp b/lib/kokkos/core/src/Kokkos_Qthreads.hpp
similarity index 72%
rename from lib/kokkos/core/src/Kokkos_Qthread.hpp
rename to lib/kokkos/core/src/Kokkos_Qthreads.hpp
index c58518b06..0507552c3 100644
--- a/lib/kokkos/core/src/Kokkos_Qthread.hpp
+++ b/lib/kokkos/core/src/Kokkos_Qthreads.hpp
@@ -1,183 +1,198 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_QTHREAD_HPP
-#define KOKKOS_QTHREAD_HPP
+#ifndef KOKKOS_QTHREADS_HPP
+#define KOKKOS_QTHREADS_HPP
+
+#include <Kokkos_Core_fwd.hpp>
+
+#ifdef KOKKOS_ENABLE_QTHREADS
+
+// Defines to enable experimental Qthreads functionality.
+#define QTHREAD_LOCAL_PRIORITY
+#define CLONED_TASKS
+
+#include <qthread.h>
#include <cstddef>
#include <iosfwd>
-#include <Kokkos_Core.hpp>
-#include <Kokkos_Layout.hpp>
-#include <Kokkos_MemoryTraits.hpp>
+
#include <Kokkos_HostSpace.hpp>
-#include <Kokkos_ExecPolicy.hpp>
+#include <Kokkos_ScratchSpace.hpp>
+#include <Kokkos_Parallel.hpp>
+//#include <Kokkos_MemoryTraits.hpp>
+//#include <Kokkos_ExecPolicy.hpp>
+//#include <Kokkos_TaskScheduler.hpp> // Uncomment when Tasking working.
+#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Tags.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
-class QthreadExec ;
+
+class QthreadsExec;
+
} // namespace Impl
+
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
-/** \brief Execution space supported by Qthread */
-class Qthread {
+/** \brief Execution space supported by Qthreads */
+class Qthreads {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space
- typedef Qthread execution_space ;
- typedef Kokkos::HostSpace memory_space ;
+ typedef Qthreads execution_space;
+ typedef Kokkos::HostSpace memory_space;
//! This execution space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
- typedef Kokkos::LayoutRight array_layout ;
- typedef memory_space::size_type size_type ;
+ typedef Kokkos::LayoutRight array_layout;
+ typedef memory_space::size_type size_type;
- typedef ScratchMemorySpace< Qthread > scratch_memory_space ;
+ typedef ScratchMemorySpace< Qthreads > scratch_memory_space;
//@}
/*------------------------------------------------------------------------*/
/** \brief Initialization will construct one or more instances */
- static Qthread & instance( int = 0 );
+ static Qthreads & instance( int = 0 );
/** \brief Set the execution space to a "sleep" state.
*
 * This function puts the execution space in a "sleep" state in which it is
 * not ready for work. This may consume fewer resources than a "ready" state,
 * but it may also take time to transition back to the "ready" state.
 *
 * \return True if it enters or is in the "sleep" state.
* False if functions are currently executing.
*/
bool sleep();
/** \brief Wake from the sleep state.
- *
+ *
 * \return True if it enters or is in the "ready" state.
* False if functions are currently executing.
*/
static bool wake();
 /** \brief Wait until all dispatched functions complete.
- *
+ *
* The parallel_for or parallel_reduce dispatch of a functor may
* return asynchronously, before the functor completes. This
* method does not return until all dispatched functors on this
* device have completed.
*/
static void fence();
/*------------------------------------------------------------------------*/
static int in_parallel();
static int is_initialized();
/** \brief Return maximum amount of concurrency */
static int concurrency();
static void initialize( int thread_count );
static void finalize();
/** \brief Print configuration information to the given output stream. */
- static void print_configuration( std::ostream & , const bool detail = false );
+ static void print_configuration( std::ostream &, const bool detail = false );
- int shepherd_size() const ;
- int shepherd_worker_size() const ;
+ int shepherd_size() const;
+ int shepherd_worker_size() const;
};
-/*--------------------------------------------------------------------------*/
-
} // namespace Kokkos
-/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
template<>
-struct MemorySpaceAccess
- < Kokkos::Qthread::memory_space
- , Kokkos::Qthread::scratch_memory_space
+struct MemorySpaceAccess
+ < Kokkos::Qthreads::memory_space
+ , Kokkos::Qthreads::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
- < Kokkos::Qthread::memory_space
- , Kokkos::Qthread::scratch_memory_space
+ < Kokkos::Qthreads::memory_space
+ , Kokkos::Qthreads::scratch_memory_space
>
{
enum { value = true };
- inline static void verify( void ) { }
- inline static void verify( const void * ) { }
+ inline static void verify( void ) {}
+ inline static void verify( const void * ) {}
};
} // namespace Impl
+
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-
-#include <Kokkos_Parallel.hpp>
-#include <Qthread/Kokkos_QthreadExec.hpp>
-#include <Qthread/Kokkos_Qthread_Parallel.hpp>
-#endif /* #define KOKKOS_QTHREAD_HPP */
+#include <Qthreads/Kokkos_QthreadsExec.hpp>
+#include <Qthreads/Kokkos_Qthreads_Parallel.hpp>
+//#include <Qthreads/Kokkos_Qthreads_Task.hpp> // Uncomment when Tasking working.
+//#include <Qthreads/Kokkos_Qthreads_TaskQueue.hpp> // Uncomment when Tasking working.
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
+#endif // #define KOKKOS_ENABLE_QTHREADS
+#endif // #define KOKKOS_QTHREADS_HPP
diff --git a/lib/kokkos/core/src/Kokkos_Serial.hpp b/lib/kokkos/core/src/Kokkos_Serial.hpp
index f26253591..72710e816 100644
--- a/lib/kokkos/core/src/Kokkos_Serial.hpp
+++ b/lib/kokkos/core/src/Kokkos_Serial.hpp
@@ -1,1123 +1,825 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Serial.hpp
/// \brief Declaration and definition of Kokkos::Serial device.
#ifndef KOKKOS_SERIAL_HPP
#define KOKKOS_SERIAL_HPP
#include <cstddef>
#include <iosfwd>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
namespace Kokkos {
/// \class Serial
/// \brief Kokkos device for non-parallel execution
///
/// A "device" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads device uses Pthreads or
/// C++11 threads on a CPU, the OpenMP device uses the OpenMP language
/// extensions, and the Cuda device uses NVIDIA's CUDA programming
/// model. The Serial device executes "parallel" kernels
/// sequentially. This is useful if you really do not want to use
/// threads, or if you want to explore different combinations of MPI
/// and shared-memory parallel programming models.
class Serial {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space:
typedef Serial execution_space ;
//! The size_type typedef best suited for this device.
typedef HostSpace::size_type size_type ;
//! This device's preferred memory space.
typedef HostSpace memory_space ;
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! This device's preferred array layout.
typedef LayoutRight array_layout ;
/// \brief Scratch memory space
typedef ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
//@}
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
///
/// For the Serial device, this method <i>always</i> returns false,
/// because parallel_for or parallel_reduce with the Serial device
/// always execute sequentially.
inline static int in_parallel() { return false ; }
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
 * not ready for work. This may consume fewer resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence() {}
static void initialize( unsigned threads_count = 1 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
- bool allow_asynchronous_threadpool = false) {
- (void) threads_count;
- (void) use_numa_count;
- (void) use_cores_per_numa;
- (void) allow_asynchronous_threadpool;
-
- // Init the array of locks used for arbitrarily sized atomics
- Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
- Kokkos::Profiling::initialize();
- #endif
- }
+ bool allow_asynchronous_threadpool = false);
- static int is_initialized() { return 1 ; }
+ static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency() {return 1;};
//! Free any resources being consumed by the device.
- static void finalize() {
- #if (KOKKOS_ENABLE_PROFILING)
- Kokkos::Profiling::finalize();
- #endif
- }
+ static void finalize();
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool /* detail */ = false ) {}
//--------------------------------------------------------------------------
inline static int thread_pool_size( int = 0 ) { return 1 ; }
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
//--------------------------------------------------------------------------
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
//--------------------------------------------------------------------------
-
- static void * scratch_memory_resize( unsigned reduce_size , unsigned shared_size );
-
- //--------------------------------------------------------------------------
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
-struct MemorySpaceAccess
+struct MemorySpaceAccess
< Kokkos::Serial::memory_space
, Kokkos::Serial::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Serial::memory_space
, Kokkos::Serial::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
-namespace SerialImpl {
-
-struct Sentinel {
-
- void * m_scratch ;
- unsigned m_reduce_end ;
- unsigned m_shared_end ;
-
- Sentinel();
- ~Sentinel();
- static Sentinel & singleton();
-};
-
-inline
-unsigned align( unsigned n );
-}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
-class SerialTeamMember {
-private:
- typedef Kokkos::ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
- const scratch_memory_space m_space ;
- const int m_league_rank ;
- const int m_league_size ;
-
- SerialTeamMember & operator = ( const SerialTeamMember & );
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_shmem() const { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_scratch(int) const
- { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & thread_scratch(int) const
- { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
+// Resize thread team data scratch memory
+void serial_resize_thread_team_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes );
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
+HostThreadTeamData * serial_get_thread_team_data();
- template<class ValueType>
- KOKKOS_INLINE_FUNCTION
- void team_broadcast(const ValueType& , const int& ) const {}
-
- template< class ValueType, class JoinOp >
- KOKKOS_INLINE_FUNCTION
- ValueType team_reduce( const ValueType & value , const JoinOp & ) const
- {
- return value ;
- }
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const
- {
- const Type tmp = global_accum ? *global_accum : Type(0) ;
- if ( global_accum ) { *global_accum += value ; }
- return tmp ;
- }
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & ) const
- { return Type(0); }
-
- //----------------------------------------
- // Execution space specific:
+} /* namespace Impl */
+} /* namespace Kokkos */
- SerialTeamMember( int arg_league_rank
- , int arg_league_size
- , int arg_shared_size
- );
-};
-} // namespace Impl
+namespace Kokkos {
+namespace Impl {
/*
* < Kokkos::Serial , WorkArgTag >
* < WorkArgTag , Impl::enable_if< std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
*
*/
-namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Serial , Properties ... >:public PolicyTraits<Properties...>
{
private:
size_t m_team_scratch_size[2] ;
size_t m_thread_scratch_size[2] ;
int m_league_size ;
int m_chunk_size;
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
//! Execution space of this execution policy:
typedef Kokkos::Serial execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
static
int team_size_max( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & , const int& ) { return 1 ; }
//----------------------------------------
inline int team_size() const { return 1 ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int = 0) const { return m_team_scratch_size[level] + m_thread_scratch_size[level]; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
inline int chunk_size() const { return m_chunk_size ; }
 /** \brief Set chunk_size to a discrete value. */
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
- typedef Impl::SerialTeamMember member_type ;
+ typedef Impl::HostThreadTeamMember< Kokkos::Serial > member_type ;
};
} /* namespace Impl */
} /* namespace Kokkos */
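/* Illustrative sketch of how the set_scratch_size() / set_chunk_size()
 * members above are reached through the public Kokkos::TeamPolicy
 * interface; the functor, league size, and byte counts are hypothetical,
 * not taken from these sources.
 *
 *   Kokkos::TeamPolicy< Kokkos::Serial > policy( 8 , 1 );
 *   Kokkos::parallel_for(
 *     policy.set_scratch_size( 0 , Kokkos::PerTeam( 1024 )
 *                                , Kokkos::PerThread( 128 ) ) ,
 *     SomeTeamFunctor() );
 */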
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with RangePolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::RangePolicy< Traits ... > ,
Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec() const
{
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i );
}
}
template< class TagType >
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec() const
{
const TagType t{} ;
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i );
}
}
public:
inline
void execute() const
{ this-> template exec< typename Policy::work_tag >(); }
inline
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
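/* Illustrative sketch of the dispatch that reaches the ParallelFor
 * specialization above; the functor and range are hypothetical.
 *
 *   struct AxpyFunctor {
 *     double a ; double * x ; double * y ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i ) const { y[i] += a * x[i] ; }
 *   };
 *
 *   Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                       , AxpyFunctor{ a , x , y } );
 */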
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
- ( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor , m_reducer) );
+ const size_t team_reduce_size = 0 ; // Never shrinks
+ const size_t team_shared_size = 0 ; // Never shrinks
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
- this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
+ pointer_type ptr =
+ m_result_ptr ? m_result_ptr : pointer_type(data.pool_reduce_local());
+
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ this-> template exec< WorkTag >( update );
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
+ final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor ,
const Policy & arg_policy ,
const HostViewType & arg_result_view ,
typename std::enable_if<
Kokkos::is_view< HostViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
- , m_result_ptr( arg_result_view.ptr_on_device() )
+ , m_result_ptr( arg_result_view.data() )
{
static_assert( Kokkos::is_view< HostViewType >::value
, "Kokkos::Serial reduce result must be a View" );
static_assert( std::is_same< typename HostViewType::memory_space , HostSpace >::value
, "Kokkos::Serial reduce result must be a View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
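/* Illustrative sketch of a reduction handled by the ParallelReduce
 * specialization above; the result may be a host scalar or a View in
 * HostSpace (per the static_asserts in the constructor).  The functor
 * and range are hypothetical.
 *
 *   struct SumFunctor {
 *     typedef double value_type ;
 *     const double * x ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , double & update ) const
 *       { update += x[i] ; }
 *   };
 *
 *   double sum = 0 ;
 *   Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                          , SumFunctor{ x } , sum );
 */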
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , WorkTag > ValueTraits ;
+
+ typedef FunctorAnalysis< FunctorPatternInterface::SCAN , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::FunctorValueInit< FunctorType , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
- reference_type update = ValueInit::init( m_functor , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update , true );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( m_functor , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update , true );
}
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type)
- Kokkos::Serial::scratch_memory_resize( ValueTraits::value_size( m_functor ) , 0 );
- this-> template exec< WorkTag >( ptr );
+ const size_t pool_reduce_size = Analysis::value_size( m_functor );
+ const size_t team_reduce_size = 0 ; // Never shrinks
+ const size_t team_shared_size = 0 ; // Never shrinks
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ reference_type update =
+ ValueInit::init( m_functor , pointer_type(data.pool_reduce_local()) );
+
+ this-> template exec< WorkTag >( update );
}
inline
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
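/* Illustrative sketch of a prefix-sum functor handled by the ParallelScan
 * specialization above, matching the (index, update, final) call made in
 * exec(); names and types are hypothetical.
 *
 *   struct ExclusiveScan {
 *     typedef long value_type ;
 *     const int * in ; int * out ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , long & update , const bool final ) const
 *     {
 *       if ( final ) { out[i] = update ; }
 *       update += in[i] ;
 *     }
 *   };
 *
 *   Kokkos::parallel_scan( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                        , ExclusiveScan{ in , out } );
 */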
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with TeamPolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Serial
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef TeamPolicyInternal< Kokkos::Serial , Properties ...> Policy ;
typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const int m_league ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec() const
+ exec( HostThreadTeamData & data ) const
{
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( Member(ileague,m_league,m_shared) );
+ m_functor( Member(data,ileague,m_league) );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec() const
+ exec( HostThreadTeamData & data ) const
{
const TagType t{} ;
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( t , Member(ileague,m_league,m_shared) );
+ m_functor( t , Member(data,ileague,m_league) );
}
}
public:
inline
void execute() const
{
- Kokkos::Serial::scratch_memory_resize( 0 , m_shared );
- this-> template exec< typename Policy::work_tag >();
+ const size_t pool_reduce_size = 0 ; // Never shrinks
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE ;
+ const size_t team_shared_size = m_shared ;
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ this->template exec< typename Policy::work_tag >( data );
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
{ }
};
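/* Illustrative sketch of a team functor executed by the specialization
 * above; the per-league work is hypothetical.  In Serial the team size
 * is always one, so team_rank() is always zero.
 *
 *   struct TeamWork {
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const Kokkos::TeamPolicy< Kokkos::Serial >::member_type & member ) const
 *     {
 *       const int ileague = member.league_rank();
 *       // ... work for this league member ...
 *     }
 *   };
 *
 *   Kokkos::parallel_for( Kokkos::TeamPolicy< Kokkos::Serial >( n_leagues , 1 )
 *                       , TeamWork() );
 */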
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Serial
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef TeamPolicyInternal< Kokkos::Serial, Properties ... > Policy ;
+
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const int m_league ;
const ReducerType m_reducer ;
pointer_type m_result_ptr ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( HostThreadTeamData & data , reference_type update ) const
{
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( Member(ileague,m_league,m_shared) , update );
+ m_functor( Member(data,ileague,m_league) , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( HostThreadTeamData & data , reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( t , Member(ileague,m_league,m_shared) , update );
+ m_functor( t , Member(data,ileague,m_league) , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
- ( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , m_shared );
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
+
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE ;
+ const size_t team_shared_size = m_shared ;
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
- this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ pointer_type ptr =
+ m_result_ptr ? m_result_ptr : pointer_type(data.pool_reduce_local());
+
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ this-> template exec< WorkTag >( data , update );
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
+ final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result ,
typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
, m_reducer( InvalidType() )
- , m_result_ptr( arg_result.ptr_on_device() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( m_functor , 1 ) )
+ , m_result_ptr( arg_result.data() )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( m_functor , 1 ) )
{
static_assert( Kokkos::is_view< ViewType >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View" );
static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
- , Policy arg_policy
- , const ReducerType& reducer )
- : m_functor( arg_functor )
- , m_league( arg_policy.league_size() )
- , m_reducer( reducer )
- , m_result_ptr( reducer.result_view().data() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , Policy arg_policy
+ , const ReducerType& reducer )
+ : m_functor( arg_functor )
+ , m_league( arg_policy.league_size() )
+ , m_reducer( reducer )
+ , m_result_ptr( reducer.result_view().data() )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
-/* Nested parallel patterns for Kokkos::Serial with TeamPolicy */
-
-namespace Kokkos {
-namespace Impl {
-
-template<typename iType>
-struct TeamThreadRangeBoundariesStruct<iType,SerialTeamMember> {
- typedef iType index_type;
- const iType begin ;
- const iType end ;
- enum {increment = 1};
- const SerialTeamMember& thread;
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_count)
- : begin(0)
- , end(arg_count)
- , thread(arg_thread)
- {}
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_begin, const iType & arg_end )
- : begin( arg_begin )
- , end( arg_end)
- , thread( arg_thread )
- {}
-};
-
- template<typename iType>
- struct ThreadVectorRangeBoundariesStruct<iType,SerialTeamMember> {
- typedef iType index_type;
- enum {start = 0};
- const iType end;
- enum {increment = 1};
-
- KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct (const SerialTeamMember& thread, const iType& count):
- end( count )
- {}
- };
-
-} // namespace Impl
-
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
-TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, count );
-}
-
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::SerialTeamMember >
-TeamThreadRange( const Impl::SerialTeamMember& thread, const iType1 & begin, const iType2 & end )
-{
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, iType(begin), iType(end) );
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >
- ThreadVectorRange(const Impl::SerialTeamMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >(thread,count);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::SerialTeamMember> PerTeam(const Impl::SerialTeamMember& thread) {
- return Impl::ThreadSingleStruct<Impl::SerialTeamMember>(thread);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::SerialTeamMember> PerThread(const Impl::SerialTeamMember& thread) {
- return Impl::VectorSingleStruct<Impl::SerialTeamMember>(thread);
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Inter-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
- const Lambda & lambda, ValueType& result) {
-
- result = ValueType();
-
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-
- result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
- const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
-
- init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
-}
-
-} //namespace Kokkos
-
-namespace Kokkos {
-/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda& lambda) {
- #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
- #pragma ivdep
- #endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
- result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- init_result = result;
-}
-
-/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
- * for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
- * Depending on the target execution space the operator might be called twice: once with final=false
- * and once with final=true. When final==true val contains the prefix sum value. The contribution of this
- * "i" needs to be added to val no matter whether final==true or not. In a serial execution
- * (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
- * to the final sum value over all vector lanes.
- * This functionality requires C++11 support.*/
-template< typename iType, class FunctorType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const FunctorType & lambda) {
-
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
- typedef typename ValueTraits::value_type value_type ;
-
- value_type scan_val = value_type();
-
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i,scan_val,true);
- }
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-}
-
-//----------------------------------------------------------------------------
#include <impl/Kokkos_Serial_Task.hpp>
#endif // defined( KOKKOS_ENABLE_SERIAL )
#endif /* #define KOKKOS_SERIAL_HPP */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
diff --git a/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
index e4271aa18..e25039d23 100644
--- a/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
+++ b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
@@ -1,692 +1,851 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TASKSCHEDULER_HPP
#define KOKKOS_TASKSCHEDULER_HPP
//----------------------------------------------------------------------------
#include <Kokkos_Core_fwd.hpp>
// If compiling with CUDA, then CUDA 8 or better must be used, together
// with relocatable device code, to enable the task policy.
// nvcc relocatable device code option: --relocatable-device-code=true
#if ( defined( KOKKOS_ENABLE_CUDA ) )
#if ( 8000 <= CUDA_VERSION ) && \
defined( KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE )
#define KOKKOS_ENABLE_TASKDAG
#endif
#else
#define KOKKOS_ENABLE_TASKDAG
#endif
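// Illustrative compile line only; the architecture flag is an assumption,
// the relocatable-device-code flag is the one noted above:
//   nvcc -std=c++11 --relocatable-device-code=true -arch=sm_60 ...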
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
#include <Kokkos_MemoryPool.hpp>
#include <impl/Kokkos_Tags.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
// Forward declarations used in Impl::TaskQueue
template< typename Arg1 = void , typename Arg2 = void >
class Future ;
template< typename Space >
class TaskScheduler ;
+template< typename Space >
+void wait( TaskScheduler< Space > const & );
+
+template< typename Space >
+struct is_scheduler : public std::false_type {};
+
+template< typename Space >
+struct is_scheduler< TaskScheduler< Space > > : public std::true_type {};
+
} // namespace Kokkos
#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/**\brief Implementation data for task data management, access, and execution.
*
* CRTP Inheritance structure to allow static_cast from the
* task root type and a task's FunctorType.
*
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
* TaskBase< Space , ResultType , void >
* : TaskBase< Space , void , void >
* { ... };
*/
template< typename Space , typename ResultType , typename FunctorType >
class TaskBase ;
-template< typename Space >
-class TaskExec ;
-
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
/**
*
* Future< space > // value_type == void
* Future< value > // space == Default
* Future< value , space >
*
*/
template< typename Arg1 , typename Arg2 >
class Future {
private:
template< typename > friend class TaskScheduler ;
template< typename , typename > friend class Future ;
template< typename , typename , typename > friend class Impl::TaskBase ;
enum { Arg1_is_space = Kokkos::is_space< Arg1 >::value };
enum { Arg2_is_space = Kokkos::is_space< Arg2 >::value };
enum { Arg1_is_value = ! Arg1_is_space &&
! std::is_same< Arg1 , void >::value };
enum { Arg2_is_value = ! Arg2_is_space &&
! std::is_same< Arg2 , void >::value };
static_assert( ! ( Arg1_is_space && Arg2_is_space )
, "Future cannot be given two spaces" );
static_assert( ! ( Arg1_is_value && Arg2_is_value )
, "Future cannot be given two value types" );
using ValueType =
typename std::conditional< Arg1_is_value , Arg1 ,
typename std::conditional< Arg2_is_value , Arg2 , void
>::type >::type ;
using Space =
typename std::conditional< Arg1_is_space , Arg1 ,
typename std::conditional< Arg2_is_space , Arg2 , void
>::type >::type ;
using task_base = Impl::TaskBase< Space , ValueType , void > ;
using queue_type = Impl::TaskQueue< Space > ;
task_base * m_task ;
KOKKOS_INLINE_FUNCTION explicit
Future( task_base * task ) : m_task(0)
{ if ( task ) queue_type::assign( & m_task , task ); }
//----------------------------------------
public:
using execution_space = typename Space::execution_space ;
using value_type = ValueType ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
bool is_null() const { return 0 == m_task ; }
KOKKOS_INLINE_FUNCTION
int reference_count() const
{ return 0 != m_task ? m_task->reference_count() : 0 ; }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
void clear()
{ if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
~Future() { clear(); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr Future() noexcept : m_task(0) {}
KOKKOS_INLINE_FUNCTION
Future( Future && rhs )
: m_task( rhs.m_task ) { rhs.m_task = 0 ; }
KOKKOS_INLINE_FUNCTION
Future( const Future & rhs )
: m_task(0)
{ if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task ); }
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future && rhs )
{
clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future & rhs )
{
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
//----------------------------------------
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( Future<A1,A2> && rhs )
: m_task( rhs.m_task )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
rhs.m_task = 0 ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( const Future<A1,A2> & rhs )
: m_task(0)
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future<A1,A2> & rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future<A1,A2> && rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
typename task_base::get_return_type
get() const
{
if ( 0 == m_task ) {
Kokkos::abort( "Kokkos::Future::get ERROR: is_null()");
}
return m_task->get();
}
};
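/* Illustrative declarations following the argument conventions documented
 * above; the value type and execution space chosen here are arbitrary.
 *
 *   Kokkos::Future<>                          fv ; // value_type == void
 *   Kokkos::Future< double >                  fd ; // space defaulted (per the comment above)
 *   Kokkos::Future< double , Kokkos::Serial > fs ; // explicit space
 */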
+// Is a Future with the given execution space
+template< typename , typename ExecSpace = void >
+struct is_future : public std::false_type {};
+
+template< typename Arg1 , typename Arg2 , typename ExecSpace >
+struct is_future< Future<Arg1,Arg2> , ExecSpace >
+ : public std::integral_constant
+ < bool ,
+ ( std::is_same< ExecSpace , void >::value ||
+ std::is_same< ExecSpace
+ , typename Future<Arg1,Arg2>::execution_space >::value )
+ > {};
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
- , TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
+enum class TaskPriority : int { High = 0
+ , Regular = 1
+ , Low = 2 };
-enum TaskPriority { TaskHighPriority = 0
- , TaskRegularPriority = 1
- , TaskLowPriority = 2 };
+} // namespace Kokkos
-template< typename Space >
-void wait( TaskScheduler< Space > const & );
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+//----------------------------------------------------------------------------
+
+template< int TaskEnum , typename DepFutureType >
+struct TaskPolicyData
+{
+ using execution_space = typename DepFutureType::execution_space ;
+ using scheduler_type = TaskScheduler< execution_space > ;
+
+ enum : int { m_task_type = TaskEnum };
+
+ scheduler_type const * m_scheduler ;
+ DepFutureType const m_dependence ;
+ int m_priority ;
+
+ TaskPolicyData() = delete ;
+ TaskPolicyData( TaskPolicyData && ) = default ;
+ TaskPolicyData( TaskPolicyData const & ) = default ;
+ TaskPolicyData & operator = ( TaskPolicyData && ) = default ;
+ TaskPolicyData & operator = ( TaskPolicyData const & ) = default ;
+
+ KOKKOS_INLINE_FUNCTION
+ TaskPolicyData( DepFutureType && arg_future
+ , Kokkos::TaskPriority const & arg_priority )
+ : m_scheduler( 0 )
+ , m_dependence( arg_future )
+ , m_priority( static_cast<int>( arg_priority ) )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ TaskPolicyData( scheduler_type const & arg_scheduler
+ , Kokkos::TaskPriority const & arg_priority )
+ : m_scheduler( & arg_scheduler )
+ , m_dependence()
+ , m_priority( static_cast<int>( arg_priority ) )
+ {}
+};
+} // namespace Impl
} // namespace Kokkos
+//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
template< typename ExecSpace >
class TaskScheduler
{
private:
using track_type = Kokkos::Impl::SharedAllocationTracker ;
using queue_type = Kokkos::Impl::TaskQueue< ExecSpace > ;
using task_base = Impl::TaskBase< ExecSpace , void , void > ;
track_type m_track ;
queue_type * m_queue ;
//----------------------------------------
- // Process optional arguments to spawn and respawn functions
-
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const ) {}
-
- // TaskTeam or TaskSingle
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskType const & arg
- , Options const & ... opts )
- {
- task->m_task_type = arg ;
- assign( task , opts ... );
- }
-
- // TaskHighPriority or TaskRegularPriority or TaskLowPriority
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskPriority const & arg
- , Options const & ... opts )
- {
- task->m_priority = arg ;
- assign( task , opts ... );
- }
-
- // Future for a dependence
- template< typename A1 , typename A2 , typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , Future< A1 , A2 > const & arg
- , Options const & ... opts )
- {
- task->add_dependence( arg.m_task );
- assign( task , opts ... );
- }
-
- //----------------------------------------
public:
- using execution_policy = TaskScheduler ;
using execution_space = ExecSpace ;
using memory_space = typename queue_type::memory_space ;
- using member_type = Kokkos::Impl::TaskExec< ExecSpace > ;
+ using member_type =
+ typename Kokkos::Impl::TaskQueueSpecialization< ExecSpace >::member_type ;
KOKKOS_INLINE_FUNCTION
TaskScheduler() : m_track(), m_queue(0) {}
KOKKOS_INLINE_FUNCTION
TaskScheduler( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler( TaskScheduler const & rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler & operator = ( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler & operator = ( TaskScheduler const & rhs ) = default ;
TaskScheduler( memory_space const & arg_memory_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_log2_superblock = 12 )
: m_track()
, m_queue(0)
{
typedef Kokkos::Impl::SharedAllocationRecord
< memory_space , typename queue_type::Destroy >
record_type ;
record_type * record =
record_type::allocate( arg_memory_space
, "TaskQueue"
, sizeof(queue_type)
);
m_queue = new( record->data() )
queue_type( arg_memory_space
, arg_memory_pool_capacity
, arg_memory_pool_log2_superblock );
record->m_destroy.m_queue = m_queue ;
m_track.assign_allocated_record_to_uninitialized( record );
}
//----------------------------------------
/**\brief Allocation size for a spawned task */
template< typename FunctorType >
KOKKOS_FUNCTION
size_t spawn_allocation_size() const
{
using task_type = Impl::TaskBase< execution_space
, typename FunctorType::value_type
, FunctorType > ;
return m_queue->allocate_block_size( sizeof(task_type) );
}
/**\brief Allocation size for a when_all aggregate */
KOKKOS_FUNCTION
size_t when_all_allocation_size( int narg ) const
{
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
return m_queue->allocate_block_size( sizeof(task_base) + narg * sizeof(task_base*) );
}
//----------------------------------------
- /**\brief A task spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- Future< typename FunctorType::value_type , ExecSpace >
- task_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
+ template< int TaskEnum , typename DepFutureType , typename FunctorType >
+ KOKKOS_FUNCTION static
+ Kokkos::Future< typename FunctorType::value_type , execution_space >
+ spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , typename task_base::function_type arg_function
+ , FunctorType && arg_functor
+ )
{
using value_type = typename FunctorType::value_type ;
using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
+ queue_type * const queue =
+ arg_policy.m_scheduler ? arg_policy.m_scheduler->m_queue : (
+ arg_policy.m_dependence.m_task
+ ? arg_policy.m_dependence.m_task->m_queue
+ : (queue_type*) 0 );
+
+ if ( 0 == queue ) {
+ Kokkos::abort("Kokkos spawn given null Future" );
+ }
+
//----------------------------------------
// Give single-thread back-ends an opportunity to clear
// the queue of ready tasks before allocating a new task
- m_queue->iff_single_thread_recursive_execute();
+ queue->iff_single_thread_recursive_execute();
//----------------------------------------
future_type f ;
// Allocate task from memory pool
f.m_task =
- reinterpret_cast< task_type * >(m_queue->allocate(sizeof(task_type)));
+ reinterpret_cast< task_type * >(queue->allocate(sizeof(task_type)));
if ( f.m_task ) {
// Placement new construction
- new ( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two
- // +1 for matching decrement when task is complete
- // +1 for future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Spawning from within the execution space so the
- // apply function pointer is guaranteed to be valid
- f.m_task->m_apply = task_type::apply ;
-
- m_queue->schedule( f.m_task );
- // this task may be updated or executed at any moment
+ // Reference count starts at two:
+ // +1 for the matching decrement when task is complete
+ // +1 for the future
+ new ( f.m_task )
+ task_type( arg_function
+ , queue
+ , arg_policy.m_dependence.m_task /* dependence */
+ , 2 /* reference count */
+ , int(sizeof(task_type)) /* allocation size */
+ , int(arg_policy.m_task_type)
+ , int(arg_policy.m_priority)
+ , std::move(arg_functor) );
+
+ // The dependence (if any) is processed immediately
+ // within the schedule function, so the dependence's
+ // reference count does not need to be incremented for
+ // the assignment.
+
+ queue->schedule_runnable( f.m_task );
+ // This task may be updated or executed at any moment,
+ // even during the call to 'schedule'.
}
return f ;
}
- /**\brief The host process spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- inline
- Future< typename FunctorType::value_type , ExecSpace >
- host_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
+ template< typename FunctorType , typename A1 , typename A2 >
+ KOKKOS_FUNCTION static
+ void
+ respawn( FunctorType * arg_self
+ , Future<A1,A2> const & arg_dependence
+ , TaskPriority const & arg_priority
+ )
{
+ // Precondition: task is in Executing state
+
using value_type = typename FunctorType::value_type ;
- using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
- if ( m_queue == 0 ) {
- Kokkos::abort("Kokkos::TaskScheduler not initialized");
- }
+ task_type * const task = static_cast< task_type * >( arg_self );
- future_type f ;
+ task->m_priority = static_cast<int>(arg_priority);
- // Allocate task from memory pool
- f.m_task =
- reinterpret_cast<task_type*>( m_queue->allocate(sizeof(task_type)) );
-
- if ( f.m_task ) {
-
- // Placement new construction
- new( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Potentially spawning outside execution space so the
- // apply function pointer must be obtained from execution space.
- // Required for Cuda execution space function pointer.
- m_queue->template proc_set_apply< FunctorType >( & f.m_task->m_apply );
+ task->add_dependence( arg_dependence.m_task );
- m_queue->schedule( f.m_task );
- }
- return f ;
+ // Postcondition: task is in Executing-Respawn state
}
+ //----------------------------------------
/**\brief Return a future that is complete
* when all input futures are complete.
*/
template< typename A1 , typename A2 >
- KOKKOS_FUNCTION
- Future< ExecSpace >
- when_all( int narg , Future< A1 , A2 > const * const arg ) const
+ KOKKOS_FUNCTION static
+ Future< execution_space >
+ when_all( Future< A1 , A2 > const arg[] , int narg )
{
- static_assert
- ( std::is_same< execution_space
- , typename Future< A1 , A2 >::execution_space
- >::value
- , "Future must have same execution space" );
-
- using future_type = Future< ExecSpace > ;
- using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
+ using future_type = Future< execution_space > ;
+ using task_base = Kokkos::Impl::TaskBase< execution_space , void , void > ;
future_type f ;
- size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
-
- f.m_task =
- reinterpret_cast< task_base * >( m_queue->allocate( size ) );
+ if ( narg ) {
- if ( f.m_task ) {
-
- new( f.m_task ) task_base();
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = size ;
- f.m_task->m_dep_count = narg ;
- f.m_task->m_task_type = task_base::Aggregate ;
-
- task_base ** const dep = f.m_task->aggregate_dependences();
-
- // Assign dependences to increment their reference count
- // The futures may be destroyed upon returning from this call
- // so increment reference count to track this assignment.
+ queue_type * queue = 0 ;
for ( int i = 0 ; i < narg ; ++i ) {
- task_base * const t = dep[i] = arg[i].m_task ;
+ task_base * const t = arg[i].m_task ;
if ( 0 != t ) {
+ // Increment reference count to track subsequent assignment.
Kokkos::atomic_increment( &(t->m_ref_count) );
+ if ( queue == 0 ) {
+ queue = t->m_queue ;
+ }
+ else if ( queue != t->m_queue ) {
+ Kokkos::abort("Kokkos when_all Futures must be in the same scheduler" );
+ }
}
}
- m_queue->schedule( f.m_task );
- // this when_all may be processed at any moment
- }
+ if ( queue != 0 ) {
- return f ;
- }
+ size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
- /**\brief An executing task respawns itself with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- */
- template< class FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- void respawn( FunctorType * task_self
- , Options const & ... arg_options ) const
- {
- using value_type = typename FunctorType::value_type ;
- using task_type = Impl::TaskBase< execution_space
- , value_type
- , FunctorType > ;
+ f.m_task =
+ reinterpret_cast< task_base * >( queue->allocate( size ) );
- task_type * const task = static_cast< task_type * >( task_self );
+ if ( f.m_task ) {
- // Reschedule task with no dependences.
- m_queue->reschedule( task );
+ // Reference count starts at two:
+ // +1 to match decrement when task completes
+ // +1 for the future
+ new( f.m_task ) task_base( queue
+ , 2 /* reference count */
+ , size /* allocation size */
+ , narg /* dependence count */
+ );
- // Dependences, if requested, are added here through parsing the arguments.
- assign( task , arg_options... );
- }
+ // Assign dependences, reference counts were already incremented
- //----------------------------------------
+ task_base ** const dep = f.m_task->aggregate_dependences();
- template< typename S >
- friend
- void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
+ for ( int i = 0 ; i < narg ; ++i ) { dep[i] = arg[i].m_task ; }
+
+ queue->schedule_aggregate( f.m_task );
+ // this when_all may be processed at any moment
+ }
+ }
+ }
+
+ return f ;
+ }
//----------------------------------------
- inline
+ KOKKOS_INLINE_FUNCTION
int allocation_capacity() const noexcept
{ return m_queue->m_memory.get_mem_size(); }
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const noexcept
{ return m_queue->m_count_alloc ; }
KOKKOS_INLINE_FUNCTION
int allocated_task_count_max() const noexcept
{ return m_queue->m_max_alloc ; }
KOKKOS_INLINE_FUNCTION
long allocated_task_count_accum() const noexcept
{ return m_queue->m_accum_alloc ; }
+ //----------------------------------------
+
+ template< typename S >
+ friend
+ void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
+
};
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+//----------------------------------------------------------------------------
+// Construct a TaskTeam execution policy
+
+template< typename T >
+Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskTeam
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >
+KOKKOS_INLINE_FUNCTION
+TaskTeam( T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos TaskTeam argument must be Future or TaskScheduler" );
+
+ return
+ Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskTeam
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >( arg , arg_priority );
+}
+
+// Construct a TaskSingle execution policy
+
+template< typename T >
+Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskSingle
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >
+KOKKOS_INLINE_FUNCTION
+TaskSingle( T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos TaskSingle argument must be Future or TaskScheduler" );
+
+ return
+ Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskSingle
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >( arg , arg_priority );
+}
+
+//----------------------------------------------------------------------------
+
+/**\brief A host control thread spawns a task with options
+ *
+ * 1) Team or Serial
+ * 2) With scheduler or dependence
+ * 3) High, Normal, or Low priority
+ */
+template< int TaskEnum
+ , typename DepFutureType
+ , typename FunctorType >
+Future< typename FunctorType::value_type
+ , typename DepFutureType::execution_space >
+host_spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , FunctorType && arg_functor
+ )
+{
+ using exec_space = typename DepFutureType::execution_space ;
+ using scheduler = TaskScheduler< exec_space > ;
+
+ typedef Impl::TaskBase< exec_space
+ , typename FunctorType::value_type
+ , FunctorType
+ > task_type ;
+
+ static_assert( TaskEnum == task_type::TaskTeam ||
+ TaskEnum == task_type::TaskSingle
+ , "Kokkos host_spawn requires TaskTeam or TaskSingle" );
+
+ // May be spawning a Cuda task, must use the specialization
+ // to query on-device function pointer.
+ typename task_type::function_type const ptr =
+ Kokkos::Impl::TaskQueueSpecialization< exec_space >::
+ template get_function_pointer< task_type >();
+
+ return scheduler::spawn( arg_policy , ptr , std::move(arg_functor) );
+}
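+/* Illustrative end-to-end sketch of the host_spawn() path above; the
+ * functor, memory space argument, pool size, and value type are
+ * hypothetical, not taken from these sources.
+ *
+ *   struct HelloTask {
+ *     typedef long value_type ;
+ *     KOKKOS_INLINE_FUNCTION
+ *     void operator()( Kokkos::TaskScheduler< Kokkos::Serial >::member_type &
+ *                    , long & result ) { result = 42 ; }
+ *   };
+ *
+ *   Kokkos::TaskScheduler< Kokkos::Serial > sched( Kokkos::HostSpace() , 1 << 20 );
+ *   auto f = Kokkos::host_spawn( Kokkos::TaskSingle( sched ) , HelloTask() );
+ *   Kokkos::wait( sched );
+ *   long value = f.get();
+ */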
+
+/**\brief A task spawns a task with options
+ *
+ * 1) Team or Serial
+ * 2) With scheduler or dependence
+ * 3) High, Normal, or Low priority
+ */
+template< int TaskEnum
+ , typename DepFutureType
+ , typename FunctorType >
+Future< typename FunctorType::value_type
+ , typename DepFutureType::execution_space >
+KOKKOS_INLINE_FUNCTION
+task_spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , FunctorType && arg_functor
+ )
+{
+ using exec_space = typename DepFutureType::execution_space ;
+ using scheduler = TaskScheduler< exec_space > ;
+
+ typedef Impl::TaskBase< exec_space
+ , typename FunctorType::value_type
+ , FunctorType
+ > task_type ;
+
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST ) && \
+ defined( KOKKOS_ENABLE_CUDA )
+
+ static_assert( ! std::is_same< Kokkos::Cuda , exec_space >::value
+ , "Error calling Kokkos::task_spawn for Cuda space within Host code" );
+
+#endif
+
+ static_assert( TaskEnum == task_type::TaskTeam ||
+ TaskEnum == task_type::TaskSingle
+ , "Kokkos task_spawn requires TaskTeam or TaskSingle" );
+
+ typename task_type::function_type const ptr = task_type::apply ;
+
+ return scheduler::spawn( arg_policy , ptr , std::move(arg_functor) );
+}
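+/* Illustrative sketch of spawning a child from inside a running task via
+ * task_spawn(); the captured scheduler member and child functor are
+ * hypothetical.
+ *
+ *   // inside ParentTask::operator()( member_type & , value_type & ):
+ *   auto child = Kokkos::task_spawn( Kokkos::TaskSingle( m_sched
+ *                                                      , Kokkos::TaskPriority::High )
+ *                                  , ChildTask() );
+ */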
+
+/**\brief A task respawns itself with options
+ *
+ * 1) With scheduler or dependence
+ * 2) High, Normal, or Low priority
+ */
+template< typename FunctorType , typename T >
+void
+KOKKOS_INLINE_FUNCTION
+respawn( FunctorType * arg_self
+ , T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos respawn argument must be Future or TaskScheduler" );
+
+ TaskScheduler< typename T::execution_space >::
+ respawn( arg_self , arg , arg_priority );
+}
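+/* Illustrative sketch: a running task re-queues itself to execute again
+ * once a hypothetical child future completes.
+ *
+ *   Kokkos::respawn( this , child_future , Kokkos::TaskPriority::Regular );
+ *   return ; // re-executed after child_future is complete
+ */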
+
+//----------------------------------------------------------------------------
+
+template< typename A1 , typename A2 >
+KOKKOS_INLINE_FUNCTION
+Future< typename Future< A1 , A2 >::execution_space >
+when_all( Future< A1 , A2 > const arg[]
+ , int narg
+ )
+{
+ return TaskScheduler< typename Future<A1,A2>::execution_space >::
+ when_all( arg , narg );
+}
+
+//----------------------------------------------------------------------------
+// Wait for all runnable tasks to complete
+
template< typename ExecSpace >
inline
-void wait( TaskScheduler< ExecSpace > const & policy )
-{ policy.m_queue->execute(); }
+void wait( TaskScheduler< ExecSpace > const & scheduler )
+{ scheduler.m_queue->execute(); }
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_TASKSCHEDULER_HPP */
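The task-DAG interface above (the TaskTeam/TaskSingle policy helpers, host_spawn, task_spawn, respawn, when_all, and wait) is driven from a host control thread roughly as in the minimal sketch below. This is a sketch only, assuming a build with KOKKOS_ENABLE_TASKDAG and a host execution space; the functor name, the scheduler's memory-pool capacity, and the exact scheduler constructor arguments are illustrative assumptions rather than anything taken from this patch.

#include <Kokkos_Core.hpp>

struct HelloTask {
  using value_type = long ;                  // result type carried by the Future

  template< typename MemberType >
  KOKKOS_INLINE_FUNCTION
  void operator()( MemberType & /*member*/ , long & result ) const
  { result = 42 ; }                          // single-shot task: just produce a value
};

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    using exec_space     = Kokkos::DefaultHostExecutionSpace ;
    using scheduler_type = Kokkos::TaskScheduler< exec_space > ;

    // Assumed constructor: memory space plus a memory-pool capacity in bytes.
    scheduler_type scheduler( typename scheduler_type::memory_space() , 1 << 20 );

    // Host control thread spawns a single (non-team) task at default priority.
    Kokkos::Future< long , exec_space > f =
      Kokkos::host_spawn( Kokkos::TaskSingle( scheduler ) , HelloTask() );

    Kokkos::wait( scheduler );               // block until all runnable tasks complete

    long const answer = f.get() ;            // read the completed task's result
    (void) answer ;
  }
  Kokkos::finalize();
  return 0 ;
}

Inside a running task, task_spawn and respawn play the same role as host_spawn, taking either the scheduler or a dependence Future as the argument of the policy helper.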
diff --git a/lib/kokkos/core/src/Kokkos_Threads.hpp b/lib/kokkos/core/src/Kokkos_Threads.hpp
index aca482b42..8aa968d05 100644
--- a/lib/kokkos/core/src/Kokkos_Threads.hpp
+++ b/lib/kokkos/core/src/Kokkos_Threads.hpp
@@ -1,233 +1,232 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADS_HPP
#define KOKKOS_THREADS_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_PTHREAD )
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class ThreadsExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Execution space for a pool of Pthreads or C11 threads on a CPU. */
class Threads {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Threads execution_space ;
typedef Kokkos::HostSpace memory_space ;
//! This execution space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef Kokkos::LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< Threads > scratch_memory_space ;
//@}
/*------------------------------------------------------------------------*/
//! \name Static functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
static int in_parallel();
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume fewer resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
/// \brief Free any resources being consumed by the device.
///
/// For the Threads device, this terminates spawned worker threads.
static void finalize();
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
/*------------------------------------------------------------------------*/
/*------------------------------------------------------------------------*/
//! \name Space-specific functions
//@{
/** \brief Initialize the device in the "ready to work" state.
*
* The device is initialized in a "ready to work" or "awake" state.
* This state reduces latency and thus improves performance when
* dispatching work. However, the "awake" state consumes resources
* even when no work is being done. You may call sleep() to put
* the device in a "sleeping" state that does not consume as many
* resources, but it will take time (latency) to awaken the device
* again (via the wake() method) so that it is ready for work.
*
* Teams of threads are distributed as evenly as possible across
* the requested number of numa regions and cores per numa region.
* A team will not be split across a numa region.
*
* If the 'use_' arguments are not supplied, hwloc is queried
* to use all available cores.
*/
static void initialize( unsigned threads_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
bool allow_asynchronous_threadpool = false );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
static Threads & instance( int = 0 );
//----------------------------------------
static int thread_pool_size( int depth = 0 );
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static int thread_pool_rank();
#else
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
#endif
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
//@}
//----------------------------------------
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::Threads::memory_space
, Kokkos::Threads::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Threads::memory_space
, Kokkos::Threads::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
#include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
#include <Threads/Kokkos_ThreadsExec.hpp>
#include <Threads/Kokkos_ThreadsTeam.hpp>
#include <Threads/Kokkos_Threads_Parallel.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) */
#endif /* #define KOKKOS_THREADS_HPP */
-
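The static interface declared above (initialize, fence, finalize, and the pool queries) can be exercised directly, as in the following minimal sketch. It assumes a build with KOKKOS_ENABLE_PTHREAD; the thread count and the reduction body are purely illustrative.

#include <Kokkos_Core.hpp>

int main()
{
  // Spawn a pool of 4 worker threads; passing 0 lets hwloc pick the counts.
  Kokkos::Threads::initialize( 4 /* threads_count */ );

  long sum = 0 ;
  Kokkos::parallel_reduce(
    Kokkos::RangePolicy< Kokkos::Threads >( 0 , 1000 ) ,
    KOKKOS_LAMBDA ( const int i , long & update ) { update += i ; } ,
    sum );

  Kokkos::Threads::fence();      // wait for any asynchronously dispatched work
  Kokkos::Threads::finalize();   // terminate the spawned worker threads
  return 0 ;
}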
diff --git a/lib/kokkos/core/src/Makefile b/lib/kokkos/core/src/Makefile
index 316f61fd4..0668f89c8 100644
--- a/lib/kokkos/core/src/Makefile
+++ b/lib/kokkos/core/src/Makefile
@@ -1,144 +1,200 @@
ifndef KOKKOS_PATH
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../..
endif
PREFIX ?= /usr/local/lib/kokkos
default: messages build-lib
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?=
include $(KOKKOS_PATH)/Makefile.kokkos
PWD = $(shell pwd)
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL = $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
CONDITIONAL_COPIES =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
- CONDITIONAL_COPIES += copy-cuda
+ KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
+ CONDITIONAL_COPIES += copy-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
- CONDITIONAL_COPIES += copy-threads
+ KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ CONDITIONAL_COPIES += copy-threads
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_HEADERS_QTHREAD += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
- CONDITIONAL_COPIES += copy-qthread
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ KOKKOS_HEADERS_QTHREADS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
+ CONDITIONAL_COPIES += copy-qthreads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
- CONDITIONAL_COPIES += copy-openmp
+ KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
+ CONDITIONAL_COPIES += copy-openmp
endif
ifeq ($(KOKKOS_OS),CYGWIN)
COPY_FLAG = -u
endif
ifeq ($(KOKKOS_OS),Linux)
COPY_FLAG = -u
endif
ifeq ($(KOKKOS_OS),Darwin)
COPY_FLAG =
endif
+ifeq ($(KOKKOS_DEBUG),"no")
+ KOKKOS_DEBUG_CMAKE = OFF
+else
+ KOKKOS_DEBUG_CMAKE = ON
+endif
+
messages:
echo "Start Build"
build-makefile-kokkos:
rm -f Makefile.kokkos
echo "#Global Settings used to generate this library" >> Makefile.kokkos
echo "KOKKOS_PATH = $(PREFIX)" >> Makefile.kokkos
echo "KOKKOS_DEVICES = $(KOKKOS_DEVICES)" >> Makefile.kokkos
echo "KOKKOS_ARCH = $(KOKKOS_ARCH)" >> Makefile.kokkos
echo "KOKKOS_DEBUG = $(KOKKOS_DEBUG)" >> Makefile.kokkos
echo "KOKKOS_USE_TPLS = $(KOKKOS_USE_TPLS)" >> Makefile.kokkos
echo "KOKKOS_CXX_STANDARD = $(KOKKOS_CXX_STANDARD)" >> Makefile.kokkos
echo "KOKKOS_OPTIONS = $(KOKKOS_OPTIONS)" >> Makefile.kokkos
echo "KOKKOS_CUDA_OPTIONS = $(KOKKOS_CUDA_OPTIONS)" >> Makefile.kokkos
echo "CXX ?= $(CXX)" >> Makefile.kokkos
echo "NVCC_WRAPPER ?= $(PREFIX)/bin/nvcc_wrapper" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> Makefile.kokkos
echo "KOKKOS_HEADERS = $(KOKKOS_HEADERS)" >> Makefile.kokkos
echo "KOKKOS_SRC = $(KOKKOS_SRC)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Variables used in application Makefiles" >> Makefile.kokkos
echo "KOKKOS_CPP_DEPENDS = $(KOKKOS_CPP_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_CXXFLAGS = $(KOKKOS_CXXFLAGS)" >> Makefile.kokkos
echo "KOKKOS_CPPFLAGS = $(KOKKOS_CPPFLAGS)" >> Makefile.kokkos
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Internal settings which need to propagated for Kokkos examples" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_CUDA = ${KOKKOS_INTERNAL_USE_CUDA}" >> Makefile.kokkos
+ echo "KOKKOS_INTERNAL_USE_QTHREADS = ${KOKKOS_INTERNAL_USE_QTHREADS}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_OPENMP = ${KOKKOS_INTERNAL_USE_OPENMP}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_PTHREADS = ${KOKKOS_INTERNAL_USE_PTHREADS}" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Fake kokkos-clean target" >> Makefile.kokkos
echo "kokkos-clean:" >> Makefile.kokkos
echo "" >> Makefile.kokkos
sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' Makefile.kokkos \
> Makefile.kokkos.tmp
mv -f Makefile.kokkos.tmp Makefile.kokkos
-build-lib: build-makefile-kokkos $(KOKKOS_LINK_DEPENDS)
+build-cmake-kokkos:
+ rm -f kokkos.cmake
+ echo "#Global Settings used to generate this library" >> kokkos.cmake
+ echo "set(KOKKOS_PATH $(PREFIX) CACHE PATH \"Kokkos installation path\")" >> kokkos.cmake
+ echo "set(KOKKOS_DEVICES $(KOKKOS_DEVICES) CACHE STRING \"Kokkos devices list\")" >> kokkos.cmake
+ echo "set(KOKKOS_ARCH $(KOKKOS_ARCH) CACHE STRING \"Kokkos architecture flags\")" >> kokkos.cmake
+ echo "set(KOKKOS_DEBUG $(KOKKOS_DEBUG_CMAKE) CACHE BOOL \"Kokkos debug enabled ?)\")" >> kokkos.cmake
+ echo "set(KOKKOS_USE_TPLS $(KOKKOS_USE_TPLS) CACHE STRING \"Kokkos templates list\")" >> kokkos.cmake
+ echo "set(KOKKOS_CXX_STANDARD $(KOKKOS_CXX_STANDARD) CACHE STRING \"Kokkos C++ standard\")" >> kokkos.cmake
+ echo "set(KOKKOS_OPTIONS $(KOKKOS_OPTIONS) CACHE STRING \"Kokkos options\")" >> kokkos.cmake
+ echo "set(KOKKOS_CUDA_OPTIONS $(KOKKOS_CUDA_OPTIONS) CACHE STRING \"Kokkos Cuda options\")" >> kokkos.cmake
+ echo "if(NOT $ENV{CXX})" >> kokkos.cmake
+ echo ' message(WARNING "You are currently using compiler $${CMAKE_CXX_COMPILER} while Kokkos was built with $(CXX) ; make sure this is the behavior you intend.")' >> kokkos.cmake
+ echo "endif()" >> kokkos.cmake
+ echo "if(NOT DEFINED ENV{NVCC_WRAPPER})" >> kokkos.cmake
+ echo " set(NVCC_WRAPPER \"$(NVCC_WRAPPER)\" CACHE FILEPATH \"Path to command nvcc_wrapper\")" >> kokkos.cmake
+ echo "else()" >> kokkos.cmake
+ echo ' set(NVCC_WRAPPER $$ENV{NVCC_WRAPPER} CACHE FILEPATH "Path to command nvcc_wrapper")' >> kokkos.cmake
+ echo "endif()" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> kokkos.cmake
+ echo "set(KOKKOS_HEADERS \"$(KOKKOS_HEADERS)\" CACHE STRING \"Kokkos headers list\")" >> kokkos.cmake
+ echo "set(KOKKOS_SRC \"$(KOKKOS_SRC)\" CACHE STRING \"Kokkos source list\")" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Variables used in application Makefiles" >> kokkos.cmake
+ echo "set(KOKKOS_CPP_DEPENDS \"$(KOKKOS_CPP_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_CXXFLAGS \"$(KOKKOS_CXXFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_CPPFLAGS \"$(KOKKOS_CPPFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LINK_DEPENDS \"$(KOKKOS_LINK_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LIBS \"$(KOKKOS_LIBS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LDFLAGS \"$(KOKKOS_LDFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Internal settings which need to propagated for Kokkos examples" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_CUDA \"${KOKKOS_INTERNAL_USE_CUDA}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_OPENMP \"${KOKKOS_INTERNAL_USE_OPENMP}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_PTHREADS \"${KOKKOS_INTERNAL_USE_PTHREADS}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "mark_as_advanced(KOKKOS_HEADERS KOKKOS_SRC KOKKOS_INTERNAL_USE_CUDA KOKKOS_INTERNAL_USE_OPENMP KOKKOS_INTERNAL_USE_PTHREADS)" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ sed \
+ -e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
+ -e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
+ -e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
+ -e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
+ -e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
+ -e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' kokkos.cmake \
+ > kokkos.cmake.tmp
+ mv -f kokkos.cmake.tmp kokkos.cmake
+
+build-lib: build-makefile-kokkos build-cmake-kokkos $(KOKKOS_LINK_DEPENDS)
mkdir:
mkdir -p $(PREFIX)
mkdir -p $(PREFIX)/bin
mkdir -p $(PREFIX)/include
mkdir -p $(PREFIX)/lib
mkdir -p $(PREFIX)/include/impl
copy-cuda: mkdir
mkdir -p $(PREFIX)/include/Cuda
cp $(COPY_FLAG) $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
copy-threads: mkdir
mkdir -p $(PREFIX)/include/Threads
cp $(COPY_FLAG) $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
-copy-qthread: mkdir
- mkdir -p $(PREFIX)/include/Qthread
- cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
+copy-qthreads: mkdir
+ mkdir -p $(PREFIX)/include/Qthreads
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREADS) $(PREFIX)/include/Qthreads
copy-openmp: mkdir
mkdir -p $(PREFIX)/include/OpenMP
cp $(COPY_FLAG) $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
install: mkdir $(CONDITIONAL_COPIES) build-lib
cp $(COPY_FLAG) $(NVCC_WRAPPER) $(PREFIX)/bin
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
cp $(COPY_FLAG) Makefile.kokkos $(PREFIX)
+ cp $(COPY_FLAG) kokkos.cmake $(PREFIX)
cp $(COPY_FLAG) libkokkos.a $(PREFIX)/lib
cp $(COPY_FLAG) KokkosCore_config.h $(PREFIX)/include
clean: kokkos-clean
rm -f Makefile.kokkos
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
index a61791ca9..ecacffb77 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
@@ -1,750 +1,853 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMP_PARALLEL_HPP
#define KOKKOS_OPENMP_PARALLEL_HPP
#include <omp.h>
#include <iostream>
-#include <Kokkos_Parallel.hpp>
#include <OpenMP/Kokkos_OpenMPexec.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork );
}
}
public:
- inline void execute() const {
- this->template execute_schedule<typename Policy::schedule_type::type>();
- }
-
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Static>::value >::type
- execute_schedule() const
+ inline void execute() const
{
+ enum { is_dynamic = std::is_same< typename Policy::schedule_type::type
+ , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
-
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- ParallelFor::template exec_range< WorkTag >( m_functor , range.begin() , range.end() );
- }
-/* END #pragma omp parallel */
- }
+ data.set_work_partition( m_policy.end() - m_policy.begin()
+ , m_policy.chunk_size() );
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- execute_schedule() const
- {
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
+ if ( is_dynamic ) {
+ // Make sure work partition is set before stealing
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
+ std::pair<int64_t,int64_t> range(0,0);
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
+ do {
- exec.set_work_range(range.begin(),range.end(),m_policy.chunk_size());
- exec.reset_steal_target();
- #pragma omp barrier
-
- long work_index = exec.get_work_index();
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
- while(work_index != -1) {
- const Member begin = static_cast<Member>(work_index) * m_policy.chunk_size();
- const Member end = begin + m_policy.chunk_size() < m_policy.end()?begin+m_policy.chunk_size():m_policy.end();
- ParallelFor::template exec_range< WorkTag >( m_functor , begin, end );
- work_index = exec.get_work_index();
- }
+ ParallelFor::template
+ exec_range< WorkTag >( m_functor
+ , range.first + m_policy.begin()
+ , range.second + m_policy.begin() );
+ } while ( is_dynamic && 0 <= range.first );
}
-/* END #pragma omp parallel */
+ // END #pragma omp parallel
}
inline
ParallelFor( const FunctorType & arg_functor
, Policy arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ...>
, ReducerType
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork , update );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork , update );
}
}
public:
- inline void execute() const {
- this->template execute_schedule<typename Policy::schedule_type::type>();
- }
-
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Static>::value >::type
- execute_schedule() const
+ inline void execute() const
{
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+ enum { is_dynamic = std::is_same< typename Policy::schedule_type::type
+ , Kokkos::Dynamic >::value };
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
+ OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+
+ const size_t pool_reduce_bytes =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
+
+ OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , 0 // team_reduce_bytes
+ , 0 // team_shared_bytes
+ , 0 // thread_local_bytes
+ );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- ParallelReduce::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueInit::init( ReducerConditional::select(m_functor , m_reducer), exec.scratch_reduce() ) );
- }
-/* END #pragma omp parallel */
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- // Reduction:
+ data.set_work_partition( m_policy.end() - m_policy.begin()
+ , m_policy.chunk_size() );
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
+ if ( is_dynamic ) {
+ // Make sure work partition is set before stealing
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
- for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
- }
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+ std::pair<int64_t,int64_t> range(0,0);
- if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ do {
- for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
- }
- }
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- execute_schedule() const
- {
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+ ParallelReduce::template
+ exec_range< WorkTag >( m_functor
+ , range.first + m_policy.begin()
+ , range.second + m_policy.begin()
+ , update );
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
-
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
-
- exec.set_work_range(range.begin(),range.end(),m_policy.chunk_size());
- exec.reset_steal_target();
- #pragma omp barrier
-
- long work_index = exec.get_work_index();
-
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , exec.scratch_reduce() );
- while(work_index != -1) {
- const Member begin = static_cast<Member>(work_index) * m_policy.chunk_size();
- const Member end = begin + m_policy.chunk_size() < m_policy.end()?begin+m_policy.chunk_size():m_policy.end();
- ParallelReduce::template exec_range< WorkTag >
- ( m_functor , begin,end
- , update );
- work_index = exec.get_work_index();
- }
+ } while ( is_dynamic && 0 <= range.first );
}
-/* END #pragma omp parallel */
+// END #pragma omp parallel
// Reduction:
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
+ const pointer_type ptr = pointer_type( OpenMPexec::get_thread_data(0)->pool_reduce_local() );
for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
+ ValueJoin::join( ReducerConditional::select(m_functor , m_reducer)
+ , ptr
+ , OpenMPexec::get_thread_data(i)->pool_reduce_local() );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
}
}
//----------------------------------------
template< class ViewType >
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ViewType & arg_result_view
, typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
+ typedef FunctorAnalysis< FunctorPatternInterface::SCAN , Policy , FunctorType > Analysis ;
+
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueOps< FunctorType, WorkTag > ValueOps ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork , update , final );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork , update , final );
}
}
public:
inline
void execute() const
{
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_scan");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_scan");
- OpenMPexec::resize_scratch( 2 * ValueTraits::value_size( m_functor ) , 0 );
+ const int value_count = Analysis::value_count( m_functor );
+ const size_t pool_reduce_bytes = 2 * Analysis::value_size( m_functor );
+
+ OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , 0 // team_reduce_bytes
+ , 0 // team_shared_bytes
+ , 0 // thread_local_bytes
+ );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- const pointer_type ptr =
- pointer_type( exec.scratch_reduce() ) +
- ValueTraits::value_count( m_functor );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
+
+ const WorkRange range( m_policy, data.pool_rank(), data.pool_size() );
+
+ reference_type update_sum =
+ ValueInit::init( m_functor , data.pool_reduce_local() );
+
ParallelScan::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueInit::init( m_functor , ptr ) , false );
- }
-/* END #pragma omp parallel */
+ ( m_functor , range.begin() , range.end() , update_sum , false );
- {
- const unsigned thread_count = OpenMPexec::pool_size();
- const unsigned value_count = ValueTraits::value_count( m_functor );
+ if ( data.pool_rendezvous() ) {
- pointer_type ptr_prev = 0 ;
+ pointer_type ptr_prev = 0 ;
- for ( unsigned rank_rev = thread_count ; rank_rev-- ; ) {
+ const int n = data.pool_size();
- pointer_type ptr = pointer_type( OpenMPexec::pool_rev(rank_rev)->scratch_reduce() );
+ for ( int i = 0 ; i < n ; ++i ) {
- if ( ptr_prev ) {
- for ( unsigned i = 0 ; i < value_count ; ++i ) { ptr[i] = ptr_prev[ i + value_count ] ; }
- ValueJoin::join( m_functor , ptr + value_count , ptr );
- }
- else {
- ValueInit::init( m_functor , ptr );
+ pointer_type ptr = (pointer_type)
+ data.pool_member(i)->pool_reduce_local();
+
+ if ( i ) {
+ for ( int j = 0 ; j < value_count ; ++j ) {
+ ptr[j+value_count] = ptr_prev[j+value_count] ;
+ }
+ ValueJoin::join( m_functor , ptr + value_count , ptr_prev );
+ }
+ else {
+ ValueInit::init( m_functor , ptr + value_count );
+ }
+
+ ptr_prev = ptr ;
}
- ptr_prev = ptr ;
+ data.pool_rendezvous_release();
}
- }
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- const pointer_type ptr = pointer_type( exec.scratch_reduce() );
+ reference_type update_base =
+ ValueOps::reference
+ ( ((pointer_type)data.pool_reduce_local()) + value_count );
+
ParallelScan::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueOps::reference( ptr ) , true );
+ ( m_functor , range.begin() , range.end() , update_base , true );
}
/* END #pragma omp parallel */
+
}
//----------------------------------------
inline
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
//----------------------------------------
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::OpenMP
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::OpenMP, Properties ... > Policy ;
- typedef typename Policy::work_tag WorkTag ;
- typedef typename Policy::member_type Member ;
+ typedef typename Policy::work_tag WorkTag ;
+ typedef typename Policy::schedule_type::type SchedTag ;
+ typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const Policy m_policy ;
const int m_shmem_size ;
- template< class TagType, class Schedule >
+ template< class TagType >
inline static
- typename std::enable_if< std::is_same< TagType , void >::value && std::is_same<Schedule,Kokkos::Static>::value>::type
- exec_team( const FunctorType & functor , Member member )
+ typename std::enable_if< ( std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( member );
- }
- }
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
- template< class TagType, class Schedule >
- inline static
- typename std::enable_if< (! std::is_same< TagType , void >::value) && std::is_same<Schedule,Kokkos::Static>::value >::type
- exec_team( const FunctorType & functor , Member member )
- {
- const TagType t{} ;
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( t , member );
- }
- }
+ functor( Member( data, r , league_size ) );
- template< class TagType, class Schedule >
- inline static
- typename std::enable_if< std::is_same< TagType , void >::value && std::is_same<Schedule,Kokkos::Dynamic>::value>::type
- exec_team( const FunctorType & functor , Member member )
- {
- #pragma omp barrier
- for ( ; member.valid_dynamic() ; member.next_dynamic() ) {
- functor( member );
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
- template< class TagType, class Schedule >
+
+ template< class TagType >
inline static
- typename std::enable_if< (! std::is_same< TagType , void >::value) && std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- exec_team( const FunctorType & functor , Member member )
+ typename std::enable_if< ( ! std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- #pragma omp barrier
- const TagType t{} ;
- for ( ; member.valid_dynamic() ; member.next_dynamic() ) {
- functor( t , member );
+ const TagType t{};
+
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( t , Member( data, r , league_size ) );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
public:
inline
void execute() const
{
+ enum { is_dynamic = std::is_same< SchedTag , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
- const size_t team_reduce_size = Policy::member_type::team_reduce_size();
+ const size_t pool_reduce_size = 0 ; // Never shrinks
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE * m_policy.team_size();
+ const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1);
+ const size_t thread_local_size = 0 ; // Never shrinks
- OpenMPexec::resize_scratch( 0 , team_reduce_size + m_shmem_size + m_policy.scratch_size(1));
+ OpenMPexec::resize_thread_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
#pragma omp parallel
{
- ParallelFor::template exec_team< WorkTag, typename Policy::schedule_type::type>
- ( m_functor
- , Member( * OpenMPexec::get_thread_omp(), m_policy, m_shmem_size, 0) );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
+
+ const int active = data.organize_team( m_policy.team_size() );
+
+ if ( active ) {
+ data.set_work_partition( m_policy.league_size()
+ , ( 0 < m_policy.chunk_size()
+ ? m_policy.chunk_size()
+ : m_policy.team_iter() ) );
+ }
+
+ if ( is_dynamic ) {
+ // Must synchronize to make sure each team has set its
+ // partition before beginning the work stealing loop.
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
+
+ if ( active ) {
+
+ std::pair<int64_t,int64_t> range(0,0);
+
+ do {
+
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
+
+ ParallelFor::template exec_team< WorkTag >
+ ( m_functor , data
+ , range.first , range.second , m_policy.league_size() );
+
+ } while ( is_dynamic && 0 <= range.first );
+ }
+
+ data.disband_team();
}
-/* END #pragma omp parallel */
+// END #pragma omp parallel
}
+
inline
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{}
};
+//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType, class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::OpenMP
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::OpenMP, Properties ... > Policy ;
- typedef typename Policy::work_tag WorkTag ;
- typedef typename Policy::member_type Member ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
+ typedef typename Policy::work_tag WorkTag ;
+ typedef typename Policy::schedule_type::type SchedTag ;
+ typedef typename Policy::member_type Member ;
+
+ typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value
+ , FunctorType, ReducerType> ReducerConditional;
- typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTag > ValueJoin ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
const int m_shmem_size ;
template< class TagType >
inline static
- typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec_team( const FunctorType & functor , Member member , reference_type update )
+ typename std::enable_if< ( std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , reference_type & update
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( member , update );
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( Member( data, r , league_size ) , update );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
+
template< class TagType >
inline static
- typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec_team( const FunctorType & functor , Member member , reference_type update )
+ typename std::enable_if< ( ! std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , reference_type & update
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- const TagType t{} ;
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( t , member , update );
+ const TagType t{};
+
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( t , Member( data, r , league_size ) , update );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
public:
inline
void execute() const
{
+ enum { is_dynamic = std::is_same< SchedTag , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
+ OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
- const size_t team_reduce_size = Policy::member_type::team_reduce_size();
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE * m_policy.team_size();
+ const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1);
+ const size_t thread_local_size = 0 ; // Never shrinks
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , team_reduce_size + m_shmem_size );
+ OpenMPexec::resize_thread_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- ParallelReduce::template exec_team< WorkTag >
- ( m_functor
- , Member( exec , m_policy , m_shmem_size, 0 )
- , ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , exec.scratch_reduce() ) );
- }
-/* END #pragma omp parallel */
+ const int active = data.organize_team( m_policy.team_size() );
- {
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
-
- int max_active_threads = OpenMPexec::pool_size();
- if( max_active_threads > m_policy.league_size()* m_policy.team_size() )
- max_active_threads = m_policy.league_size()* m_policy.team_size();
+ if ( active ) {
+ data.set_work_partition( m_policy.league_size()
+ , ( 0 < m_policy.chunk_size()
+ ? m_policy.chunk_size()
+ : m_policy.team_iter() ) );
+ }
- for ( int i = 1 ; i < max_active_threads ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
+ if ( is_dynamic ) {
+ // Must synchronize to make sure each team has set its
+ // partition before beginning the work stealing loop.
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
}
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+ if ( active ) {
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
+
+ std::pair<int64_t,int64_t> range(0,0);
- if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ do {
- for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
+
+ ParallelReduce::template exec_team< WorkTag >
+ ( m_functor , data , update
+ , range.first , range.second , m_policy.league_size() );
+
+ } while ( is_dynamic && 0 <= range.first );
+ } else {
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
}
+
+ data.disband_team();
+ }
+// END #pragma omp parallel
+
+ // Reduction:
+
+ const pointer_type ptr = pointer_type( OpenMPexec::get_thread_data(0)->pool_reduce_local() );
+
+ for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
+ ValueJoin::join( ReducerConditional::select(m_functor , m_reducer)
+ , ptr
+ , OpenMPexec::get_thread_data(i)->pool_reduce_local() );
+ }
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ if ( m_result_ptr ) {
+ const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
+
+ for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
}
}
+ //----------------------------------------
+
template< class ViewType >
inline
ParallelReduce( const FunctorType & arg_functor ,
const Policy & arg_policy ,
const ViewType & arg_result ,
typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* KOKKOS_OPENMP_PARALLEL_HPP */
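The backend specializations above implement the ordinary user-facing dispatches; a minimal sketch of the two reworked patterns (a dynamically scheduled range parallel_for, and a team-policy parallel_reduce) is shown below. It assumes a build with KOKKOS_ENABLE_OPENMP; the problem sizes and the lambda bodies are illustrative only.

#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    using exec_space  = Kokkos::OpenMP ;
    using team_policy = Kokkos::TeamPolicy< exec_space > ;
    using member_type = team_policy::member_type ;

    // Range parallel_for with a Dynamic schedule: served by the
    // get_work_stealing_chunk() loop in the ParallelFor specialization above.
    Kokkos::View< double * , exec_space::memory_space > x( "x" , 1000 );
    Kokkos::parallel_for(
      Kokkos::RangePolicy< exec_space , Kokkos::Schedule< Kokkos::Dynamic > >( 0 , 1000 ) ,
      KOKKOS_LAMBDA ( const int i ) { x( i ) = 2.0 * i ; } );

    // Team-policy reduction: per-thread partial results are joined by
    // ValueJoin in the ParallelReduce specialization above.
    double sum = 0 ;
    Kokkos::parallel_reduce(
      team_policy( 10 /* league size */ , Kokkos::AUTO ) ,
      KOKKOS_LAMBDA ( const member_type & member , double & update ) {
        update += member.league_rank();
      } ,
      sum );

    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0 ;
}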
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
index 5b3e9873e..9144d8c27 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
@@ -1,329 +1,316 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::OpenMP > ;
-//----------------------------------------------------------------------------
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec()
- : m_self_exec( 0 )
- , m_team_exec( 0 )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( 0 )
- , m_team_rank( 0 )
- , m_team_size( 1 )
-{
-}
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec( Kokkos::Impl::OpenMPexec & arg_exec , int const arg_team_size )
- : m_self_exec( & arg_exec )
- , m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( arg_exec.pool_rank_rev() / arg_team_size )
- , m_team_rank( arg_exec.pool_rank_rev() % arg_team_size )
- , m_team_size( arg_team_size )
-{
- // This team spans
- // m_self_exec->pool_rev( team_size * group_rank )
- // m_self_exec->pool_rev( team_size * ( group_rank + 1 ) - 1 )
-
- int64_t volatile * const sync = (int64_t *) m_self_exec->scratch_reduce();
-
- sync[0] = int64_t(0) ;
- sync[1] = int64_t(0) ;
-
- for ( int i = 0 ; i < m_team_size ; ++i ) {
- m_sync_value |= int64_t(1) << (8*i);
- m_sync_mask |= int64_t(3) << (8*i);
- }
+class HostThreadTeamDataSingleton : private HostThreadTeamData {
+private:
+
+ HostThreadTeamDataSingleton() : HostThreadTeamData()
+ {
+ Kokkos::OpenMP::memory_space space ;
+ const size_t num_pool_reduce_bytes = 32 ;
+ const size_t num_team_reduce_bytes = 32 ;
+ const size_t num_team_shared_bytes = 1024 ;
+ const size_t num_thread_local_bytes = 1024 ;
+ const size_t alloc_bytes =
+ HostThreadTeamData::scratch_size( num_pool_reduce_bytes
+ , num_team_reduce_bytes
+ , num_team_shared_bytes
+ , num_thread_local_bytes );
+
+ HostThreadTeamData::scratch_assign
+ ( space.allocate( alloc_bytes )
+ , alloc_bytes
+ , num_pool_reduce_bytes
+ , num_team_reduce_bytes
+ , num_team_shared_bytes
+ , num_thread_local_bytes );
+ }
+
+ ~HostThreadTeamDataSingleton()
+ {
+ Kokkos::OpenMP::memory_space space ;
+ space.deallocate( HostThreadTeamData::scratch_buffer()
+ , HostThreadTeamData::scratch_bytes() );
+ }
+
+public:
+
+ static HostThreadTeamData & singleton()
+ {
+ static HostThreadTeamDataSingleton s ;
+ return s ;
+ }
+};
- Kokkos::memory_fence();
-}
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+//----------------------------------------------------------------------------
-void TaskExec< Kokkos::OpenMP >::team_barrier_impl() const
+void TaskQueueSpecialization< Kokkos::OpenMP >::execute
+ ( TaskQueue< Kokkos::OpenMP > * const queue )
{
- if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
- Kokkos::abort("TaskQueue<OpenMP> scratch_reduce memory too small");
- }
+ using execution_space = Kokkos::OpenMP ;
+ using queue_type = TaskQueue< execution_space > ;
+ using task_root_type = TaskBase< execution_space , void , void > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
- // Use team shared memory to synchronize.
- // Alternate memory locations between barriers to avoid a sequence
- // of barriers overtaking one another.
+ static task_root_type * const end =
+ (task_root_type *) task_root_type::EndTag ;
- int64_t volatile * const sync =
- ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+ HostThreadTeamData & team_data_single =
+ HostThreadTeamDataSingleton::singleton();
- // This team member sets one byte within the sync variable
- int8_t volatile * const sync_self =
- ((int8_t *) sync) + m_team_rank ;
+ const int team_size = Impl::OpenMPexec::pool_size(2); // Threads per core
+ // const int team_size = Impl::OpenMPexec::pool_size(1); // Threads per NUMA
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
- );
+fprintf(stdout,"TaskQueue<OpenMP> execute %d\n", team_size );
fflush(stdout);
#endif
- *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
- while ( m_sync_value != *sync ); // wait for team to arrive
+#pragma omp parallel
+ {
+ Impl::HostThreadTeamData & self = *Impl::OpenMPexec::get_thread_data();
-#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
- );
-fflush(stdout);
-#endif
+ // Organizing threads into a team performs a barrier across the
+ // entire pool to insure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
- ++m_sync_step ;
+ if ( self.organize_team( team_size ) ) {
- if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
- m_sync_value ^= m_sync_mask ;
- if ( 1000 < m_sync_step ) m_sync_step = 0 ;
- }
-}
+ Member single_exec( team_data_single );
+ Member team_exec( self );
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) running\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ );
+fflush(stdout);
#endif
-//----------------------------------------------------------------------------
-
-void TaskQueueSpecialization< Kokkos::OpenMP >::execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
-{
- using execution_space = Kokkos::OpenMP ;
- using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
- using PoolExec = Kokkos::Impl::OpenMPexec ;
- using Member = TaskExec< execution_space > ;
+ // Loop until all queues are empty and no tasks in flight
- task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
+ task_root_type * task = 0 ;
- // Required: team_size <= 8
+ do {
+ // Each team lead attempts to acquire either a thread team task
+ // or a single thread task for the team.
- const int team_size = PoolExec::pool_size(2); // Threads per core
- // const int team_size = PoolExec::pool_size(1); // Threads per NUMA
+ if ( 0 == team_exec.team_rank() ) {
- if ( 8 < team_size ) {
- Kokkos::abort("TaskQueue<OpenMP> unsupported team size");
- }
+ bool leader_loop = false ;
-#pragma omp parallel
- {
- PoolExec & self = *PoolExec::get_thread_omp();
+ do {
- Member single_exec ;
- Member team_exec( self , team_size );
+ if ( 0 != task && end != task ) {
+ // team member #0 completes the previously executed task,
+ // completion may delete the task
+ queue->complete( task );
+ }
- // Team shared memory
- task_root_type * volatile * const task_shared =
- (task_root_type **) team_exec.m_team_exec->scratch_thread();
+ // If 0 == m_ready_count then set task = 0
-// Barrier across entire OpenMP thread pool to insure initialization
-#pragma omp barrier
+ task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
- // Loop until all queues are empty and no tasks in flight
+ // Attempt to acquire a task
+ // Loop by priority and then type
+ for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
+ for ( int j = 0 ; j < 2 && end == task ; ++j ) {
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
+ }
+ }
- do {
+ // If tasks are still executing
+ // and no task could be acquired
+ // then continue this leader loop
+ leader_loop = end == task ;
- task_root_type * task = 0 ;
+ if ( ( ! leader_loop ) &&
+ ( 0 != task ) &&
+ ( task_root_type::TaskSingle == task->m_task_type ) ) {
- // Each team lead attempts to acquire either a thread team task
- // or a single thread task for the team.
+ // if a single thread task then execute now
- if ( 0 == team_exec.team_rank() ) {
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) executing single task 0x%lx\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , int64_t(task)
+ );
+fflush(stdout);
+#endif
- task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
+ (*task->m_apply)( task , & single_exec );
- // Loop by priority and then type
- for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
- for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
- }
+ leader_loop = true ;
+ }
+ } while ( leader_loop );
}
- }
-
- // Team lead broadcast acquired task to team members:
-
- if ( 1 < team_exec.team_size() ) {
-
- if ( 0 == team_exec.team_rank() ) *task_shared = task ;
-
- // Fence to be sure task_shared is stored before the barrier
- Kokkos::memory_fence();
- // Whole team waits for every team member to reach this statement
- team_exec.team_barrier();
+ // Team lead either found 0 == m_ready_count or a team task
+ // Team lead broadcasts the acquired task:
- // Fence to be sure task_shared is stored
- Kokkos::memory_fence();
+ team_exec.team_broadcast( task , 0);
- task = *task_shared ;
- }
+ if ( 0 != task ) { // Thread Team Task
#if 0
-fprintf( stdout
- , "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n"
- , team_exec.m_group_rank
- , team_exec.m_team_rank
- , uintptr_t(task_shared)
- , uintptr_t(task)
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) executing team task 0x%lx\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ , int64_t(task)
);
fflush(stdout);
#endif
- if ( 0 == task ) break ; // 0 == m_ready_count
-
- if ( end == task ) {
- // All team members wait for whole team to reach this statement.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
- team_exec.team_barrier();
- }
- else if ( task_root_type::TaskTeam == task->m_task_type ) {
- // Thread Team Task
- (*task->m_apply)( task , & team_exec );
+ (*task->m_apply)( task , & team_exec );
- // The m_apply function performs a barrier
-
- if ( 0 == team_exec.team_rank() ) {
- // team member #0 completes the task, which may delete the task
- queue->complete( task );
+ // The m_apply function performs a barrier
}
- }
- else {
- // Single Thread Task
+ } while( 0 != task );
- if ( 0 == team_exec.team_rank() ) {
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) ending\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ );
+fflush(stdout);
+#endif
- (*task->m_apply)( task , & single_exec );
+ }
- queue->complete( task );
- }
+ self.disband_team();
+
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) disbanded\n"
+ , self.pool_rank()
+ , self.pool_size()
+ );
+fflush(stdout);
+#endif
- // All team members wait for whole team to reach this statement.
- // Not necessary to complete the task.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
- team_exec.team_barrier();
- }
- } while(1);
}
// END #pragma omp parallel
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> execute %d end\n", team_size );
+fflush(stdout);
+#endif
+
}
void TaskQueueSpecialization< Kokkos::OpenMP >::
iff_single_thread_recursive_execute
( TaskQueue< Kokkos::OpenMP > * const queue )
{
using execution_space = Kokkos::OpenMP ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
if ( 1 == omp_get_num_threads() ) {
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member single_exec ;
+ HostThreadTeamData & team_data_single =
+ HostThreadTeamDataSingleton::singleton();
+
+ Member single_exec( team_data_single );
task_root_type * task = end ;
do {
task = end ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & single_exec );
queue->complete( task );
} while(1);
}
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
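The hunk above replaces the hand-rolled TaskExec sync-word barrier with the unified HostThreadTeamData machinery and introduces HostThreadTeamDataSingleton, a function-local static ("Meyers") singleton that owns the scratch block used when a task runs on a single thread. Below is a minimal sketch of just that construction/teardown pattern; the ScratchOwner name, the byte counts, and the use of std::malloc/std::free are illustrative stand-ins, since the real code sizes the block with HostThreadTeamData::scratch_size() and allocates from Kokkos::OpenMP::memory_space.

#include <cstdlib>
#include <cstddef>

// Sketch of the function-local-static singleton pattern used by
// HostThreadTeamDataSingleton above. Names and sizes are illustrative only.
class ScratchOwner {
private:

  void *      m_buffer ;
  std::size_t m_bytes ;

  ScratchOwner() : m_buffer( 0 ) , m_bytes( 32 + 32 + 1024 + 1024 )
    { m_buffer = std::malloc( m_bytes ); }   // first call to singleton() allocates

  ~ScratchOwner()
    { std::free( m_buffer ); }               // released automatically at program exit

public:

  static ScratchOwner & singleton()
    {
      static ScratchOwner s ;                // constructed once; thread-safe since C++11
      return s ;
    }

  void *      buffer() const { return m_buffer ; }
  std::size_t bytes()  const { return m_bytes ; }
};

Every caller of ScratchOwner::singleton() sees the same object, which mirrors how all pool threads in the executor above share one single-thread scratch area through HostThreadTeamDataSingleton::singleton().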
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
index 15dbb77c2..3cfdf790b 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
@@ -1,365 +1,89 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_OPENMP_TASK_HPP
#define KOKKOS_IMPL_OPENMP_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskQueueSpecialization< Kokkos::OpenMP >
{
public:
using execution_space = Kokkos::OpenMP ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
+ using member_type = Kokkos::Impl::HostThreadTeamMember< execution_space > ;
// Must specify memory space
using memory_space = Kokkos::HostSpace ;
static
void iff_single_thread_recursive_execute( queue_type * const );
// Must provide task queue execution function
static void execute( queue_type * const );
- // Must provide mechanism to set function pointer in
- // execution space from the host process.
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( task_base_type::function_type * ptr )
- {
- using TaskType = TaskBase< Kokkos::OpenMP
- , typename FunctorType::value_type
- , FunctorType
- > ;
- *ptr = TaskType::apply ;
- }
+ typename TaskType::function_type
+ get_function_pointer() { return TaskType::apply ; }
};
extern template class TaskQueue< Kokkos::OpenMP > ;
-//----------------------------------------------------------------------------
-
-template<>
-class TaskExec< Kokkos::OpenMP >
-{
-private:
-
- TaskExec( TaskExec && ) = delete ;
- TaskExec( TaskExec const & ) = delete ;
- TaskExec & operator = ( TaskExec && ) = delete ;
- TaskExec & operator = ( TaskExec const & ) = delete ;
-
-
- using PoolExec = Kokkos::Impl::OpenMPexec ;
-
- friend class Kokkos::Impl::TaskQueue< Kokkos::OpenMP > ;
- friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::OpenMP > ;
-
- PoolExec * const m_self_exec ; ///< This thread's thread pool data structure
- PoolExec * const m_team_exec ; ///< Team thread's thread pool data structure
- int64_t m_sync_mask ;
- int64_t mutable m_sync_value ;
- int mutable m_sync_step ;
- int m_group_rank ; ///< Which "team" subset of thread pool
- int m_team_rank ; ///< Which thread within a team
- int m_team_size ;
-
- TaskExec();
- TaskExec( PoolExec & arg_exec , int arg_team_size );
-
- void team_barrier_impl() const ;
-
-public:
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- void * team_shared() const
- { return m_team_exec ? m_team_exec->scratch_thread() : (void*) 0 ; }
-
- int team_shared_size() const
- { return m_team_exec ? m_team_exec->scratch_thread_size() : 0 ; }
-
- /**\brief Whole team enters this function call
- * before any teeam member returns from
- * this function call.
- */
- void team_barrier() const { if ( 1 < m_team_size ) team_barrier_impl(); }
-#else
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
- KOKKOS_INLINE_FUNCTION void * team_shared() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_shared_size() const { return 0 ; }
-#endif
-
- KOKKOS_INLINE_FUNCTION
- int team_rank() const { return m_team_rank ; }
-
- KOKKOS_INLINE_FUNCTION
- int team_size() const { return m_team_size ; }
-};
-
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
-TeamThreadRange
- ( Impl::TaskExec< Kokkos::OpenMP > & thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
-}
-
-template<typename iType1, typename iType2>
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::TaskExec< Kokkos::OpenMP > >
-TeamThreadRange
- ( Impl:: TaskExec< Kokkos::OpenMP > & thread, const iType1 & begin, const iType2 & end )
-{
- typedef typename std::common_type<iType1, iType2>::type iType;
- return Impl::TeamThreadRangeBoundariesStruct<iType, Impl::TaskExec< Kokkos::OpenMP > >(thread, begin, end);
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
-ThreadVectorRange
- ( Impl::TaskExec< Kokkos::OpenMP > & thread
- , const iType & count )
-{
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
-}
-
-/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team.
- * This functionality requires C++11 support.
-*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for
- ( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
- , const Lambda& lambda
- )
-{
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i);
- }
-}
-
-template<typename iType, class Lambda, typename ValueType>
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- ( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
- , const Lambda& lambda
- , ValueType& initialized_result)
-{
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i, result);
- }
-
- if ( 1 < loop_boundaries.thread.team_size() ) {
-
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
-
- loop_boundaries.thread.team_barrier();
- shared[team_rank] = result;
-
- loop_boundaries.thread.team_barrier();
-
- // reduce across threads to thread 0
- if (team_rank == 0) {
- for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
- shared[0] += shared[i];
- }
- }
-
- loop_boundaries.thread.team_barrier();
-
- // broadcast result
- initialized_result = shared[0];
- }
- else {
- initialized_result = result ;
- }
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i, result);
- }
-
- if ( 1 < loop_boundaries.thread.team_size() ) {
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
-
- loop_boundaries.thread.team_barrier();
- shared[team_rank] = result;
-
- loop_boundaries.thread.team_barrier();
-
- // reduce across threads to thread 0
- if (team_rank == 0) {
- for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
- join(shared[0], shared[i]);
- }
- }
-
- loop_boundaries.thread.team_barrier();
-
- // broadcast result
- initialized_result = shared[0];
- }
- else {
- initialized_result = result ;
- }
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
-}
-
-template< typename ValueType, typename iType, class Lambda >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda)
-{
- ValueType accum = 0 ;
- ValueType val, local_total;
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
- int team_size = loop_boundaries.thread.team_size();
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
-
- // Intra-member scan
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-
- shared[team_rank] = accum;
- loop_boundaries.thread.team_barrier();
-
- // Member 0 do scan on accumulated totals
- if (team_rank == 0) {
- for( iType i = 1; i < team_size; i+=1) {
- shared[i] += shared[i-1];
- }
- accum = 0; // Member 0 set accum to 0 in preparation for inter-member scan
- }
-
- loop_boundaries.thread.team_barrier();
-
- // Inter-member scan adding in accumulated totals
- if (team_rank != 0) { accum = shared[team_rank-1]; }
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda)
-{
-}
-
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_OPENMP_TASK_HPP */
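The TeamThreadRange parallel_reduce removed above staged each team member's partial result in team-shared memory, had member 0 combine the partials, and then made the total visible to the whole team. As a standalone illustration of that staging idea, here is a small sketch written against plain OpenMP rather than the Kokkos team API; the shared vector, barriers, and loop bounds are assumptions made only for this example.

#include <omp.h>
#include <vector>
#include <cstdio>

int main()
{
  const int n = 1000 ;

  std::vector<double> shared( omp_get_max_threads() , 0.0 );  // stands in for team_shared()
  double total = 0.0 ;

  #pragma omp parallel
  {
    const int rank = omp_get_thread_num();
    const int size = omp_get_num_threads();

    double partial = 0.0 ;                        // per-member partial result
    for ( int i = rank ; i < n ; i += size ) partial += 1.0 ;

    shared[ rank ] = partial ;                    // stage this member's result
    #pragma omp barrier                           // wait until all partials are staged

    #pragma omp master
    {
      for ( int i = 0 ; i < size ; ++i ) total += shared[i] ;  // member 0 combines
    }
  }

  std::printf( "reduced total = %g (expected %d)\n" , total , n );
  return 0 ;
}

Compiled with -fopenmp this should print 1000; in the removed Kokkos version the same steps (stage, barrier, combine on member 0) were expressed with team_shared(), team_barrier(), and the team_rank() checks visible in the hunk, with the result broadcast back through the shared buffer.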
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
index 34cf581a4..2d50c6e54 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
@@ -1,408 +1,462 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <limits>
#include <iostream>
#include <vector>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <iostream>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#ifdef KOKKOS_ENABLE_OPENMP
namespace Kokkos {
namespace Impl {
namespace {
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel();
int kokkos_omp_in_critical_region = ( Kokkos::HostSpace::register_in_parallel( kokkos_omp_in_parallel ) , 0 );
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel()
{
#ifndef __CUDA_ARCH__
return omp_in_parallel() && ! kokkos_omp_in_critical_region ;
#else
return 0;
#endif
}
bool s_using_hwloc = false;
} // namespace
} // namespace Impl
} // namespace Kokkos
namespace Kokkos {
namespace Impl {
int OpenMPexec::m_map_rank[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
int OpenMPexec::m_pool_topo[ 4 ] = { 0 };
-OpenMPexec * OpenMPexec::m_pool[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
+HostThreadTeamData * OpenMPexec::m_pool[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
void OpenMPexec::verify_is_process( const char * const label )
{
if ( omp_in_parallel() ) {
std::string msg( label );
msg.append( " ERROR: in parallel" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void OpenMPexec::verify_initialized( const char * const label )
{
if ( 0 == m_pool[0] ) {
std::string msg( label );
msg.append( " ERROR: not initialized" );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( omp_get_max_threads() != Kokkos::OpenMP::thread_pool_size(0) ) {
std::string msg( label );
msg.append( " ERROR: Initialized but threads modified inappropriately" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
-void OpenMPexec::clear_scratch()
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+void OpenMPexec::clear_thread_data()
{
+ const size_t member_bytes =
+ sizeof(int64_t) *
+ HostThreadTeamData::align_to_int64( sizeof(HostThreadTeamData) );
+
+ const int old_alloc_bytes =
+ m_pool[0] ? ( member_bytes + m_pool[0]->scratch_bytes() ) : 0 ;
+
+ Kokkos::HostSpace space ;
+
#pragma omp parallel
{
- const int rank_rev = m_map_rank[ omp_get_thread_num() ];
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
- if ( m_pool[ rank_rev ] ) {
- Record * const r = Record::get_record( m_pool[ rank_rev ] );
- m_pool[ rank_rev ] = 0 ;
- Record::decrement( r );
+ const int rank = m_map_rank[ omp_get_thread_num() ];
+
+ if ( 0 != m_pool[rank] ) {
+
+ m_pool[rank]->disband_pool();
+
+ space.deallocate( m_pool[rank] , old_alloc_bytes );
+
+ m_pool[rank] = 0 ;
}
}
/* END #pragma omp parallel */
}
-void OpenMPexec::resize_scratch( size_t reduce_size , size_t thread_size )
+void OpenMPexec::resize_thread_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes )
{
- enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
- enum { ALLOC_EXEC = ( sizeof(OpenMPexec) + ALIGN_MASK ) & ~ALIGN_MASK };
+ const size_t member_bytes =
+ sizeof(int64_t) *
+ HostThreadTeamData::align_to_int64( sizeof(HostThreadTeamData) );
- const size_t old_reduce_size = m_pool[0] ? m_pool[0]->m_scratch_reduce_end : 0 ;
- const size_t old_thread_size = m_pool[0] ? m_pool[0]->m_scratch_thread_end - m_pool[0]->m_scratch_reduce_end : 0 ;
+ HostThreadTeamData * root = m_pool[0] ;
- reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
- thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
+ const size_t old_pool_reduce = root ? root->pool_reduce_bytes() : 0 ;
+ const size_t old_team_reduce = root ? root->team_reduce_bytes() : 0 ;
+ const size_t old_team_shared = root ? root->team_shared_bytes() : 0 ;
+ const size_t old_thread_local = root ? root->thread_local_bytes() : 0 ;
+ const size_t old_alloc_bytes = root ? ( member_bytes + root->scratch_bytes() ) : 0 ;
- // Requesting allocation and old allocation is too small:
+ // Allocate if any of the old allocations are too small:
- const bool allocate = ( old_reduce_size < reduce_size ) ||
- ( old_thread_size < thread_size );
+ const bool allocate = ( old_pool_reduce < pool_reduce_bytes ) ||
+ ( old_team_reduce < team_reduce_bytes ) ||
+ ( old_team_shared < team_shared_bytes ) ||
+ ( old_thread_local < thread_local_bytes );
if ( allocate ) {
- if ( reduce_size < old_reduce_size ) { reduce_size = old_reduce_size ; }
- if ( thread_size < old_thread_size ) { thread_size = old_thread_size ; }
- }
- const size_t alloc_size = allocate ? ALLOC_EXEC + reduce_size + thread_size : 0 ;
- const int pool_size = m_pool_topo[0] ;
+ if ( pool_reduce_bytes < old_pool_reduce ) { pool_reduce_bytes = old_pool_reduce ; }
+ if ( team_reduce_bytes < old_team_reduce ) { team_reduce_bytes = old_team_reduce ; }
+ if ( team_shared_bytes < old_team_shared ) { team_shared_bytes = old_team_shared ; }
+ if ( thread_local_bytes < old_thread_local ) { thread_local_bytes = old_thread_local ; }
- if ( allocate ) {
+ const size_t alloc_bytes =
+ member_bytes +
+ HostThreadTeamData::scratch_size( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ const int pool_size = omp_get_max_threads();
- clear_scratch();
+ Kokkos::HostSpace space ;
#pragma omp parallel
{
- const int rank_rev = m_map_rank[ omp_get_thread_num() ];
- const int rank = pool_size - ( rank_rev + 1 );
+ const int rank = m_map_rank[ omp_get_thread_num() ];
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
+ if ( 0 != m_pool[rank] ) {
- Record * const r = Record::allocate( Kokkos::HostSpace()
- , "openmp_scratch"
- , alloc_size );
+ m_pool[rank]->disband_pool();
- Record::increment( r );
+ space.deallocate( m_pool[rank] , old_alloc_bytes );
+ }
+
+ void * const ptr = space.allocate( alloc_bytes );
- m_pool[ rank_rev ] = reinterpret_cast<OpenMPexec*>( r->data() );
+ m_pool[ rank ] = new( ptr ) HostThreadTeamData();
- new ( m_pool[ rank_rev ] ) OpenMPexec( rank , ALLOC_EXEC , reduce_size , thread_size );
+ m_pool[ rank ]->
+ scratch_assign( ((char *)ptr) + member_bytes
+ , alloc_bytes
+ , pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
}
/* END #pragma omp parallel */
+
+ HostThreadTeamData::organize_pool( m_pool , pool_size );
}
}
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//----------------------------------------------------------------------------
int OpenMP::is_initialized()
{ return 0 != Impl::OpenMPexec::m_pool[0]; }
void OpenMP::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa )
{
// Before any other call to OMP query the maximum number of threads
// and save the value for re-initialization unit testing.
- //Using omp_get_max_threads(); is problematic in conjunction with
- //Hwloc on Intel (essentially an initial call to the OpenMP runtime
- //without a parallel region before will set a process mask for a single core
- //The runtime will than bind threads for a parallel region to other cores on the
- //entering the first parallel region and make the process mask the aggregate of
- //the thread masks. The intend seems to be to make serial code run fast, if you
- //compile with OpenMP enabled but don't actually use parallel regions or so
- //static int omp_max_threads = omp_get_max_threads();
+ // Using omp_get_max_threads() is problematic in conjunction with
+ // hwloc on Intel: an initial call to the OpenMP runtime without a
+ // prior parallel region sets a process mask for a single core. The
+ // runtime then binds threads for a parallel region to other cores when
+ // entering the first parallel region and makes the process mask the
+ // aggregate of the thread masks. The intent seems to be to make serial
+ // code run fast if you compile with OpenMP enabled but don't actually
+ // use parallel regions.
+ // static int omp_max_threads = omp_get_max_threads();
int nthreads = 0;
#pragma omp parallel
{
#pragma omp atomic
nthreads++;
}
static int omp_max_threads = nthreads;
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
bool thread_spawn_failed = false ;
if ( ! is_initialized ) {
// Use hwloc thread pinning if concerned with locality.
// If spreading threads across multiple NUMA regions.
// If hyperthreading is enabled.
Impl::s_using_hwloc = hwloc::available() && (
( 1 < Kokkos::hwloc::get_available_numa_count() ) ||
( 1 < Kokkos::hwloc::get_available_threads_per_core() ) );
std::pair<unsigned,unsigned> threads_coord[ Impl::OpenMPexec::MAX_THREAD_COUNT ];
// If hwloc available then use its maximum value.
if ( thread_count == 0 ) {
thread_count = Impl::s_using_hwloc
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: omp_max_threads ;
}
if(Impl::s_using_hwloc)
hwloc::thread_mapping( "Kokkos::OpenMP::initialize" ,
false /* do not allow asynchronous */ ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
threads_coord );
// Spawn threads:
omp_set_num_threads( thread_count );
// Verify OMP interaction:
if ( int(thread_count) != omp_get_max_threads() ) {
thread_spawn_failed = true ;
}
// Verify spawning and bind threads:
#pragma omp parallel
{
#pragma omp critical
{
if ( int(thread_count) != omp_get_num_threads() ) {
thread_spawn_failed = true ;
}
// Call to 'bind_this_thread' is not thread safe so place this whole block in a critical region.
// Call to 'new' may not be thread safe as well.
- // Reverse the rank for threads so that the scan operation reduces to the highest rank thread.
-
const unsigned omp_rank = omp_get_thread_num();
const unsigned thread_r = Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads()
? Kokkos::hwloc::bind_this_thread( thread_count , threads_coord )
: omp_rank ;
Impl::OpenMPexec::m_map_rank[ omp_rank ] = thread_r ;
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
if ( ! thread_spawn_failed ) {
Impl::OpenMPexec::m_pool_topo[0] = thread_count ;
Impl::OpenMPexec::m_pool_topo[1] = Impl::s_using_hwloc ? thread_count / use_numa_count : thread_count;
Impl::OpenMPexec::m_pool_topo[2] = Impl::s_using_hwloc ? thread_count / ( use_numa_count * use_cores_per_numa ) : 1;
- Impl::OpenMPexec::resize_scratch( 1024 , 1024 );
+ // New, unified host thread team data:
+ {
+ size_t pool_reduce_bytes = 32 * thread_count ;
+ size_t team_reduce_bytes = 32 * thread_count ;
+ size_t team_shared_bytes = 1024 * thread_count ;
+ size_t thread_local_bytes = 1024 ;
+
+ Impl::OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes
+ );
+ }
}
}
if ( is_initialized || thread_spawn_failed ) {
std::string msg("Kokkos::OpenMP::initialize ERROR");
if ( is_initialized ) { msg.append(" : already initialized"); }
if ( thread_spawn_failed ) { msg.append(" : failed spawning threads"); }
Kokkos::Impl::throw_runtime_exception(msg);
}
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::finalize()
{
Impl::OpenMPexec::verify_initialized( "OpenMP::finalize" );
Impl::OpenMPexec::verify_is_process( "OpenMP::finalize" );
- Impl::OpenMPexec::clear_scratch();
+ // New, unified host thread team data:
+ Impl::OpenMPexec::clear_thread_data();
Impl::OpenMPexec::m_pool_topo[0] = 0 ;
Impl::OpenMPexec::m_pool_topo[1] = 0 ;
Impl::OpenMPexec::m_pool_topo[2] = 0 ;
omp_set_num_threads(1);
if ( Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads() ) {
hwloc::unbind_this_thread();
}
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::print_configuration( std::ostream & s , const bool detail )
{
Impl::OpenMPexec::verify_is_process( "OpenMP::print_configuration" );
s << "Kokkos::OpenMP" ;
#if defined( KOKKOS_ENABLE_OPENMP )
s << " KOKKOS_ENABLE_OPENMP" ;
#endif
#if defined( KOKKOS_ENABLE_HWLOC )
const unsigned numa_count_ = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
s << " hwloc[" << numa_count_ << "x" << cores_per_numa << "x" << threads_per_core << "]"
<< " hwloc_binding_" << ( Impl::s_using_hwloc ? "enabled" : "disabled" )
;
#endif
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
if ( is_initialized ) {
const int numa_count = Kokkos::Impl::OpenMPexec::m_pool_topo[0] / Kokkos::Impl::OpenMPexec::m_pool_topo[1] ;
const int core_per_numa = Kokkos::Impl::OpenMPexec::m_pool_topo[1] / Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
const int thread_per_core = Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
s << " thread_pool_topology[ " << numa_count
<< " x " << core_per_numa
<< " x " << thread_per_core
<< " ]"
<< std::endl ;
if ( detail ) {
std::vector< std::pair<unsigned,unsigned> > coord( Kokkos::Impl::OpenMPexec::m_pool_topo[0] );
#pragma omp parallel
{
#pragma omp critical
{
coord[ omp_get_thread_num() ] = hwloc::get_this_thread_coordinate();
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
for ( unsigned i = 0 ; i < coord.size() ; ++i ) {
s << " thread omp_rank[" << i << "]"
<< " kokkos_rank[" << Impl::OpenMPexec::m_map_rank[ i ] << "]"
<< " hwloc_coord[" << coord[i].first << "." << coord[i].second << "]"
<< std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
int OpenMP::concurrency() {
return thread_pool_size(0);
}
} // namespace Kokkos
#endif //KOKKOS_ENABLE_OPENMP
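resize_thread_data() above makes one allocation per thread, constructs the HostThreadTeamData header at the front of the block with placement new, and hands the remaining bytes to scratch_assign(). A minimal sketch of that header-plus-scratch layout follows; the Header type, the byte counts, and the use of global operator new are hypothetical simplifications, since the real code rounds the header size up to a multiple of int64_t and allocates from Kokkos::HostSpace.

#include <cstddef>
#include <cstdio>
#include <new>

// Hypothetical stand-in for HostThreadTeamData: a small header that
// remembers where its scratch region begins and how large it is.
struct Header {
  char *      m_scratch ;
  std::size_t m_scratch_bytes ;

  void scratch_assign( char * ptr , std::size_t bytes )
    { m_scratch = ptr ; m_scratch_bytes = bytes ; }
};

int main()
{
  const std::size_t header_bytes  = sizeof(Header);   // real code rounds this up to int64_t
  const std::size_t scratch_bytes = 1024 ;             // pool/team/thread scratch, illustrative
  const std::size_t alloc_bytes   = header_bytes + scratch_bytes ;

  void * const block = ::operator new( alloc_bytes );  // one allocation per thread

  Header * const h = new( block ) Header();             // header lives at the front of the block
  h->scratch_assign( static_cast<char*>(block) + header_bytes , scratch_bytes );

  std::printf( "header at %p, scratch at %p (%zu bytes)\n"
             , static_cast<void*>(h)
             , static_cast<void*>(h->m_scratch)
             , h->m_scratch_bytes );

  h->~Header();
  ::operator delete( block );
  return 0 ;
}

The grow-only comparison earlier in the hunk (reallocate only when a requested size category exceeds its old value) keeps repeated calls cheap once the largest request has been seen.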
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
index 63f7234da..39ace3131 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
@@ -1,1065 +1,345 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMPEXEC_HPP
#define KOKKOS_OPENMPEXEC_HPP
+#include <Kokkos_OpenMP.hpp>
+
#include <impl/Kokkos_Traits.hpp>
-#include <impl/Kokkos_spinwait.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
#include <Kokkos_Atomic.hpp>
+
#include <iostream>
#include <sstream>
#include <fstream>
+
+#include <omp.h>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
/** \brief Data for OpenMP thread execution */
class OpenMPexec {
public:
+ friend class Kokkos::OpenMP ;
+
enum { MAX_THREAD_COUNT = 4096 };
private:
- static OpenMPexec * m_pool[ MAX_THREAD_COUNT ]; // Indexed by: m_pool_rank_rev
-
static int m_pool_topo[ 4 ];
static int m_map_rank[ MAX_THREAD_COUNT ];
- friend class Kokkos::OpenMP ;
-
- int const m_pool_rank ;
- int const m_pool_rank_rev ;
- int const m_scratch_exec_end ;
- int const m_scratch_reduce_end ;
- int const m_scratch_thread_end ;
-
- int volatile m_barrier_state ;
-
- // Members for dynamic scheduling
- // Which thread am I stealing from currently
- int m_current_steal_target;
- // This thread's owned work_range
- Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN(16);
- // Team Offset if one thread determines work_range for others
- long m_team_work_index;
+ static HostThreadTeamData * m_pool[ MAX_THREAD_COUNT ];
- // Is this thread stealing (i.e. its owned work_range is exhausted
- bool m_stealing;
-
- OpenMPexec();
- OpenMPexec( const OpenMPexec & );
- OpenMPexec & operator = ( const OpenMPexec & );
-
- static void clear_scratch();
+ static
+ void clear_thread_data();
public:
// Topology of a cache coherent thread pool:
// TOTAL = NUMA x GRAIN
// pool_size( depth = 0 )
// pool_size(0) = total number of threads
// pool_size(1) = number of threads per NUMA
// pool_size(2) = number of threads sharing finest grain memory hierarchy
inline static
int pool_size( int depth = 0 ) { return m_pool_topo[ depth ]; }
- inline static
- OpenMPexec * pool_rev( int pool_rank_rev ) { return m_pool[ pool_rank_rev ]; }
-
- inline int pool_rank() const { return m_pool_rank ; }
- inline int pool_rank_rev() const { return m_pool_rank_rev ; }
-
- inline long team_work_index() const { return m_team_work_index ; }
-
- inline int scratch_reduce_size() const
- { return m_scratch_reduce_end - m_scratch_exec_end ; }
-
- inline int scratch_thread_size() const
- { return m_scratch_thread_end - m_scratch_reduce_end ; }
-
- inline void * scratch_reduce() const { return ((char *) this) + m_scratch_exec_end ; }
- inline void * scratch_thread() const { return ((char *) this) + m_scratch_reduce_end ; }
-
- inline
- void state_wait( int state )
- { Impl::spinwait( m_barrier_state , state ); }
-
- inline
- void state_set( int state ) { m_barrier_state = state ; }
-
- ~OpenMPexec() {}
-
- OpenMPexec( const int arg_poolRank
- , const int arg_scratch_exec_size
- , const int arg_scratch_reduce_size
- , const int arg_scratch_thread_size )
- : m_pool_rank( arg_poolRank )
- , m_pool_rank_rev( pool_size() - ( arg_poolRank + 1 ) )
- , m_scratch_exec_end( arg_scratch_exec_size )
- , m_scratch_reduce_end( m_scratch_exec_end + arg_scratch_reduce_size )
- , m_scratch_thread_end( m_scratch_reduce_end + arg_scratch_thread_size )
- , m_barrier_state(0)
- {}
-
static void finalize();
- static void initialize( const unsigned team_count ,
+ static void initialize( const unsigned team_count ,
const unsigned threads_per_team ,
const unsigned numa_count ,
const unsigned cores_per_numa );
static void verify_is_process( const char * const );
static void verify_initialized( const char * const );
- static void resize_scratch( size_t reduce_size , size_t thread_size );
- inline static
- OpenMPexec * get_thread_omp() { return m_pool[ m_map_rank[ omp_get_thread_num() ] ]; }
+ static
+ void resize_thread_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes );
- /* Dynamic Scheduling related functionality */
- // Initialize the work range for this thread
- inline void set_work_range(const long& begin, const long& end, const long& chunk_size) {
- m_work_range.first = (begin+chunk_size-1)/chunk_size;
- m_work_range.second = end>0?(end+chunk_size-1)/chunk_size:m_work_range.first;
- }
-
- // Claim and index from this thread's range from the beginning
- inline long get_work_index_begin () {
- Kokkos::pair<long,long> work_range_new = m_work_range;
- Kokkos::pair<long,long> work_range_old = work_range_new;
- if(work_range_old.first>=work_range_old.second)
- return -1;
-
- work_range_new.first+=1;
-
- bool success = false;
- while(!success) {
- work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
- success = ( (work_range_new == work_range_old) ||
- (work_range_new.first>=work_range_new.second));
- work_range_old = work_range_new;
- work_range_new.first+=1;
- }
- if(work_range_old.first<work_range_old.second)
- return work_range_old.first;
- else
- return -1;
- }
-
- // Claim and index from this thread's range from the end
- inline long get_work_index_end () {
- Kokkos::pair<long,long> work_range_new = m_work_range;
- Kokkos::pair<long,long> work_range_old = work_range_new;
- if(work_range_old.first>=work_range_old.second)
- return -1;
- work_range_new.second-=1;
- bool success = false;
- while(!success) {
- work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
- success = ( (work_range_new == work_range_old) ||
- (work_range_new.first>=work_range_new.second) );
- work_range_old = work_range_new;
- work_range_new.second-=1;
- }
- if(work_range_old.first<work_range_old.second)
- return work_range_old.second-1;
- else
- return -1;
- }
-
- // Reset the steal target
- inline void reset_steal_target() {
- m_current_steal_target = (m_pool_rank+1)%m_pool_topo[0];
- m_stealing = false;
- }
-
- // Reset the steal target
- inline void reset_steal_target(int team_size) {
- m_current_steal_target = (m_pool_rank_rev+team_size);
- if(m_current_steal_target>=m_pool_topo[0])
- m_current_steal_target = 0;//m_pool_topo[0]-1;
- m_stealing = false;
- }
-
- // Get a steal target; start with my-rank + 1 and go round robin, until arriving at this threads rank
- // Returns -1 fi no active steal target available
- inline int get_steal_target() {
- while(( m_pool[m_current_steal_target]->m_work_range.second <=
- m_pool[m_current_steal_target]->m_work_range.first ) &&
- (m_current_steal_target!=m_pool_rank) ) {
- m_current_steal_target = (m_current_steal_target+1)%m_pool_topo[0];
- }
- if(m_current_steal_target == m_pool_rank)
- return -1;
- else
- return m_current_steal_target;
- }
-
- inline int get_steal_target(int team_size) {
-
- while(( m_pool[m_current_steal_target]->m_work_range.second <=
- m_pool[m_current_steal_target]->m_work_range.first ) &&
- (m_current_steal_target!=m_pool_rank_rev) ) {
- if(m_current_steal_target + team_size < m_pool_topo[0])
- m_current_steal_target = (m_current_steal_target+team_size);
- else
- m_current_steal_target = 0;
- }
-
- if(m_current_steal_target == m_pool_rank_rev)
- return -1;
- else
- return m_current_steal_target;
- }
-
- inline long steal_work_index (int team_size = 0) {
- long index = -1;
- int steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
- while ( (steal_target != -1) && (index == -1)) {
- index = m_pool[steal_target]->get_work_index_end();
- if(index == -1)
- steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
- }
- return index;
- }
-
- // Get a work index. Claim from owned range until its exhausted, then steal from other thread
- inline long get_work_index (int team_size = 0) {
- long work_index = -1;
- if(!m_stealing) work_index = get_work_index_begin();
-
- if( work_index == -1) {
- memory_fence();
- m_stealing = true;
- work_index = steal_work_index(team_size);
- }
- m_team_work_index = work_index;
- memory_fence();
- return work_index;
- }
+ inline static
+ HostThreadTeamData * get_thread_data() noexcept
+ { return m_pool[ m_map_rank[ omp_get_thread_num() ] ]; }
+ inline static
+ HostThreadTeamData * get_thread_data( int i ) noexcept
+ { return m_pool[i]; }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
-class OpenMPexecTeamMember {
-public:
-
- enum { TEAM_REDUCE_SIZE = 512 };
-
- /** \brief Thread states for team synchronization */
- enum { Active = 0 , Rendezvous = 1 };
-
- typedef Kokkos::OpenMP execution_space ;
- typedef execution_space::scratch_memory_space scratch_memory_space ;
-
- Impl::OpenMPexec & m_exec ;
- scratch_memory_space m_team_shared ;
- int m_team_scratch_size[2] ;
- int m_team_base_rev ;
- int m_team_rank_rev ;
- int m_team_rank ;
- int m_team_size ;
- int m_league_rank ;
- int m_league_end ;
- int m_league_size ;
-
- int m_chunk_size;
- int m_league_chunk_end;
- Impl::OpenMPexec & m_team_lead_exec ;
- int m_invalid_thread;
- int m_team_alloc;
-
- // Fan-in team threads, root of the fan-in which does not block returns true
- inline
- bool team_fan_in() const
- {
- memory_fence();
- for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
-
- m_exec.pool_rev( m_team_base_rev + j )->state_wait( Active );
- }
-
- if ( m_team_rank_rev ) {
- m_exec.state_set( Rendezvous );
- memory_fence();
- m_exec.state_wait( Rendezvous );
- }
-
- return 0 == m_team_rank_rev ;
- }
-
- inline
- void team_fan_out() const
- {
- memory_fence();
- for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
- m_exec.pool_rev( m_team_base_rev + j )->state_set( Active );
- memory_fence();
- }
- }
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& team_shmem() const
- { return m_team_shared.set_team_thread_mode(0,1,0) ; }
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& team_scratch(int) const
- { return m_team_shared.set_team_thread_mode(0,1,0) ; }
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& thread_scratch(int) const
- { return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- {}
-#else
- {
- if ( 1 < m_team_size && !m_invalid_thread) {
- team_fan_in();
- team_fan_out();
- }
- }
-#endif
-
- template<class ValueType>
- KOKKOS_INLINE_FUNCTION
- void team_broadcast(ValueType& value, const int& thread_id) const
- {
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { }
-#else
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
- , ValueType , void >::type type ;
-
- type volatile * const shared_value =
- ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
-
- if ( team_rank() == thread_id ) *shared_value = value;
- memory_fence();
- team_barrier(); // Wait for 'thread_id' to write
- value = *shared_value ;
- team_barrier(); // Wait for team members to read
-#endif
- }
-
- template< class ValueType, class JoinOp >
- KOKKOS_INLINE_FUNCTION ValueType
- team_reduce( const ValueType & value
- , const JoinOp & op_in ) const
- #if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return ValueType(); }
- #else
- {
- memory_fence();
- typedef ValueType value_type;
- const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
- #endif
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
- , value_type , void >::type type ;
-
- type * const local_value = ((type*) m_exec.scratch_thread());
-
- // Set this thread's contribution
- *local_value = value ;
-
- // Fence to make sure the base team member has access:
- memory_fence();
-
- if ( team_fan_in() ) {
- // The last thread to synchronize returns true, all other threads wait for team_fan_out()
- type * const team_value = ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
-
- // Join to the team value:
- for ( int i = 1 ; i < m_team_size ; ++i ) {
- op.join( *team_value , *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) );
- }
- memory_fence();
-
- // The base team member may "lap" the other team members,
- // copy to their local value before proceeding.
- for ( int i = 1 ; i < m_team_size ; ++i ) {
- *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) = *team_value ;
- }
-
- // Fence to make sure all team members have access
- memory_fence();
- }
-
- team_fan_out();
-
- return *((type volatile const *)local_value);
- }
-#endif
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename ArgType >
- KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return ArgType(); }
-#else
- {
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
-
- volatile type * const work_value = ((type*) m_exec.scratch_thread());
-
- *work_value = value ;
-
- memory_fence();
-
- if ( team_fan_in() ) {
- // The last thread to synchronize returns true, all other threads wait for team_fan_out()
- // m_team_base[0] == highest ranking team member
- // m_team_base[ m_team_size - 1 ] == lowest ranking team member
- //
- // 1) copy from lower to higher rank, initialize lowest rank to zero
- // 2) prefix sum from lowest to highest rank, skipping lowest rank
-
- type accum = 0 ;
-
- if ( global_accum ) {
- for ( int i = m_team_size ; i-- ; ) {
- type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
- accum += val ;
- }
- accum = atomic_fetch_add( global_accum , accum );
- }
-
- for ( int i = m_team_size ; i-- ; ) {
- type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
- const type offset = accum ;
- accum += val ;
- val = offset ;
- }
-
- memory_fence();
- }
-
- team_fan_out();
-
- return *work_value ;
- }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
- { return this-> template team_scan<Type>( value , 0 ); }
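  // Illustrative reading of the team_scan contract documented above, with
  // values invented for this example only: a team of three threads
  // contributing 3, 1 and 4 in team_rank() order receives 0, 3 and 4 from the
  // exclusive scan; the highest-rank thread then recovers the reduction total
  // as team_scan(value) + value = 4 + 4 = 8 = 3 + 1 + 4. When a global_accum
  // pointer is passed, that total is also atomic-added to the inter-team
  // accumulator, whose previous value becomes this team's scan base.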
-
- //----------------------------------------
- // Private for the driver
-
-private:
-
- typedef execution_space::scratch_memory_space space ;
-
-public:
-
- template< class ... Properties >
- inline
- OpenMPexecTeamMember( Impl::OpenMPexec & exec
- , const TeamPolicyInternal< OpenMP, Properties ...> & team
- , const int shmem_size_L1
- , const int shmem_size_L2
- )
- : m_exec( exec )
- , m_team_shared(0,0)
- , m_team_scratch_size{ shmem_size_L1 , shmem_size_L2 }
- , m_team_base_rev(0)
- , m_team_rank_rev(0)
- , m_team_rank(0)
- , m_team_size( team.team_size() )
- , m_league_rank(0)
- , m_league_end(0)
- , m_league_size( team.league_size() )
- , m_chunk_size( team.chunk_size()>0?team.chunk_size():team.team_iter() )
- , m_league_chunk_end(0)
- , m_team_lead_exec( *exec.pool_rev( team.team_alloc() * (m_exec.pool_rank_rev()/team.team_alloc()) ))
- , m_team_alloc( team.team_alloc())
- {
- const int pool_rank_rev = m_exec.pool_rank_rev();
- const int pool_team_rank_rev = pool_rank_rev % team.team_alloc();
- const int pool_league_rank_rev = pool_rank_rev / team.team_alloc();
- const int pool_num_teams = OpenMP::thread_pool_size(0)/team.team_alloc();
- const int chunks_per_team = ( team.league_size() + m_chunk_size*pool_num_teams-1 ) / (m_chunk_size*pool_num_teams);
- int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * m_chunk_size;
- int league_iter_begin = league_iter_end - chunks_per_team * m_chunk_size;
- if (league_iter_begin < 0) league_iter_begin = 0;
- if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
-
- if ((team.team_alloc()>m_team_size)?
- (pool_team_rank_rev >= m_team_size):
- (m_exec.pool_size() - pool_num_teams*m_team_size > m_exec.pool_rank())
- )
- m_invalid_thread = 1;
- else
- m_invalid_thread = 0;
-
- m_team_rank_rev = pool_team_rank_rev ;
- if ( pool_team_rank_rev < m_team_size && !m_invalid_thread ) {
- m_team_base_rev = team.team_alloc() * pool_league_rank_rev ;
- m_team_rank_rev = pool_team_rank_rev ;
- m_team_rank = m_team_size - ( m_team_rank_rev + 1 );
- m_league_end = league_iter_end ;
- m_league_rank = league_iter_begin ;
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0 );
- }
-
- if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
- m_exec.set_work_range(m_league_rank,m_league_end,m_chunk_size);
- m_exec.reset_steal_target(m_team_size);
- }
- }
-
- bool valid_static() const
- {
- return m_league_rank < m_league_end ;
- }
-
- void next_static()
- {
- if ( m_league_rank < m_league_end ) {
- team_barrier();
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0);
- }
- m_league_rank++;
- }
-
- bool valid_dynamic() {
- if(m_invalid_thread)
- return false;
- if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
- return true;
- }
-
- if ( m_team_rank_rev == 0 ) {
- m_team_lead_exec.get_work_index(m_team_alloc);
- }
- team_barrier();
-
- long work_index = m_team_lead_exec.team_work_index();
-
- m_league_rank = work_index * m_chunk_size;
- m_league_chunk_end = (work_index +1 ) * m_chunk_size;
-
- if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
-
- if(m_league_rank>=0)
- return true;
- return false;
- }
-
- void next_dynamic() {
- if(m_invalid_thread)
- return;
-
- if ( m_league_rank < m_league_chunk_end ) {
- team_barrier();
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0);
- }
- m_league_rank++;
- }
-
- static inline int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
-};
-
template< class ... Properties >
class TeamPolicyInternal< Kokkos::OpenMP, Properties ... >: public PolicyTraits<Properties ...>
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
- int team_size_max( const FunctorType & )
- { return traits::execution_space::thread_pool_size(1); }
+ int team_size_max( const FunctorType & ) {
+ int pool_size = traits::execution_space::thread_pool_size(1);
+ int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ return pool_size<max_host_team_size?pool_size:max_host_team_size;
+ }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
- const int team_max = traits::execution_space::thread_pool_size(1);
+ const int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ const int team_max = pool_size<max_host_team_size?pool_size:max_host_team_size;
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int team_size_ = -1) const {
if(team_size_ < 0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1)
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
inline int team_alloc() const { return m_team_alloc ; }
inline int team_iter() const { return m_team_iter ; }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
public:
- typedef Impl::OpenMPexecTeamMember member_type ;
+ typedef Impl::HostThreadTeamMember< Kokkos::OpenMP > member_type ;
};
} // namespace Impl
} // namespace Kokkos
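As a point of reference, a minimal usage sketch of the TeamPolicy machinery defined above; the execution space, league size, and scratch request below are illustrative assumptions, not values taken from LAMMPS or the Kokkos tests.

#include <Kokkos_Core.hpp>

// Sketch only: the policy clamps the requested team size via team_size_max()
// and picks a blocking granularity via set_auto_chunk_size() as in the class
// above; 128 teams and 1 KiB of per-team scratch are invented numbers.
void example_team_policy()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  policy_type policy =
    policy_type( /*league_size=*/ 128, Kokkos::AUTO )
      .set_scratch_size( 0, Kokkos::PerTeam( 1024 ) );

  Kokkos::parallel_for( policy, KOKKOS_LAMBDA( const member_type & team ) {
    const int league_rank = team.league_rank();   // which team
    const int team_rank   = team.team_rank();     // which thread in the team
    (void) league_rank; (void) team_rank;
  });
}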
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
inline
int OpenMP::thread_pool_size( int depth )
{
return Impl::OpenMPexec::pool_size(depth);
}
KOKKOS_INLINE_FUNCTION
int OpenMP::thread_pool_rank()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return Impl::OpenMPexec::m_map_rank[ omp_get_thread_num() ];
#else
return -1 ;
#endif
}
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >
-TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType& count ) {
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, count );
-}
-
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::OpenMPexecTeamMember >
-TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType1& begin, const iType2& end ) {
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, iType(begin), iType(end) );
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >
-ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >(thread,count);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember> PerTeam(const Impl::OpenMPexecTeamMember& thread) {
- return Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>(thread);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember> PerThread(const Impl::OpenMPexecTeamMember& thread) {
- return Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>(thread);
-}
-
} // namespace Kokkos
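The TeamThreadRange / ThreadVectorRange / PerTeam / PerThread constructors removed above feed the hierarchical-parallelism pattern sketched below; the extents are invented and the nested reduction simply sums loop indices.

#include <Kokkos_Core.hpp>

// Sketch only: 64 teams and a range of 100 are arbitrary; each team reduces
// over TeamThreadRange and its leader consumes the result via single(PerTeam).
void example_nested_parallelism()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  Kokkos::parallel_for( policy_type( 64, Kokkos::AUTO ),
    KOKKOS_LAMBDA( const member_type & team ) {
      double team_sum = 0.0;
      Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 100 ),
        [&]( const int i, double & partial ) { partial += double( i ); },
        team_sum );
      Kokkos::single( Kokkos::PerTeam( team ), [&]() {
        (void) team_sum;   // only the team leader acts on the reduced value
      });
    });
}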
-namespace Kokkos {
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
- const Lambda & lambda, ValueType& result) {
-
- result = ValueType();
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-
- result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
-}
-
-/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
- const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
-
- init_result = loop_boundaries.thread.team_reduce(result,join);
-}
-
-} //namespace Kokkos
-
-namespace Kokkos {
-/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda& lambda) {
- #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
- #pragma ivdep
- #endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
- result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- init_result = result;
-}
-
-/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
- * for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
- * Depending on the target execution space the operator might be called twice: once with final=false
- * and once with final=true. When final==true val contains the prefix sum value. The contribution of this
- * "i" needs to be added to val no matter whether final==true or not. In a serial execution
- * (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
- * to the final sum value over all vector lanes.
- * This functionality requires C++11 support.*/
-template< typename iType, class FunctorType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const FunctorType & lambda) {
-
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
- typedef typename ValueTraits::value_type value_type ;
-
- value_type scan_val = value_type();
-
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i,scan_val,true);
- }
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
- if(single_struct.team_member.team_rank()==0) lambda();
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
- if(single_struct.team_member.team_rank()==0) {
- lambda(val);
- }
- single_struct.team_member.team_broadcast(val,0);
-}
-}
-
#endif /* #ifndef KOKKOS_OPENMPEXEC_HPP */
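The value-taking single() overloads just removed implement a "leader computes, team consumes" idiom: one thread per team runs the lambda and the result is broadcast to the rest. A hedged, self-contained sketch follows; the function name and the value 42.0 are invented for illustration.

#include <Kokkos_Core.hpp>

void example_single_broadcast()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  Kokkos::parallel_for( policy_type( 8, Kokkos::AUTO ),
    KOKKOS_LAMBDA( const member_type & team ) {
      double seed = 0.0;
      // Executed by exactly one thread of the team; 'seed' is then broadcast.
      Kokkos::single( Kokkos::PerTeam( team ), [&]( double & val ) {
        val = 42.0;
      }, seed );
      // Every thread of the team now observes seed == 42.0.
      (void) seed;
    });
}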
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp b/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp
deleted file mode 100644
index b4df5e35b..000000000
--- a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp
+++ /dev/null
@@ -1,511 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <Kokkos_Core_fwd.hpp>
-
-#if defined( KOKKOS_ENABLE_QTHREAD )
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <iostream>
-#include <sstream>
-#include <utility>
-#include <Kokkos_Qthread.hpp>
-#include <Kokkos_Atomic.hpp>
-#include <impl/Kokkos_Error.hpp>
-
-// Defines to enable experimental Qthread functionality
-
-#define QTHREAD_LOCAL_PRIORITY
-#define CLONED_TASKS
-
-#include <qthread/qthread.h>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-namespace {
-
-enum { MAXIMUM_QTHREAD_WORKERS = 1024 };
-
-/** s_exec is indexed by the reverse rank of the workers
- * for faster fan-in / fan-out lookups
- * [ n - 1 , n - 2 , ... , 0 ]
- */
-QthreadExec * s_exec[ MAXIMUM_QTHREAD_WORKERS ];
-
-int s_number_shepherds = 0 ;
-int s_number_workers_per_shepherd = 0 ;
-int s_number_workers = 0 ;
-
-inline
-QthreadExec ** worker_exec()
-{
- return s_exec + s_number_workers - ( qthread_shep() * s_number_workers_per_shepherd + qthread_worker_local(NULL) + 1 );
-}
-
-const int s_base_size = QthreadExec::align_alloc( sizeof(QthreadExec) );
-
-int s_worker_reduce_end = 0 ; /* End of worker reduction memory */
-int s_worker_shared_end = 0 ; /* Total of worker scratch memory */
-int s_worker_shared_begin = 0 ; /* Beginning of worker shared memory */
-
-QthreadExecFunctionPointer volatile s_active_function = 0 ;
-const void * volatile s_active_function_arg = 0 ;
-
-} /* namespace */
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-int Qthread::is_initialized()
-{
- return Impl::s_number_workers != 0 ;
-}
-
-int Qthread::concurrency()
-{
- return Impl::s_number_workers_per_shepherd ;
-}
-
-int Qthread::in_parallel()
-{
- return Impl::s_active_function != 0 ;
-}
-
-void Qthread::initialize( int thread_count )
-{
- // Environment variable: QTHREAD_NUM_SHEPHERDS
- // Environment variable: QTHREAD_NUM_WORKERS_PER_SHEP
- // Environment variable: QTHREAD_HWPAR
-
- {
- char buffer[256];
- snprintf(buffer,sizeof(buffer),"QTHREAD_HWPAR=%d",thread_count);
- putenv(buffer);
- }
-
- const bool ok_init = ( QTHREAD_SUCCESS == qthread_initialize() ) &&
- ( thread_count == qthread_num_shepherds() * qthread_num_workers_local(NO_SHEPHERD) ) &&
- ( thread_count == qthread_num_workers() );
-
- bool ok_symmetry = true ;
-
- if ( ok_init ) {
- Impl::s_number_shepherds = qthread_num_shepherds();
- Impl::s_number_workers_per_shepherd = qthread_num_workers_local(NO_SHEPHERD);
- Impl::s_number_workers = Impl::s_number_shepherds * Impl::s_number_workers_per_shepherd ;
-
- for ( int i = 0 ; ok_symmetry && i < Impl::s_number_shepherds ; ++i ) {
- ok_symmetry = ( Impl::s_number_workers_per_shepherd == qthread_num_workers_local(i) );
- }
- }
-
- if ( ! ok_init || ! ok_symmetry ) {
- std::ostringstream msg ;
-
- msg << "Kokkos::Qthread::initialize(" << thread_count << ") FAILED" ;
- msg << " : qthread_num_shepherds = " << qthread_num_shepherds();
- msg << " : qthread_num_workers_per_shepherd = " << qthread_num_workers_local(NO_SHEPHERD);
- msg << " : qthread_num_workers = " << qthread_num_workers();
-
- if ( ! ok_symmetry ) {
- msg << " : qthread_num_workers_local = {" ;
- for ( int i = 0 ; i < Impl::s_number_shepherds ; ++i ) {
- msg << " " << qthread_num_workers_local(i) ;
- }
- msg << " }" ;
- }
-
- Impl::s_number_workers = 0 ;
- Impl::s_number_shepherds = 0 ;
- Impl::s_number_workers_per_shepherd = 0 ;
-
- if ( ok_init ) { qthread_finalize(); }
-
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
-
- Impl::QthreadExec::resize_worker_scratch( 256 , 256 );
-
- // Init the array for used for arbitrarily sized atomics
- Impl::init_lock_array_host_space();
-
-}
-
-void Qthread::finalize()
-{
- Impl::QthreadExec::clear_workers();
-
- if ( Impl::s_number_workers ) {
- qthread_finalize();
- }
-
- Impl::s_number_workers = 0 ;
- Impl::s_number_shepherds = 0 ;
- Impl::s_number_workers_per_shepherd = 0 ;
-}
-
-void Qthread::print_configuration( std::ostream & s , const bool detail )
-{
- s << "Kokkos::Qthread {"
- << " num_shepherds(" << Impl::s_number_shepherds << ")"
- << " num_workers_per_shepherd(" << Impl::s_number_workers_per_shepherd << ")"
- << " }" << std::endl ;
-}
-
-Qthread & Qthread::instance( int )
-{
- static Qthread q ;
- return q ;
-}
-
-void Qthread::fence()
-{
-}
-
-int Qthread::shepherd_size() const { return Impl::s_number_shepherds ; }
-int Qthread::shepherd_worker_size() const { return Impl::s_number_workers_per_shepherd ; }
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-namespace {
-
-aligned_t driver_exec_all( void * arg )
-{
- QthreadExec & exec = **worker_exec();
-
- (*s_active_function)( exec , s_active_function_arg );
-
-/*
- fprintf( stdout
- , "QthreadExec driver worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
- , exec.worker_rank()
- , exec.worker_size()
- , exec.shepherd_rank()
- , exec.shepherd_size()
- , exec.shepherd_worker_rank()
- , exec.shepherd_worker_size()
- );
- fflush(stdout);
-*/
-
- return 0 ;
-}
-
-aligned_t driver_resize_worker_scratch( void * arg )
-{
- static volatile int lock_begin = 0 ;
- static volatile int lock_end = 0 ;
-
- QthreadExec ** const exec = worker_exec();
-
- //----------------------------------------
- // Serialize allocation for thread safety
-
- while ( ! atomic_compare_exchange_strong( & lock_begin , 0 , 1 ) ); // Spin wait to claim lock
-
- const bool ok = 0 == *exec ;
-
- if ( ok ) { *exec = (QthreadExec *) malloc( s_base_size + s_worker_shared_end ); }
-
- lock_begin = 0 ; // release lock
-
- if ( ok ) { new( *exec ) QthreadExec(); }
-
- //----------------------------------------
- // Wait for all calls to complete to ensure that each worker has executed.
-
- if ( s_number_workers == 1 + atomic_fetch_add( & lock_end , 1 ) ) { lock_end = 0 ; }
-
- while ( lock_end );
-
-/*
- fprintf( stdout
- , "QthreadExec resize worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
- , (**exec).worker_rank()
- , (**exec).worker_size()
- , (**exec).shepherd_rank()
- , (**exec).shepherd_size()
- , (**exec).shepherd_worker_rank()
- , (**exec).shepherd_worker_size()
- );
- fflush(stdout);
-*/
-
- //----------------------------------------
-
- if ( ! ok ) {
- fprintf( stderr , "Kokkos::QthreadExec resize failed\n" );
- fflush( stderr );
- }
-
- return 0 ;
-}
-
-void verify_is_process( const char * const label , bool not_active = false )
-{
- const bool not_process = 0 != qthread_shep() || 0 != qthread_worker_local(NULL);
- const bool is_active = not_active && ( s_active_function || s_active_function_arg );
-
- if ( not_process || is_active ) {
- std::string msg( label );
- msg.append( " : FAILED" );
- if ( not_process ) msg.append(" : not called by main process");
- if ( is_active ) msg.append(" : parallel execution in progress");
- Kokkos::Impl::throw_runtime_exception( msg );
- }
-}
-
-}
-
-int QthreadExec::worker_per_shepherd()
-{
- return s_number_workers_per_shepherd ;
-}
-
-QthreadExec::QthreadExec()
-{
- const int shepherd_rank = qthread_shep();
- const int shepherd_worker_rank = qthread_worker_local(NULL);
- const int worker_rank = shepherd_rank * s_number_workers_per_shepherd + shepherd_worker_rank ;
-
- m_worker_base = s_exec ;
- m_shepherd_base = s_exec + s_number_workers_per_shepherd * ( ( s_number_shepherds - ( shepherd_rank + 1 ) ) );
- m_scratch_alloc = ( (unsigned char *) this ) + s_base_size ;
- m_reduce_end = s_worker_reduce_end ;
- m_shepherd_rank = shepherd_rank ;
- m_shepherd_size = s_number_shepherds ;
- m_shepherd_worker_rank = shepherd_worker_rank ;
- m_shepherd_worker_size = s_number_workers_per_shepherd ;
- m_worker_rank = worker_rank ;
- m_worker_size = s_number_workers ;
- m_worker_state = QthreadExec::Active ;
-}
-
-void QthreadExec::clear_workers()
-{
- for ( int iwork = 0 ; iwork < s_number_workers ; ++iwork ) {
- QthreadExec * const exec = s_exec[iwork] ;
- s_exec[iwork] = 0 ;
- free( exec );
- }
-}
-
-void QthreadExec::shared_reset( Qthread::scratch_memory_space & space )
-{
- new( & space )
- Qthread::scratch_memory_space(
- ((unsigned char *) (**m_shepherd_base).m_scratch_alloc ) + s_worker_shared_begin ,
- s_worker_shared_end - s_worker_shared_begin
- );
-}
-
-void QthreadExec::resize_worker_scratch( const int reduce_size , const int shared_size )
-{
- const int exec_all_reduce_alloc = align_alloc( reduce_size );
- const int shepherd_scan_alloc = align_alloc( 8 );
- const int shepherd_shared_end = exec_all_reduce_alloc + shepherd_scan_alloc + align_alloc( shared_size );
-
- if ( s_worker_reduce_end < exec_all_reduce_alloc ||
- s_worker_shared_end < shepherd_shared_end ) {
-
-/*
- fprintf( stdout , "QthreadExec::resize\n");
- fflush(stdout);
-*/
-
- // Clear current worker memory before allocating new worker memory
- clear_workers();
-
- // Increase the buffers to an aligned allocation
- s_worker_reduce_end = exec_all_reduce_alloc ;
- s_worker_shared_begin = exec_all_reduce_alloc + shepherd_scan_alloc ;
- s_worker_shared_end = shepherd_shared_end ;
-
- // Need to query which shepherd this main 'process' is running...
-
- const int main_shep = qthread_shep();
-
- // Have each worker resize its memory for proper first-touch
-#if 0
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- for ( int i = jshep != main_shep ? 0 : 1 ; i < s_number_workers_per_shepherd ; ++i ) {
- qthread_fork_to( driver_resize_worker_scratch , NULL , NULL , jshep );
- }}
-#else
- // If this function is used before the 'qthread.task_policy' unit test
- // the 'qthread.task_policy' unit test fails with a seg-fault within libqthread.so.
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1 ;
-
- if ( num_clone ) {
- const int ret = qthread_fork_clones_to_local_priority
- ( driver_resize_worker_scratch /* function */
- , NULL /* function data block */
- , NULL /* pointer to return value feb */
- , jshep /* shepherd number */
- , num_clone - 1 /* number of instances - 1 */
- );
-
- assert(ret == QTHREAD_SUCCESS);
- }
- }
-#endif
-
- driver_resize_worker_scratch( NULL );
-
- // Verify all workers allocated
-
- bool ok = true ;
- for ( int iwork = 0 ; ok && iwork < s_number_workers ; ++iwork ) { ok = 0 != s_exec[iwork] ; }
-
- if ( ! ok ) {
- std::ostringstream msg ;
- msg << "Kokkos::Impl::QthreadExec::resize : FAILED for workers {" ;
- for ( int iwork = 0 ; iwork < s_number_workers ; ++iwork ) {
- if ( 0 == s_exec[iwork] ) { msg << " " << ( s_number_workers - ( iwork + 1 ) ); }
- }
- msg << " }" ;
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
- }
-}
-
-void QthreadExec::exec_all( Qthread & , QthreadExecFunctionPointer func , const void * arg )
-{
- verify_is_process("QthreadExec::exec_all(...)",true);
-
-/*
- fprintf( stdout , "QthreadExec::exec_all\n");
- fflush(stdout);
-*/
-
- s_active_function = func ;
- s_active_function_arg = arg ;
-
- // Need to query which shepherd this main 'process' is running...
-
- const int main_shep = qthread_shep();
-
-#if 0
- for ( int jshep = 0 , iwork = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- for ( int i = jshep != main_shep ? 0 : 1 ; i < s_number_workers_per_shepherd ; ++i , ++iwork ) {
- qthread_fork_to( driver_exec_all , NULL , NULL , jshep );
- }}
-#else
- // If this function is used before the 'qthread.task_policy' unit test
- // the 'qthread.task_policy' unit test fails with a seg-fault within libqthread.so.
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1 ;
-
- if ( num_clone ) {
- const int ret = qthread_fork_clones_to_local_priority
- ( driver_exec_all /* function */
- , NULL /* function data block */
- , NULL /* pointer to return value feb */
- , jshep /* shepherd number */
- , num_clone - 1 /* number of instances - 1 */
- );
-
- assert(ret == QTHREAD_SUCCESS);
- }
- }
-#endif
-
- driver_exec_all( NULL );
-
- s_active_function = 0 ;
- s_active_function_arg = 0 ;
-}
-
-void * QthreadExec::exec_all_reduce_result()
-{
- return s_exec[0]->m_scratch_alloc ;
-}
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-namespace Kokkos {
-namespace Impl {
-
-QthreadTeamPolicyMember::QthreadTeamPolicyMember()
- : m_exec( **worker_exec() )
- , m_team_shared(0,0)
- , m_team_size( 1 )
- , m_team_rank( 0 )
- , m_league_size(1)
- , m_league_end(1)
- , m_league_rank(0)
-{
- m_exec.shared_reset( m_team_shared );
-}
-
-QthreadTeamPolicyMember::QthreadTeamPolicyMember( const QthreadTeamPolicyMember::TaskTeam & )
- : m_exec( **worker_exec() )
- , m_team_shared(0,0)
- , m_team_size( s_number_workers_per_shepherd )
- , m_team_rank( m_exec.shepherd_worker_rank() )
- , m_league_size(1)
- , m_league_end(1)
- , m_league_rank(0)
-{
- m_exec.shared_reset( m_team_shared );
-}
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_ENABLE_QTHREAD ) */
-
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp b/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp
deleted file mode 100644
index f948eb290..000000000
--- a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp
+++ /dev/null
@@ -1,620 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_QTHREADEXEC_HPP
-#define KOKKOS_QTHREADEXEC_HPP
-
-#include <impl/Kokkos_spinwait.hpp>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-//----------------------------------------------------------------------------
-
-class QthreadExec ;
-
-typedef void (*QthreadExecFunctionPointer)( QthreadExec & , const void * );
-
-class QthreadExec {
-private:
-
- enum { Inactive = 0 , Active = 1 };
-
- const QthreadExec * const * m_worker_base ;
- const QthreadExec * const * m_shepherd_base ;
-
- void * m_scratch_alloc ; ///< Scratch memory [ reduce , team , shared ]
- int m_reduce_end ; ///< End of scratch reduction memory
-
- int m_shepherd_rank ;
- int m_shepherd_size ;
-
- int m_shepherd_worker_rank ;
- int m_shepherd_worker_size ;
-
- /*
- * m_worker_rank = m_shepherd_rank * m_shepherd_worker_size + m_shepherd_worker_rank
- * m_worker_size = m_shepherd_size * m_shepherd_worker_size
- */
- int m_worker_rank ;
- int m_worker_size ;
-
- int mutable volatile m_worker_state ;
-
-
- friend class Kokkos::Qthread ;
-
- ~QthreadExec();
- QthreadExec( const QthreadExec & );
- QthreadExec & operator = ( const QthreadExec & );
-
-public:
-
- QthreadExec();
-
- /** Execute the input function on all available Qthread workers */
- static void exec_all( Qthread & , QthreadExecFunctionPointer , const void * );
-
- //----------------------------------------
- /** Barrier across all workers participating in the 'exec_all' */
- void exec_all_barrier() const
- {
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- Impl::spinwait( m_worker_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- /** Barrier across workers within the shepherd with rank < team_rank */
- void shepherd_barrier( const int team_size ) const
- {
- if ( m_shepherd_worker_rank < team_size ) {
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
- }
-
- //----------------------------------------
- /** Reduce across all workers participating in the 'exec_all' */
- template< class FunctorType , class ReducerType , class ArgTag >
- inline
- void exec_all_reduce( const FunctorType & func, const ReducerType & reduce ) const
- {
- typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
- typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, ArgTag > ValueJoin ;
-
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- const QthreadExec & fan = *m_worker_base[j];
-
- Impl::spinwait( fan.m_worker_state , QthreadExec::Active );
-
- ValueJoin::join( ReducerConditional::select(func , reduce) , m_scratch_alloc , fan.m_scratch_alloc );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- //----------------------------------------
- /** Scan across all workers participating in the 'exec_all' */
- template< class FunctorType , class ArgTag >
- inline
- void exec_all_scan( const FunctorType & func ) const
- {
- typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > ValueInit ;
- typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
- typedef Kokkos::Impl::FunctorValueOps< FunctorType , ArgTag > ValueOps ;
-
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- Impl::spinwait( m_worker_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- // Root thread scans across values before releasing threads
- // Worker data is in reverse order, so m_worker_base[0] is the
- // highest ranking thread.
-
- // Copy from lower ranking to higher ranking worker.
- for ( int i = 1 ; i < m_worker_size ; ++i ) {
- ValueOps::copy( func
- , m_worker_base[i-1]->m_scratch_alloc
- , m_worker_base[i]->m_scratch_alloc
- );
- }
-
- ValueInit::init( func , m_worker_base[m_worker_size-1]->m_scratch_alloc );
-
- // Join from lower ranking to higher ranking worker.
- // Value at m_worker_base[n-1] is zero so skip adding it to m_worker_base[n-2].
- for ( int i = m_worker_size - 1 ; --i > 0 ; ) {
- ValueJoin::join( func , m_worker_base[i-1]->m_scratch_alloc , m_worker_base[i]->m_scratch_alloc );
- }
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- //----------------------------------------
-
- template< class Type>
- inline
- volatile Type * shepherd_team_scratch_value() const
- { return (volatile Type*)(((unsigned char *) m_scratch_alloc) + m_reduce_end); }
-
- template< class Type >
- inline
- void shepherd_broadcast( Type & value , const int team_size , const int team_rank ) const
- {
- if ( m_shepherd_base ) {
- Type * const shared_value = m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- if ( m_shepherd_worker_rank == team_rank ) { *shared_value = value ; }
- memory_fence();
- shepherd_barrier( team_size );
- value = *shared_value ;
- }
- }
-
- template< class Type >
- inline
- Type shepherd_reduce( const int team_size , const Type & value ) const
- {
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- Type & accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < n ; ++i ) {
- accum += * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- }
- for ( int i = 1 ; i < n ; ++i ) {
- * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum ;
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- template< class JoinOp >
- inline
- typename JoinOp::value_type
- shepherd_reduce( const int team_size
- , const typename JoinOp::value_type & value
- , const JoinOp & op ) const
- {
- typedef typename JoinOp::value_type Type ;
-
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- volatile Type & accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < team_size ; ++i ) {
- op.join( accum , * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() );
- }
- for ( int i = 1 ; i < team_size ; ++i ) {
- * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum ;
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- template< class Type >
- inline
- Type shepherd_scan( const int team_size
- , const Type & value
- , Type * const global_value = 0 ) const
- {
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- // Root thread scans across values before releasing threads
- // Worker data is in reverse order, so m_shepherd_base[0] is the
- // highest ranking thread.
-
- // Copy from lower ranking to higher ranking worker.
-
- Type accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < team_size ; ++i ) {
- const Type tmp = * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- accum += tmp ;
- * m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() = tmp ;
- }
-
- * m_shepherd_base[team_size-1]->shepherd_team_scratch_value<Type>() =
- global_value ? atomic_fetch_add( global_value , accum ) : 0 ;
-
- // Join from lower ranking to higher ranking worker.
- for ( int i = team_size ; --i ; ) {
- * m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() += * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- //----------------------------------------
-
- static inline
- int align_alloc( int size )
- {
- enum { ALLOC_GRAIN = 1 << 6 /* power of two, 64bytes */};
- enum { ALLOC_GRAIN_MASK = ALLOC_GRAIN - 1 };
- return ( size + ALLOC_GRAIN_MASK ) & ~ALLOC_GRAIN_MASK ;
- }
-
- void shared_reset( Qthread::scratch_memory_space & );
-
- void * exec_all_reduce_value() const { return m_scratch_alloc ; }
-
- static void * exec_all_reduce_result();
-
- static void resize_worker_scratch( const int reduce_size , const int shared_size );
- static void clear_workers();
-
- //----------------------------------------
-
- inline int worker_rank() const { return m_worker_rank ; }
- inline int worker_size() const { return m_worker_size ; }
- inline int shepherd_worker_rank() const { return m_shepherd_worker_rank ; }
- inline int shepherd_worker_size() const { return m_shepherd_worker_size ; }
- inline int shepherd_rank() const { return m_shepherd_rank ; }
- inline int shepherd_size() const { return m_shepherd_size ; }
-
- static int worker_per_shepherd();
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-class QthreadTeamPolicyMember {
-private:
-
- typedef Kokkos::Qthread execution_space ;
- typedef execution_space::scratch_memory_space scratch_memory_space ;
-
-
- Impl::QthreadExec & m_exec ;
- scratch_memory_space m_team_shared ;
- const int m_team_size ;
- const int m_team_rank ;
- const int m_league_size ;
- const int m_league_end ;
- int m_league_rank ;
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_shmem() const { return m_team_shared ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- {}
-#else
- { m_exec.shepherd_barrier( m_team_size ); }
-#endif
-
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_broadcast( const Type & value , int rank ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_broadcast<Type>( value , m_team_size , rank ); }
-#endif
-
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_reduce<Type>( m_team_size , value ); }
-#endif
-
- template< typename JoinOp >
- KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
- team_reduce( const typename JoinOp::value_type & value
- , const JoinOp & op ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return typename JoinOp::value_type(); }
-#else
- { return m_exec.template shepherd_reduce<JoinOp>( m_team_size , value , op ); }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_scan<Type>( m_team_size , value ); }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_scan<Type>( m_team_size , value , global_accum ); }
-#endif
-
- //----------------------------------------
- // Private driver for task-team parallel
-
- struct TaskTeam {};
-
- QthreadTeamPolicyMember();
- explicit QthreadTeamPolicyMember( const TaskTeam & );
-
- //----------------------------------------
- // Private for the driver ( for ( member_type i(exec,team); i ; i.next_team() ) { ... }
-
- // Initialize
- template< class ... Properties >
- QthreadTeamPolicyMember( Impl::QthreadExec & exec
- , const Kokkos::Impl::TeamPolicyInternal<Qthread,Properties...> & team )
- : m_exec( exec )
- , m_team_shared(0,0)
- , m_team_size( team.m_team_size )
- , m_team_rank( exec.shepherd_worker_rank() )
- , m_league_size( team.m_league_size )
- , m_league_end( team.m_league_size - team.m_shepherd_iter * ( exec.shepherd_size() - ( exec.shepherd_rank() + 1 ) ) )
- , m_league_rank( m_league_end > team.m_shepherd_iter ? m_league_end - team.m_shepherd_iter : 0 )
- {
- m_exec.shared_reset( m_team_shared );
- }
-
- // Continue
- operator bool () const { return m_league_rank < m_league_end ; }
-
- // iterate
- void next_team() { ++m_league_rank ; m_exec.shared_reset( m_team_shared ); }
-};
-
-
-template< class ... Properties >
-class TeamPolicyInternal< Kokkos::Qthread , Properties ... >
- : public PolicyTraits< Properties... >
-{
-private:
-
- const int m_league_size ;
- const int m_team_size ;
- const int m_shepherd_iter ;
-
-public:
-
- //! Tag this class as a kokkos execution policy
- typedef TeamPolicyInternal execution_policy ;
- typedef Qthread execution_space ;
- typedef PolicyTraits< Properties ... > traits ;
-
- //----------------------------------------
-
- template< class FunctorType >
- inline static
- int team_size_max( const FunctorType & )
- { return Qthread::instance().shepherd_worker_size(); }
-
- template< class FunctorType >
- static int team_size_recommended( const FunctorType & f )
- { return team_size_max( f ); }
-
- template< class FunctorType >
- inline static
- int team_size_recommended( const FunctorType & f , const int& )
- { return team_size_max( f ); }
-
- //----------------------------------------
-
- inline int team_size() const { return m_team_size ; }
- inline int league_size() const { return m_league_size ; }
-
- // One active team per shepherd
- TeamPolicyInternal( Kokkos::Qthread & q
- , const int league_size
- , const int team_size
- , const int /* vector_length */ = 0
- )
- : m_league_size( league_size )
- , m_team_size( team_size < q.shepherd_worker_size()
- ? team_size : q.shepherd_worker_size() )
- , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
- {
- }
-
- // One active team per shepherd
- TeamPolicyInternal( const int league_size
- , const int team_size
- , const int /* vector_length */ = 0
- )
- : m_league_size( league_size )
- , m_team_size( team_size < Qthread::instance().shepherd_worker_size()
- ? team_size : Qthread::instance().shepherd_worker_size() )
- , m_shepherd_iter( ( league_size + Qthread::instance().shepherd_size() - 1 ) / Qthread::instance().shepherd_size() )
- {
- }
-
- typedef Impl::QthreadTeamPolicyMember member_type ;
-
- friend class Impl::QthreadTeamPolicyMember ;
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #define KOKKOS_QTHREADEXEC_HPP */
-
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp
new file mode 100644
index 000000000..1b9249408
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp
@@ -0,0 +1,519 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <Kokkos_Core_fwd.hpp>
+
+#if defined( KOKKOS_ENABLE_QTHREADS )
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <iostream>
+#include <sstream>
+#include <utility>
+
+#include <Kokkos_Qthreads.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <impl/Kokkos_Error.hpp>
+
+// Defines to enable experimental Qthreads functionality.
+//#define QTHREAD_LOCAL_PRIORITY
+//#define CLONED_TASKS
+
+//#include <qthread.h>
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+namespace {
+
+enum { MAXIMUM_QTHREADS_WORKERS = 1024 };
+
+/** s_exec is indexed by the reverse rank of the workers
+ * for faster fan-in / fan-out lookups
+ * [ n - 1, n - 2, ..., 0 ]
+ */
+QthreadsExec * s_exec[ MAXIMUM_QTHREADS_WORKERS ];
+
+int s_number_shepherds = 0;
+int s_number_workers_per_shepherd = 0;
+int s_number_workers = 0;
+
+inline
+QthreadsExec ** worker_exec()
+{
+ return s_exec + s_number_workers - ( qthread_shep() * s_number_workers_per_shepherd + qthread_worker_local( NULL ) + 1 );
+}
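+// Illustrative mapping for the reverse-rank indexing above: with 2 shepherds
+// and 2 workers per shepherd ( s_number_workers == 4 ),
+//   shepherd 0, local worker 0  ->  s_exec + 3
+//   shepherd 0, local worker 1  ->  s_exec + 2
+//   shepherd 1, local worker 0  ->  s_exec + 1
+//   shepherd 1, local worker 1  ->  s_exec + 0
+// i.e. the globally highest-ranked worker owns s_exec[0].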
+
+const int s_base_size = QthreadsExec::align_alloc( sizeof(QthreadsExec) );
+
+int s_worker_reduce_end = 0; // End of worker reduction memory.
+int s_worker_shared_end = 0; // Total of worker scratch memory.
+int s_worker_shared_begin = 0; // Beginning of worker shared memory.
+
+QthreadsExecFunctionPointer volatile s_active_function = 0;
+const void * volatile s_active_function_arg = 0;
+
+} // namespace
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+int Qthreads::is_initialized()
+{
+ return Impl::s_number_workers != 0;
+}
+
+int Qthreads::concurrency()
+{
+ return Impl::s_number_workers_per_shepherd;
+}
+
+int Qthreads::in_parallel()
+{
+ return Impl::s_active_function != 0;
+}
+
+void Qthreads::initialize( int thread_count )
+{
+ // Environment variable: QTHREAD_NUM_SHEPHERDS
+ // Environment variable: QTHREAD_NUM_WORKERS_PER_SHEP
+ // Environment variable: QTHREAD_HWPAR
+
+ {
+    // putenv() retains a pointer to this string, so it must outlive this scope.
+    static char buffer[256];
+    snprintf( buffer, sizeof(buffer), "QTHREAD_HWPAR=%d", thread_count );
+    putenv( buffer );
+ }
+
+ const bool ok_init = ( QTHREAD_SUCCESS == qthread_initialize() ) &&
+ ( thread_count == qthread_num_shepherds() * qthread_num_workers_local( NO_SHEPHERD ) ) &&
+ ( thread_count == qthread_num_workers() );
+
+ bool ok_symmetry = true;
+
+ if ( ok_init ) {
+ Impl::s_number_shepherds = qthread_num_shepherds();
+ Impl::s_number_workers_per_shepherd = qthread_num_workers_local( NO_SHEPHERD );
+ Impl::s_number_workers = Impl::s_number_shepherds * Impl::s_number_workers_per_shepherd;
+
+ for ( int i = 0; ok_symmetry && i < Impl::s_number_shepherds; ++i ) {
+ ok_symmetry = ( Impl::s_number_workers_per_shepherd == qthread_num_workers_local( i ) );
+ }
+ }
+
+ if ( ! ok_init || ! ok_symmetry ) {
+ std::ostringstream msg;
+
+ msg << "Kokkos::Qthreads::initialize(" << thread_count << ") FAILED";
+ msg << " : qthread_num_shepherds = " << qthread_num_shepherds();
+ msg << " : qthread_num_workers_per_shepherd = " << qthread_num_workers_local( NO_SHEPHERD );
+ msg << " : qthread_num_workers = " << qthread_num_workers();
+
+ if ( ! ok_symmetry ) {
+ msg << " : qthread_num_workers_local = {";
+ for ( int i = 0; i < Impl::s_number_shepherds; ++i ) {
+ msg << " " << qthread_num_workers_local( i );
+ }
+ msg << " }";
+ }
+
+ Impl::s_number_workers = 0;
+ Impl::s_number_shepherds = 0;
+ Impl::s_number_workers_per_shepherd = 0;
+
+ if ( ok_init ) { qthread_finalize(); }
+
+ Kokkos::Impl::throw_runtime_exception( msg.str() );
+ }
+
+ Impl::QthreadsExec::resize_worker_scratch( 256, 256 );
+
+  // Initialize the lock array used for arbitrarily sized atomics.
+ Impl::init_lock_array_host_space();
+
+}
+
+void Qthreads::finalize()
+{
+ Impl::QthreadsExec::clear_workers();
+
+ if ( Impl::s_number_workers ) {
+ qthread_finalize();
+ }
+
+ Impl::s_number_workers = 0;
+ Impl::s_number_shepherds = 0;
+ Impl::s_number_workers_per_shepherd = 0;
+}
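+// Minimal usage sketch (illustrative; normally reached indirectly through
+// Kokkos::initialize() / Kokkos::finalize() in a Qthreads-enabled build):
+//
+//   Kokkos::Qthreads::initialize( 16 );  // request 16 workers in total
+//   /* ... run parallel kernels ... */
+//   Kokkos::Qthreads::finalize();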
+
+void Qthreads::print_configuration( std::ostream & s, const bool detail )
+{
+ s << "Kokkos::Qthreads {"
+ << " num_shepherds(" << Impl::s_number_shepherds << ")"
+ << " num_workers_per_shepherd(" << Impl::s_number_workers_per_shepherd << ")"
+ << " }" << std::endl;
+}
+
+Qthreads & Qthreads::instance( int )
+{
+ static Qthreads q;
+ return q;
+}
+
+void Qthreads::fence()
+{
+}
+
+int Qthreads::shepherd_size() const { return Impl::s_number_shepherds; }
+int Qthreads::shepherd_worker_size() const { return Impl::s_number_workers_per_shepherd; }
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+namespace {
+
+aligned_t driver_exec_all( void * arg )
+{
+ QthreadsExec & exec = **worker_exec();
+
+ (*s_active_function)( exec, s_active_function_arg );
+
+/*
+ fprintf( stdout
+ , "QthreadsExec driver worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
+ , exec.worker_rank()
+ , exec.worker_size()
+ , exec.shepherd_rank()
+ , exec.shepherd_size()
+ , exec.shepherd_worker_rank()
+ , exec.shepherd_worker_size()
+ );
+ fflush(stdout);
+*/
+
+ return 0;
+}
+
+aligned_t driver_resize_worker_scratch( void * arg )
+{
+ static volatile int lock_begin = 0;
+ static volatile int lock_end = 0;
+
+ QthreadsExec ** const exec = worker_exec();
+
+ //----------------------------------------
+ // Serialize allocation for thread safety.
+
+ while ( ! atomic_compare_exchange_strong( & lock_begin, 0, 1 ) ); // Spin wait to claim lock.
+
+ const bool ok = 0 == *exec;
+
+ if ( ok ) { *exec = (QthreadsExec *) malloc( s_base_size + s_worker_shared_end ); }
+
+ lock_begin = 0; // Release lock.
+
+ if ( ok ) { new( *exec ) QthreadsExec(); }
+
+ //----------------------------------------
+  // Wait for all calls to complete to ensure that each worker has executed.
+
+ if ( s_number_workers == 1 + atomic_fetch_add( & lock_end, 1 ) ) { lock_end = 0; }
+
+ while ( lock_end );
+
+/*
+ fprintf( stdout
+ , "QthreadsExec resize worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
+ , (**exec).worker_rank()
+ , (**exec).worker_size()
+ , (**exec).shepherd_rank()
+ , (**exec).shepherd_size()
+ , (**exec).shepherd_worker_rank()
+ , (**exec).shepherd_worker_size()
+ );
+ fflush(stdout);
+*/
+
+ //----------------------------------------
+
+ if ( ! ok ) {
+ fprintf( stderr, "Kokkos::QthreadsExec resize failed\n" );
+ fflush( stderr );
+ }
+
+ return 0;
+}
+
+void verify_is_process( const char * const label, bool not_active = false )
+{
+ const bool not_process = 0 != qthread_shep() || 0 != qthread_worker_local( NULL );
+ const bool is_active = not_active && ( s_active_function || s_active_function_arg );
+
+ if ( not_process || is_active ) {
+ std::string msg( label );
+ msg.append( " : FAILED" );
+ if ( not_process ) msg.append(" : not called by main process");
+ if ( is_active ) msg.append(" : parallel execution in progress");
+ Kokkos::Impl::throw_runtime_exception( msg );
+ }
+}
+
+} // namespace
+
+int QthreadsExec::worker_per_shepherd()
+{
+ return s_number_workers_per_shepherd;
+}
+
+QthreadsExec::QthreadsExec()
+{
+ const int shepherd_rank = qthread_shep();
+ const int shepherd_worker_rank = qthread_worker_local( NULL );
+ const int worker_rank = shepherd_rank * s_number_workers_per_shepherd + shepherd_worker_rank;
+
+ m_worker_base = s_exec;
+ m_shepherd_base = s_exec + s_number_workers_per_shepherd * ( ( s_number_shepherds - ( shepherd_rank + 1 ) ) );
+ m_scratch_alloc = ( (unsigned char *) this ) + s_base_size;
+ m_reduce_end = s_worker_reduce_end;
+ m_shepherd_rank = shepherd_rank;
+ m_shepherd_size = s_number_shepherds;
+ m_shepherd_worker_rank = shepherd_worker_rank;
+ m_shepherd_worker_size = s_number_workers_per_shepherd;
+ m_worker_rank = worker_rank;
+ m_worker_size = s_number_workers;
+ m_worker_state = QthreadsExec::Active;
+}
+
+void QthreadsExec::clear_workers()
+{
+ for ( int iwork = 0; iwork < s_number_workers; ++iwork ) {
+ QthreadsExec * const exec = s_exec[iwork];
+ s_exec[iwork] = 0;
+ free( exec );
+ }
+}
+
+void QthreadsExec::shared_reset( Qthreads::scratch_memory_space & space )
+{
+ new( & space )
+ Qthreads::scratch_memory_space(
+ ((unsigned char *) (**m_shepherd_base).m_scratch_alloc ) + s_worker_shared_begin,
+ s_worker_shared_end - s_worker_shared_begin
+ );
+}
+
+void QthreadsExec::resize_worker_scratch( const int reduce_size, const int shared_size )
+{
+ const int exec_all_reduce_alloc = align_alloc( reduce_size );
+ const int shepherd_scan_alloc = align_alloc( 8 );
+ const int shepherd_shared_end = exec_all_reduce_alloc + shepherd_scan_alloc + align_alloc( shared_size );
+
+ if ( s_worker_reduce_end < exec_all_reduce_alloc ||
+ s_worker_shared_end < shepherd_shared_end ) {
+
+/*
+ fprintf( stdout, "QthreadsExec::resize\n");
+ fflush(stdout);
+*/
+
+ // Clear current worker memory before allocating new worker memory.
+ clear_workers();
+
+ // Increase the buffers to an aligned allocation.
+ s_worker_reduce_end = exec_all_reduce_alloc;
+ s_worker_shared_begin = exec_all_reduce_alloc + shepherd_scan_alloc;
+ s_worker_shared_end = shepherd_shared_end;
+
+ // Need to query which shepherd this main 'process' is running.
+
+ const int main_shep = qthread_shep();
+
+ // Have each worker resize its memory for proper first-touch.
+#if 0
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ for ( int i = jshep != main_shep ? 0 : 1; i < s_number_workers_per_shepherd; ++i ) {
+ qthread_fork_to( driver_resize_worker_scratch, NULL, NULL, jshep );
+ }
+ }
+#else
+ // If this function is used before the 'qthreads.task_policy' unit test,
+ // the 'qthreads.task_policy' unit test fails with a seg-fault within libqthread.so.
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1;
+
+ if ( num_clone ) {
+ const int ret = qthread_fork_clones_to_local_priority
+ ( driver_resize_worker_scratch // Function
+ , NULL // Function data block
+ , NULL // Pointer to return value feb
+ , jshep // Shepherd number
+ , num_clone - 1 // Number of instances - 1
+ );
+
+ assert( ret == QTHREAD_SUCCESS );
+ }
+ }
+#endif
+
+ driver_resize_worker_scratch( NULL );
+
+ // Verify all workers allocated.
+
+ bool ok = true;
+ for ( int iwork = 0; ok && iwork < s_number_workers; ++iwork ) { ok = 0 != s_exec[iwork]; }
+
+ if ( ! ok ) {
+ std::ostringstream msg;
+ msg << "Kokkos::Impl::QthreadsExec::resize : FAILED for workers {";
+ for ( int iwork = 0; iwork < s_number_workers; ++iwork ) {
+ if ( 0 == s_exec[iwork] ) { msg << " " << ( s_number_workers - ( iwork + 1 ) ); }
+ }
+ msg << " }";
+ Kokkos::Impl::throw_runtime_exception( msg.str() );
+ }
+ }
+}
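+// Worked example of the scratch layout (illustrative): for the
+// resize_worker_scratch( 256, 256 ) call made from Qthreads::initialize(),
+// with a 64-byte ALLOC_GRAIN,
+//   s_worker_reduce_end   = align_alloc( 256 )       = 256
+//   s_worker_shared_begin = 256 + align_alloc( 8 )   = 320
+//   s_worker_shared_end   = 320 + align_alloc( 256 ) = 576
+// so each worker allocates s_base_size + 576 bytes laid out as
+// [ reduce | scan | shared ].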
+
+void QthreadsExec::exec_all( Qthreads &, QthreadsExecFunctionPointer func, const void * arg )
+{
+ verify_is_process("QthreadsExec::exec_all(...)",true);
+
+/*
+ fprintf( stdout, "QthreadsExec::exec_all\n");
+ fflush(stdout);
+*/
+
+ s_active_function = func;
+ s_active_function_arg = arg;
+
+ // Need to query which shepherd this main 'process' is running.
+
+ const int main_shep = qthread_shep();
+
+#if 0
+ for ( int jshep = 0, iwork = 0; jshep < s_number_shepherds; ++jshep ) {
+ for ( int i = jshep != main_shep ? 0 : 1; i < s_number_workers_per_shepherd; ++i, ++iwork ) {
+ qthread_fork_to( driver_exec_all, NULL, NULL, jshep );
+ }
+ }
+#else
+ // If this function is used before the 'qthreads.task_policy' unit test,
+ // the 'qthreads.task_policy' unit test fails with a seg-fault within libqthread.so.
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1;
+
+ if ( num_clone ) {
+ const int ret = qthread_fork_clones_to_local_priority
+ ( driver_exec_all // Function
+ , NULL // Function data block
+ , NULL // Pointer to return value feb
+ , jshep // Shepherd number
+ , num_clone - 1 // Number of instances - 1
+ );
+
+ assert(ret == QTHREAD_SUCCESS);
+ }
+ }
+#endif
+
+ driver_exec_all( NULL );
+
+ s_active_function = 0;
+ s_active_function_arg = 0;
+}
+
+void * QthreadsExec::exec_all_reduce_result()
+{
+ return s_exec[0]->m_scratch_alloc;
+}
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+namespace Kokkos {
+
+namespace Impl {
+
+QthreadsTeamPolicyMember::QthreadsTeamPolicyMember()
+ : m_exec( **worker_exec() )
+ , m_team_shared( 0, 0 )
+ , m_team_size( 1 )
+ , m_team_rank( 0 )
+ , m_league_size( 1 )
+ , m_league_end( 1 )
+ , m_league_rank( 0 )
+{
+ m_exec.shared_reset( m_team_shared );
+}
+
+QthreadsTeamPolicyMember::QthreadsTeamPolicyMember( const QthreadsTeamPolicyMember::TaskTeam & )
+ : m_exec( **worker_exec() )
+ , m_team_shared( 0, 0 )
+ , m_team_size( s_number_workers_per_shepherd )
+ , m_team_rank( m_exec.shepherd_worker_rank() )
+ , m_league_size( 1 )
+ , m_league_end( 1 )
+ , m_league_rank( 0 )
+{
+ m_exec.shared_reset( m_team_shared );
+}
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+#endif // #if defined( KOKKOS_ENABLE_QTHREADS )
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp
new file mode 100644
index 000000000..64856eb99
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp
@@ -0,0 +1,640 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_QTHREADSEXEC_HPP
+#define KOKKOS_QTHREADSEXEC_HPP
+
+#include <impl/Kokkos_spinwait.hpp>
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+class QthreadsExec;
+
+typedef void (*QthreadsExecFunctionPointer)( QthreadsExec &, const void * );
+
+class QthreadsExec {
+private:
+ enum { Inactive = 0, Active = 1 };
+
+ const QthreadsExec * const * m_worker_base;
+ const QthreadsExec * const * m_shepherd_base;
+
+ void * m_scratch_alloc; ///< Scratch memory [ reduce, team, shared ]
+ int m_reduce_end; ///< End of scratch reduction memory
+
+ int m_shepherd_rank;
+ int m_shepherd_size;
+
+ int m_shepherd_worker_rank;
+ int m_shepherd_worker_size;
+
+ /*
+ * m_worker_rank = m_shepherd_rank * m_shepherd_worker_size + m_shepherd_worker_rank
+ * m_worker_size = m_shepherd_size * m_shepherd_worker_size
+ */
+ int m_worker_rank;
+ int m_worker_size;
+
+ int mutable volatile m_worker_state;
+
+ friend class Kokkos::Qthreads;
+
+ ~QthreadsExec();
+ QthreadsExec( const QthreadsExec & );
+ QthreadsExec & operator = ( const QthreadsExec & );
+
+public:
+ QthreadsExec();
+
+ /** Execute the input function on all available Qthreads workers. */
+ static void exec_all( Qthreads &, QthreadsExecFunctionPointer, const void * );
+
+ /** Barrier across all workers participating in the 'exec_all'. */
+ void exec_all_barrier() const
+ {
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_worker_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
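+  // Illustrative walk-through of the fan pattern above (assuming 4 workers):
+  // the reverse ranks are { 3, 2, 1, 0 }.  Fan-in: rev_rank 2 waits on
+  // rev_rank 3, rev_rank 0 waits on rev_ranks 1 and 2; every non-zero
+  // rev_rank then marks itself Inactive.  Fan-out: rev_rank 0 re-activates
+  // 1 and 2, and rev_rank 2 re-activates 3, releasing all workers.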
+
+  /** Barrier across workers within the shepherd with rank < team_size. */
+ void shepherd_barrier( const int team_size ) const
+ {
+ if ( m_shepherd_worker_rank < team_size ) {
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+ }
+
+ /** Reduce across all workers participating in the 'exec_all'. */
+ template< class FunctorType, class ReducerType, class ArgTag >
+ inline
+ void exec_all_reduce( const FunctorType & func, const ReducerType & reduce ) const
+ {
+ typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
+ typedef typename ReducerConditional::type ReducerTypeFwd;
+ typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, ArgTag > ValueJoin;
+
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ const QthreadsExec & fan = *m_worker_base[j];
+
+ Impl::spinwait_while_equal( fan.m_worker_state, QthreadsExec::Active );
+
+ ValueJoin::join( ReducerConditional::select( func, reduce ), m_scratch_alloc, fan.m_scratch_alloc );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+
+ /** Scan across all workers participating in the 'exec_all'. */
+ template< class FunctorType, class ArgTag >
+ inline
+ void exec_all_scan( const FunctorType & func ) const
+ {
+ typedef Kokkos::Impl::FunctorValueInit< FunctorType, ArgTag > ValueInit;
+ typedef Kokkos::Impl::FunctorValueJoin< FunctorType, ArgTag > ValueJoin;
+ typedef Kokkos::Impl::FunctorValueOps< FunctorType, ArgTag > ValueOps;
+
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_worker_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ // Root thread scans across values before releasing threads.
+ // Worker data is in reverse order, so m_worker_base[0] is the
+ // highest ranking thread.
+
+ // Copy from lower ranking to higher ranking worker.
+ for ( int i = 1; i < m_worker_size; ++i ) {
+ ValueOps::copy( func
+ , m_worker_base[i-1]->m_scratch_alloc
+ , m_worker_base[i]->m_scratch_alloc
+ );
+ }
+
+ ValueInit::init( func, m_worker_base[m_worker_size-1]->m_scratch_alloc );
+
+ // Join from lower ranking to higher ranking worker.
+      // Value at m_worker_base[m_worker_size-1] is zero, so skip adding it to m_worker_base[m_worker_size-2].
+ for ( int i = m_worker_size - 1; --i > 0; ) {
+ ValueJoin::join( func, m_worker_base[i-1]->m_scratch_alloc, m_worker_base[i]->m_scratch_alloc );
+ }
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+
+ //----------------------------------------
+
+ template< class Type >
+ inline
+ volatile Type * shepherd_team_scratch_value() const
+ { return (volatile Type*)( ( (unsigned char *) m_scratch_alloc ) + m_reduce_end ); }
+
+ template< class Type >
+ inline
+ void shepherd_broadcast( Type & value, const int team_size, const int team_rank ) const
+ {
+ if ( m_shepherd_base ) {
+ Type * const shared_value = m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ if ( m_shepherd_worker_rank == team_rank ) { *shared_value = value; }
+ memory_fence();
+ shepherd_barrier( team_size );
+ value = *shared_value;
+ }
+ }
+
+ template< class Type >
+ inline
+ Type shepherd_reduce( const int team_size, const Type & value ) const
+ {
+ volatile Type * const shared_value = shepherd_team_scratch_value<Type>();
+ *shared_value = value;
+// *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ Type & accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+      // Accumulate and broadcast across the whole team, as in the JoinOp overload below.
+      for ( int i = 1; i < team_size; ++i ) {
+        accum += *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+      }
+      for ( int i = 1; i < team_size; ++i ) {
+        *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum;
+      }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ template< class JoinOp >
+ inline
+ typename JoinOp::value_type
+ shepherd_reduce( const int team_size
+ , const typename JoinOp::value_type & value
+ , const JoinOp & op ) const
+ {
+ typedef typename JoinOp::value_type Type;
+
+ volatile Type * const shared_value = shepherd_team_scratch_value<Type>();
+ *shared_value = value;
+// *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ volatile Type & accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ for ( int i = 1; i < team_size; ++i ) {
+ op.join( accum, *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() );
+ }
+ for ( int i = 1; i < team_size; ++i ) {
+ *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum;
+ }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ template< class Type >
+ inline
+ Type shepherd_scan( const int team_size
+ , const Type & value
+ , Type * const global_value = 0 ) const
+ {
+ *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ // Root thread scans across values before releasing threads.
+ // Worker data is in reverse order, so m_shepherd_base[0] is the
+ // highest ranking thread.
+
+ // Copy from lower ranking to higher ranking worker.
+
+ Type accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ for ( int i = 1; i < team_size; ++i ) {
+ const Type tmp = *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+ accum += tmp;
+ *m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() = tmp;
+ }
+
+ *m_shepherd_base[team_size-1]->shepherd_team_scratch_value<Type>() =
+ global_value ? atomic_fetch_add( global_value, accum ) : 0;
+
+ // Join from lower ranking to higher ranking worker.
+ for ( int i = team_size; --i; ) {
+ *m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() += *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+ }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ //----------------------------------------
+
+ static inline
+ int align_alloc( int size )
+ {
+    enum { ALLOC_GRAIN = 1 << 6 /* power of two, 64 bytes */ };
+ enum { ALLOC_GRAIN_MASK = ALLOC_GRAIN - 1 };
+ return ( size + ALLOC_GRAIN_MASK ) & ~ALLOC_GRAIN_MASK;
+ }
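+  // Example values (ALLOC_GRAIN == 64): align_alloc( 1 ) == 64,
+  // align_alloc( 64 ) == 64, align_alloc( 100 ) == 128; sizes are rounded
+  // up to the next 64-byte boundary.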
+
+ void shared_reset( Qthreads::scratch_memory_space & );
+
+ void * exec_all_reduce_value() const { return m_scratch_alloc; }
+
+ static void * exec_all_reduce_result();
+
+ static void resize_worker_scratch( const int reduce_size, const int shared_size );
+ static void clear_workers();
+
+ //----------------------------------------
+
+ inline int worker_rank() const { return m_worker_rank; }
+ inline int worker_size() const { return m_worker_size; }
+ inline int shepherd_worker_rank() const { return m_shepherd_worker_rank; }
+ inline int shepherd_worker_size() const { return m_shepherd_worker_size; }
+ inline int shepherd_rank() const { return m_shepherd_rank; }
+ inline int shepherd_size() const { return m_shepherd_size; }
+
+ static int worker_per_shepherd();
+};
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+class QthreadsTeamPolicyMember {
+private:
+ typedef Kokkos::Qthreads execution_space;
+ typedef execution_space::scratch_memory_space scratch_memory_space;
+
+ Impl::QthreadsExec & m_exec;
+ scratch_memory_space m_team_shared;
+ const int m_team_size;
+ const int m_team_rank;
+ const int m_league_size;
+ const int m_league_end;
+ int m_league_rank;
+
+public:
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_shmem() const { return m_team_shared; }
+
+ KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank; }
+ KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size; }
+ KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank; }
+ KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size; }
+
+ KOKKOS_INLINE_FUNCTION void team_barrier() const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {}
+#else
+ { m_exec.shepherd_barrier( m_team_size ); }
+#endif
+
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_broadcast( const Type & value, int rank ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_broadcast<Type>( value, m_team_size, rank ); }
+#endif
+
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_reduce<Type>( m_team_size, value ); }
+#endif
+
+ template< typename JoinOp >
+ KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
+ team_reduce( const typename JoinOp::value_type & value
+ , const JoinOp & op ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return typename JoinOp::value_type(); }
+#else
+ { return m_exec.template shepherd_reduce<JoinOp>( m_team_size, value, op ); }
+#endif
+
+ /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
+ *
+ * The highest rank thread can compute the reduction total as
+ * reduction_total = dev.team_scan( value ) + value;
+ */
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_scan<Type>( m_team_size, value ); }
+#endif
+
+ /** \brief Intra-team exclusive prefix sum with team_rank() ordering
+ * with intra-team non-deterministic ordering accumulation.
+ *
+ * The global inter-team accumulation value will, at the end of the league's
+ * parallel execution, be the scan's total. Parallel execution ordering of
+ * the league's teams is non-deterministic. As such the base value for each
+ * team's scan operation is similarly non-deterministic.
+ */
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value, Type * const global_accum ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_scan<Type>( m_team_size, value, global_accum ); }
+#endif
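+  // Worked example (illustrative): for a team of four with per-thread values
+  // { 3, 1, 4, 2 } ordered by team_rank(), team_scan() returns { 0, 3, 4, 8 };
+  // the highest rank recovers the total as 8 + 2 == 10.  With a non-null
+  // global_accum, the returned base is additionally offset by an atomic
+  // fetch-add of the team total into *global_accum.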
+
+ //----------------------------------------
+ // Private driver for task-team parallel.
+
+ struct TaskTeam {};
+
+ QthreadsTeamPolicyMember();
+ explicit QthreadsTeamPolicyMember( const TaskTeam & );
+
+ //----------------------------------------
+  // Private for the driver: for ( member_type i( exec, team ); i; i.next_team() ) { ... }
+
+ // Initialize.
+ template< class ... Properties >
+ QthreadsTeamPolicyMember( Impl::QthreadsExec & exec
+ , const Kokkos::Impl::TeamPolicyInternal< Qthreads, Properties... > & team )
+ : m_exec( exec )
+ , m_team_shared( 0, 0 )
+ , m_team_size( team.m_team_size )
+ , m_team_rank( exec.shepherd_worker_rank() )
+ , m_league_size( team.m_league_size )
+ , m_league_end( team.m_league_size - team.m_shepherd_iter * ( exec.shepherd_size() - ( exec.shepherd_rank() + 1 ) ) )
+ , m_league_rank( m_league_end > team.m_shepherd_iter ? m_league_end - team.m_shepherd_iter : 0 )
+ {
+ m_exec.shared_reset( m_team_shared );
+ }
+
+ // Continue.
+ operator bool () const { return m_league_rank < m_league_end; }
+
+ // Iterate.
+ void next_team() { ++m_league_rank; m_exec.shared_reset( m_team_shared ); }
+};
+
+template< class ... Properties >
+class TeamPolicyInternal< Kokkos::Qthreads, Properties ... >
+ : public PolicyTraits< Properties... >
+{
+private:
+ const int m_league_size;
+ const int m_team_size;
+ const int m_shepherd_iter;
+
+public:
+ //! Tag this class as a kokkos execution policy.
+ typedef TeamPolicyInternal execution_policy;
+ typedef Qthreads execution_space;
+ typedef PolicyTraits< Properties ... > traits;
+
+ //----------------------------------------
+
+ template< class FunctorType >
+ inline static
+ int team_size_max( const FunctorType & )
+ { return Qthreads::instance().shepherd_worker_size(); }
+
+ template< class FunctorType >
+ static int team_size_recommended( const FunctorType & f )
+ { return team_size_max( f ); }
+
+ template< class FunctorType >
+ inline static
+ int team_size_recommended( const FunctorType & f, const int& )
+ { return team_size_max( f ); }
+
+ //----------------------------------------
+
+ inline int team_size() const { return m_team_size; }
+ inline int league_size() const { return m_league_size; }
+
+ // One active team per shepherd.
+ TeamPolicyInternal( Kokkos::Qthreads & q
+ , const int league_size
+ , const int team_size
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( team_size < q.shepherd_worker_size()
+ ? team_size : q.shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
+ {}
+
+ // TODO: Make sure this is correct.
+ // One active team per shepherd.
+ TeamPolicyInternal( Kokkos::Qthreads & q
+ , const int league_size
+ , const Kokkos::AUTO_t & /* team_size_request */
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( q.shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
+ {}
+
+ // One active team per shepherd.
+ TeamPolicyInternal( const int league_size
+ , const int team_size
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( team_size < Qthreads::instance().shepherd_worker_size()
+ ? team_size : Qthreads::instance().shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + Qthreads::instance().shepherd_size() - 1 ) / Qthreads::instance().shepherd_size() )
+ {}
+
+ // TODO: Make sure this is correct.
+ // One active team per shepherd.
+ TeamPolicyInternal( const int league_size
+ , const Kokkos::AUTO_t & /* team_size_request */
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( Qthreads::instance().shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + Qthreads::instance().shepherd_size() - 1 ) / Qthreads::instance().shepherd_size() )
+ {}
+
+ // TODO: Doesn't do anything yet. Fix this.
+  /** \brief Set chunk_size to a discrete value. */
+ inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
+ TeamPolicyInternal p = *this;
+// p.m_chunk_size = chunk_size_;
+ return p;
+ }
+
+ typedef Impl::QthreadsTeamPolicyMember member_type;
+
+ friend class Impl::QthreadsTeamPolicyMember;
+};
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+#endif // #define KOKKOS_QTHREADSEXEC_HPP
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
similarity index 86%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
index cb5b18094..9f9960754 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
@@ -1,727 +1,727 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_QTHREAD_PARALLEL_HPP
-#define KOKKOS_QTHREAD_PARALLEL_HPP
+#ifndef KOKKOS_QTHREADS_PARALLEL_HPP
+#define KOKKOS_QTHREADS_PARALLEL_HPP
#include <vector>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_StaticAssert.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
-#include <Qthread/Kokkos_QthreadExec.hpp>
+#include <Qthreads/Kokkos_QthreadsExec.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef typename Policy::WorkRange WorkRange ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i );
}
}
// Function is called once by every concurrent thread.
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelFor::template exec_range< WorkTag > ( self.m_functor , range.begin() , range.end() );
// All threads wait for completion.
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update );
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelReduce::template exec_range< WorkTag >(
self.m_functor, range.begin(), range.end(),
ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer)
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
+ QthreadsExec::resize_worker_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelReduce::exec , this );
- const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
+ const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result_view
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.data() )
{ }
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, TeamPolicy< Properties ... >
- , Kokkos::Qthread >
+ , Kokkos::Qthreads >
{
private:
- typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
+ typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthreads , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
while ( member ) {
functor( member );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
const TagType t{} ;
while ( member ) {
functor( t , member );
member.team_barrier();
member.next_team();
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
ParallelFor::template exec_team< WorkTag >
( self.m_functor , Member( exec , self.m_policy ) );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch
+ QthreadsExec::resize_worker_scratch
( /* reduction memory */ 0
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, TeamPolicy< Properties... >
, ReducerType
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
- typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
+ typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthreads , Properties ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
while ( member ) {
functor( member , update );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
const TagType t{} ;
while ( member ) {
functor( t , member , update );
member.team_barrier();
member.next_team();
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
ParallelReduce::template exec_team< WorkTag >
( self.m_functor
, Member( exec , self.m_policy )
, ValueInit::init( ReducerConditional::select( self.m_functor , self.m_reducer )
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch
+ QthreadsExec::resize_worker_scratch
( /* reduction memory */ ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) )
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelReduce::exec , this );
- const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
+ const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer), data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
{ }
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update , final );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update , final );
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelScan & self = * ((const ParallelScan *) arg );
const WorkRange range( self.m_policy , exec.worker_rank() , exec.worker_size() );
// Initialize thread-local value
reference_type update = ValueInit::init( self.m_functor , exec.exec_all_reduce_value() );
ParallelScan::template exec_range< WorkTag >( self.m_functor, range.begin() , range.end() , update , false );
exec.template exec_all_scan< FunctorType , typename Policy::work_tag >( self.m_functor );
ParallelScan::template exec_range< WorkTag >( self.m_functor , range.begin() , range.end() , update , true );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch( ValueTraits::value_size( m_functor ) , 0 );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelScan::exec , this );
+ QthreadsExec::resize_worker_scratch( ValueTraits::value_size( m_functor ) , 0 );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelScan::exec , this );
}
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >
-TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType& count )
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadsTeamPolicyMember& thread, const iType& count )
{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, count );
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::QthreadTeamPolicyMember >
-TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
+ Impl::QthreadsTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadsTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, iType(begin), iType(end) );
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
- ThreadVectorRange(const Impl::QthreadTeamPolicyMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >(thread,count);
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >
+ ThreadVectorRange(const Impl::QthreadsTeamPolicyMember& thread, const iType& count) {
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember> PerTeam(const Impl::QthreadTeamPolicyMember& thread) {
- return Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
+Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember> PerTeam(const Impl::QthreadsTeamPolicyMember& thread) {
+ return Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>(thread);
}
KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember> PerThread(const Impl::QthreadTeamPolicyMember& thread) {
- return Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
+Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember> PerThread(const Impl::QthreadsTeamPolicyMember& thread) {
+ return Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>(thread);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
+void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Inter-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
+void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
+void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
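// Illustrative usage sketch (not part of this patch): the ThreadVectorRange
// overloads above are meant to be nested inside a TeamThreadRange loop; here
// each team thread owns one row and its vector lanes reduce that row's dot
// product. The views a, x, y and the extents nrow, ncol are hypothetical.
Kokkos::parallel_for( Kokkos::TeamThreadRange( team, nrow ), [&]( const int row )
{
  double dot = 0 ;
  Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, ncol ),
                           [&]( const int col, double & partial )
                             { partial += a( row, col ) * x( col ); },
                           dot );
  y( row ) = dot ;
});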
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
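// Illustrative usage sketch (not part of this patch): the vector-level
// parallel_scan above realizes an exclusive prefix sum; the lambda must add
// its own contribution to val on every call and may only treat val as the
// prefix value when final == true. The views counts and offsets and the
// extent n are hypothetical.
Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, n ),
                       [&]( const int i, int & val, const bool final )
                       {
                         if ( final ) offsets( i ) = val ; // exclusive prefix for entry i
                         val += counts( i );
                       } );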
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
+void single(const Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
+void single(const Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
+void single(const Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
+void single(const Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
} // namespace Kokkos
-#endif /* #define KOKKOS_QTHREAD_PARALLEL_HPP */
+#endif /* #define KOKKOS_QTHREADS_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
similarity index 56%
copy from lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
copy to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
index 5b3e9873e..614a2c03f 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
@@ -1,329 +1,320 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
+#if defined( KOKKOS_ENABLE_QTHREADS ) && defined( KOKKOS_ENABLE_TASKPOLICY )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
-template class TaskQueue< Kokkos::OpenMP > ;
+template class TaskQueue< Kokkos::Qthreads > ;
//----------------------------------------------------------------------------
-TaskExec< Kokkos::OpenMP >::
-TaskExec()
- : m_self_exec( 0 )
- , m_team_exec( 0 )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( 0 )
- , m_team_rank( 0 )
- , m_team_size( 1 )
-{
-}
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec( Kokkos::Impl::OpenMPexec & arg_exec , int const arg_team_size )
- : m_self_exec( & arg_exec )
- , m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( arg_exec.pool_rank_rev() / arg_team_size )
- , m_team_rank( arg_exec.pool_rank_rev() % arg_team_size )
- , m_team_size( arg_team_size )
+TaskExec< Kokkos::Qthreads >::TaskExec()
+ : m_self_exec( 0 ),
+ m_team_exec( 0 ),
+ m_sync_mask( 0 ),
+ m_sync_value( 0 ),
+ m_sync_step( 0 ),
+ m_group_rank( 0 ),
+ m_team_rank( 0 ),
+ m_team_size( 1 )
+{}
+
+TaskExec< Kokkos::Qthreads >::
+TaskExec( Kokkos::Impl::QthreadsExec & arg_exec, int const arg_team_size )
+ : m_self_exec( & arg_exec ),
+ m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) ),
+ m_sync_mask( 0 ),
+ m_sync_value( 0 ),
+ m_sync_step( 0 ),
+ m_group_rank( arg_exec.pool_rank_rev() / arg_team_size ),
+ m_team_rank( arg_exec.pool_rank_rev() % arg_team_size ),
+ m_team_size( arg_team_size )
{
// This team spans
// m_self_exec->pool_rev( team_size * group_rank )
// m_self_exec->pool_rev( team_size * ( group_rank + 1 ) - 1 )
int64_t volatile * const sync = (int64_t *) m_self_exec->scratch_reduce();
sync[0] = int64_t(0) ;
sync[1] = int64_t(0) ;
for ( int i = 0 ; i < m_team_size ; ++i ) {
m_sync_value |= int64_t(1) << (8*i);
m_sync_mask |= int64_t(3) << (8*i);
}
Kokkos::memory_fence();
}
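// Worked example (illustrative, not part of this patch): for m_team_size == 2
// the loop above yields m_sync_value == 0x0101 and m_sync_mask == 0x0303,
// i.e. one arrival byte per team rank. team_barrier() writes the low bits of
// m_sync_value into this rank's byte and, on every other step, XORs the mask
// so the expected arrival byte alternates between 0x01 and 0x02.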
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void TaskExec< Kokkos::OpenMP >::team_barrier_impl() const
+void TaskExec< Kokkos::Qthreads >::team_barrier() const
{
- if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
- Kokkos::abort("TaskQueue<OpenMP> scratch_reduce memory too small");
- }
+ if ( 1 < m_team_size ) {
- // Use team shared memory to synchronize.
- // Alternate memory locations between barriers to avoid a sequence
- // of barriers overtaking one another.
+ if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
+ Kokkos::abort("TaskQueue<Qthreads> scratch_reduce memory too small");
+ }
- int64_t volatile * const sync =
- ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+ // Use team shared memory to synchronize.
+ // Alternate memory locations between barriers to avoid a sequence
+ // of barriers overtaking one another.
- // This team member sets one byte within the sync variable
- int8_t volatile * const sync_self =
- ((int8_t *) sync) + m_team_rank ;
+ int64_t volatile * const sync =
+ ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+
+ // This team member sets one byte within the sync variable
+ int8_t volatile * const sync_self =
+ ((int8_t *) sync) + m_team_rank ;
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
+fprintf( stdout,
+ "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n",
+ m_group_rank,
+ m_team_rank,
+ m_sync_step,
+ m_sync_value,
+ *sync
);
fflush(stdout);
#endif
- *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
+ *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
- while ( m_sync_value != *sync ); // wait for team to arrive
+ while ( m_sync_value != *sync ); // wait for team to arrive
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
+fprintf( stdout,
+ "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n",
+ m_group_rank,
+ m_team_rank,
+ m_sync_step,
+ m_sync_value,
+ *sync
);
fflush(stdout);
#endif
- ++m_sync_step ;
+ ++m_sync_step ;
- if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
- m_sync_value ^= m_sync_mask ;
- if ( 1000 < m_sync_step ) m_sync_step = 0 ;
+ if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
+ m_sync_value ^= m_sync_mask ;
+ if ( 1000 < m_sync_step ) m_sync_step = 0 ;
+ }
}
}
#endif
//----------------------------------------------------------------------------
-void TaskQueueSpecialization< Kokkos::OpenMP >::execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
+void TaskQueueSpecialization< Kokkos::Qthreads >::execute
+ ( TaskQueue< Kokkos::Qthreads > * const queue )
{
- using execution_space = Kokkos::OpenMP ;
+ using execution_space = Kokkos::Qthreads ;
using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
- using PoolExec = Kokkos::Impl::OpenMPexec ;
+ using task_root_type = TaskBase< execution_space, void, void > ;
+ using PoolExec = Kokkos::Impl::QthreadsExec ;
using Member = TaskExec< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// Required: team_size <= 8
const int team_size = PoolExec::pool_size(2); // Threads per core
// const int team_size = PoolExec::pool_size(1); // Threads per NUMA
if ( 8 < team_size ) {
- Kokkos::abort("TaskQueue<OpenMP> unsupported team size");
+ Kokkos::abort("TaskQueue<Qthreads> unsupported team size");
}
#pragma omp parallel
{
PoolExec & self = *PoolExec::get_thread_omp();
Member single_exec ;
- Member team_exec( self , team_size );
+ Member team_exec( self, team_size );
// Team shared memory
task_root_type * volatile * const task_shared =
(task_root_type **) team_exec.m_team_exec->scratch_thread();
-// Barrier across entire OpenMP thread pool to insure initialization
+// Barrier across entire Qthreads thread pool to ensure initialization
#pragma omp barrier
// Loop until all queues are empty and no tasks in flight
do {
- task_root_type * task = 0 ;
-
// Each team lead attempts to acquire either a thread team task
- // or a single thread task for the team.
+ // or a collection of single thread tasks for the team.
if ( 0 == team_exec.team_rank() ) {
- task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
+ task_root_type * tmp =
+ 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
- for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
- for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ for ( int i = 0 ; i < queue_type::NumQueue && end == tmp ; ++i ) {
+ for ( int j = 0 ; j < 2 && end == tmp ; ++j ) {
+ tmp = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
- }
-
- // Team lead broadcast acquired task to team members:
-
- if ( 1 < team_exec.team_size() ) {
- if ( 0 == team_exec.team_rank() ) *task_shared = task ;
+ *task_shared = tmp ;
- // Fence to be sure task_shared is stored before the barrier
+ // Fence to be sure task_shared is stored
Kokkos::memory_fence();
+ }
- // Whole team waits for every team member to reach this statement
- team_exec.team_barrier();
+ // Whole team waits for every team member to reach this statement
+ team_exec.team_barrier();
- // Fence to be sure task_shared is stored
- Kokkos::memory_fence();
+ Kokkos::memory_fence();
- task = *task_shared ;
- }
+ task_root_type * const task = *task_shared ;
#if 0
-fprintf( stdout
- , "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n"
- , team_exec.m_group_rank
- , team_exec.m_team_rank
- , uintptr_t(task_shared)
- , uintptr_t(task)
+fprintf( stdout,
+ "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n",
+ team_exec.m_group_rank,
+ team_exec.m_team_rank,
+ uintptr_t(task_shared),
+ uintptr_t(task)
);
fflush(stdout);
#endif
if ( 0 == task ) break ; // 0 == m_ready_count
if ( end == task ) {
- // All team members wait for whole team to reach this statement.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
team_exec.team_barrier();
}
else if ( task_root_type::TaskTeam == task->m_task_type ) {
// Thread Team Task
- (*task->m_apply)( task , & team_exec );
+ (*task->m_apply)( task, & team_exec );
// The m_apply function performs a barrier
if ( 0 == team_exec.team_rank() ) {
// team member #0 completes the task, which may delete the task
- queue->complete( task );
+ queue->complete( task );
}
}
else {
// Single Thread Task
if ( 0 == team_exec.team_rank() ) {
- (*task->m_apply)( task , & single_exec );
+ (*task->m_apply)( task, & single_exec );
- queue->complete( task );
+ queue->complete( task );
}
// All team members wait for whole team to reach this statement.
// Not necessary to complete the task.
// Is necessary to prevent task_shared from being updated
// before it is read by all threads.
team_exec.team_barrier();
}
} while(1);
}
// END #pragma omp parallel
}
-void TaskQueueSpecialization< Kokkos::OpenMP >::
+void TaskQueueSpecialization< Kokkos::Qthreads >::
iff_single_thread_recursive_execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
+ ( TaskQueue< Kokkos::Qthreads > * const queue )
{
- using execution_space = Kokkos::OpenMP ;
+ using execution_space = Kokkos::Qthreads ;
using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
+ using task_root_type = TaskBase< execution_space, void, void > ;
using Member = TaskExec< execution_space > ;
if ( 1 == omp_get_num_threads() ) {
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec ;
task_root_type * task = end ;
do {
task = end ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
- (*task->m_apply)( task , & single_exec );
+ (*task->m_apply)( task, & single_exec );
- queue->complete( task );
+ queue->complete( task );
} while(1);
}
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
+#endif /* #if defined( KOKKOS_ENABLE_QTHREADS ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp
new file mode 100644
index 000000000..836452dde
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp
@@ -0,0 +1,156 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_QTHREADS_TASK_HPP
+#define KOKKOS_IMPL_QTHREADS_TASK_HPP
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template<>
+class TaskQueueSpecialization< Kokkos::Qthreads >
+{
+public:
+
+ using execution_space = Kokkos::Qthreads ;
+ using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
+ using task_base_type = Kokkos::Impl::TaskBase< execution_space, void, void > ;
+
+ // Must specify memory space
+ using memory_space = Kokkos::HostSpace ;
+
+ static
+ void iff_single_thread_recursive_execute( queue_type * const );
+
+ // Must provide task queue execution function
+ static void execute( queue_type * const );
+
+ // Must provide mechanism to set function pointer in
+ // execution space from the host process.
+ template< typename FunctorType >
+ static
+ void proc_set_apply( task_base_type::function_type * ptr )
+ {
+ using TaskType = TaskBase< execution_space,
+ typename FunctorType::value_type,
+ FunctorType
+ > ;
+ *ptr = TaskType::apply ;
+ }
+};
+
+extern template class TaskQueue< Kokkos::Qthreads > ;
+
+//----------------------------------------------------------------------------
+
+template<>
+class TaskExec< Kokkos::Qthreads >
+{
+private:
+
+ TaskExec( TaskExec && ) = delete ;
+ TaskExec( TaskExec const & ) = delete ;
+ TaskExec & operator = ( TaskExec && ) = delete ;
+ TaskExec & operator = ( TaskExec const & ) = delete ;
+
+
+ using PoolExec = Kokkos::Impl::QthreadsExec ;
+
+ friend class Kokkos::Impl::TaskQueue< Kokkos::Qthreads > ;
+ friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::Qthreads > ;
+
+ PoolExec * const m_self_exec ; ///< This thread's thread pool data structure
+ PoolExec * const m_team_exec ; ///< Team thread's thread pool data structure
+ int64_t m_sync_mask ;
+ int64_t mutable m_sync_value ;
+ int mutable m_sync_step ;
+ int m_group_rank ; ///< Which "team" subset of thread pool
+ int m_team_rank ; ///< Which thread within a team
+ int m_team_size ;
+
+ TaskExec();
+ TaskExec( PoolExec & arg_exec, int arg_team_size );
+
+public:
+
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ void * team_shared() const
+ { return m_team_exec ? m_team_exec->scratch_thread() : (void*) 0 ; }
+
+ int team_shared_size() const
+ { return m_team_exec ? m_team_exec->scratch_thread_size() : 0 ; }
+
+ /**\brief Whole team enters this function call
+ * before any team member returns from
+ * this function call.
+ */
+ void team_barrier() const ;
+#else
+ KOKKOS_INLINE_FUNCTION void team_barrier() const {}
+ KOKKOS_INLINE_FUNCTION void * team_shared() const { return 0 ; }
+ KOKKOS_INLINE_FUNCTION int team_shared_size() const { return 0 ; }
+#endif
+
+ KOKKOS_INLINE_FUNCTION
+ int team_rank() const { return m_team_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int team_size() const { return m_team_size ; }
+};
+
+}} /* namespace Kokkos::Impl */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #ifndef KOKKOS_IMPL_QTHREADS_TASK_HPP */
+
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
similarity index 91%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
index 50444177c..aa159cff6 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
@@ -1,491 +1,488 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-// Experimental unified task-data parallel manycore LDRD
+// Experimental unified task-data parallel manycore LDRD.
#include <Kokkos_Core_fwd.hpp>
-#if defined( KOKKOS_ENABLE_QTHREAD )
+#if defined( KOKKOS_ENABLE_QTHREADS )
#include <stdio.h>
#include <stdlib.h>
#include <stdexcept>
#include <iostream>
#include <sstream>
#include <string>
#include <Kokkos_Atomic.hpp>
-#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
+#include <Qthreads/Kokkos_Qthreads_TaskPolicy.hpp>
#if defined( KOKKOS_ENABLE_TASKDAG )
-//----------------------------------------------------------------------------
-
namespace Kokkos {
namespace Experimental {
namespace Impl {
-typedef TaskMember< Kokkos::Qthread , void , void > Task ;
+typedef TaskMember< Kokkos::Qthreads , void , void > Task ;
namespace {
inline
unsigned padded_sizeof_derived( unsigned sizeof_derived )
{
return sizeof_derived +
( sizeof_derived % sizeof(Task*) ? sizeof(Task*) - sizeof_derived % sizeof(Task*) : 0 );
}
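// Worked example (illustrative, not part of this patch): with 8-byte Task*
// pointers a derived task of 20 bytes is padded to 24, so the trailing
// dependence array of Task* that allocate() appends starts on a
// pointer-aligned boundary; a size already divisible by 8 (e.g. 32) is
// returned unchanged.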
// int lock_alloc_dealloc = 0 ;
} // namespace
void Task::deallocate( void * ptr )
{
// Counting on 'free' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
free( ptr );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
}
void * Task::allocate( const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity )
{
// Counting on 'malloc' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
void * const ptr = malloc( padded_sizeof_derived( arg_sizeof_derived ) + arg_dependence_capacity * sizeof(Task*) );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
return ptr ;
}
Task::~TaskMember()
{
}
Task::TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( arg_verify )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
Task::TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( & Task::verify_type<void> )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
//----------------------------------------------------------------------------
void Task::throw_error_add_dependence() const
{
- std::cerr << "TaskMember< Qthread >::add_dependence ERROR"
+ std::cerr << "TaskMember< Qthreads >::add_dependence ERROR"
<< " state(" << m_state << ")"
<< " dep_size(" << m_dep_size << ")"
<< std::endl ;
- throw std::runtime_error("TaskMember< Qthread >::add_dependence ERROR");
+ throw std::runtime_error("TaskMember< Qthreads >::add_dependence ERROR");
}
void Task::throw_error_verify_type()
{
- throw std::runtime_error("TaskMember< Qthread >::verify_type ERROR");
+ throw std::runtime_error("TaskMember< Qthreads >::verify_type ERROR");
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
void Task::assign( Task ** const lhs , Task * rhs , const bool no_throw )
{
- static const char msg_error_header[] = "Kokkos::Impl::TaskManager<Kokkos::Qthread>::assign ERROR" ;
+ static const char msg_error_header[] = "Kokkos::Impl::TaskManager<Kokkos::Qthreads>::assign ERROR" ;
static const char msg_error_count[] = ": negative reference count" ;
static const char msg_error_complete[] = ": destroy task that is not complete" ;
static const char msg_error_dependences[] = ": destroy task that has dependences" ;
static const char msg_error_exception[] = ": caught internal exception" ;
if ( rhs ) { Kokkos::atomic_increment( &(*rhs).m_ref_count ); }
Task * const lhs_val = Kokkos::atomic_exchange( lhs , rhs );
if ( lhs_val ) {
const int count = Kokkos::atomic_fetch_add( & (*lhs_val).m_ref_count , -1 );
const char * msg_error = 0 ;
try {
if ( 1 == count ) {
// Reference count at zero, delete it
// Should only be deallocating a completed task
if ( (*lhs_val).m_state == Kokkos::Experimental::TASK_STATE_COMPLETE ) {
// A completed task should not have dependences...
for ( int i = 0 ; i < (*lhs_val).m_dep_size && 0 == msg_error ; ++i ) {
if ( (*lhs_val).m_dep[i] ) msg_error = msg_error_dependences ;
}
}
else {
msg_error = msg_error_complete ;
}
if ( 0 == msg_error ) {
// Get deletion function and apply it
const Task::function_dealloc_type d = (*lhs_val).m_dealloc ;
(*d)( lhs_val );
}
}
else if ( count <= 0 ) {
msg_error = msg_error_count ;
}
}
catch( ... ) {
if ( 0 == msg_error ) msg_error = msg_error_exception ;
}
if ( 0 != msg_error ) {
if ( no_throw ) {
std::cerr << msg_error_header << msg_error << std::endl ;
std::cerr.flush();
}
else {
std::string msg(msg_error_header);
msg.append(msg_error);
throw std::runtime_error( msg );
}
}
}
}
#endif
//----------------------------------------------------------------------------
void Task::closeout()
{
enum { RESPAWN = int( Kokkos::Experimental::TASK_STATE_WAITING ) |
int( Kokkos::Experimental::TASK_STATE_EXECUTING ) };
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx %s\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, ( m_state == RESPAWN ? "respawn" : "complete" )
);
fflush(stdout);
#endif
// When dependent tasks run there would be a race
// condition between destroying this task and
// querying the active count pointer from this task.
int volatile * const active_count = m_active_count ;
if ( m_state == RESPAWN ) {
// Task requests respawn, set state to waiting and reschedule the task
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
else {
// Task did not respawn, is complete
m_state = Kokkos::Experimental::TASK_STATE_COMPLETE ;
// Release dependences before allowing dependent tasks to run.
// Otherwise there is a thread race condition for removing dependences.
for ( int i = 0 ; i < m_dep_size ; ++i ) {
assign( & m_dep[i] , 0 );
}
- // Set qthread FEB to full so that dependent tasks are allowed to execute.
+ // Set Qthreads FEB to full so that dependent tasks are allowed to execute.
// This 'task' may be deleted immediately following this function call.
qthread_fill( & m_qfeb );
// The dependent task could now complete and destroy 'this' task
// before the call to 'qthread_fill' returns. Therefore, for
// thread safety assume that 'this' task has now been destroyed.
}
// Decrement active task count before returning.
Kokkos::atomic_decrement( active_count );
}
aligned_t Task::qthread_func( void * arg )
{
Task * const task = reinterpret_cast< Task * >(arg);
// First member of the team change state to executing.
// Use compare-exchange to avoid race condition with a respawn.
Kokkos::atomic_compare_exchange_strong( & task->m_state
, int(Kokkos::Experimental::TASK_STATE_WAITING)
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
);
if ( task->m_apply_team && ! task->m_apply_single ) {
- Kokkos::Impl::QthreadTeamPolicyMember::TaskTeam task_team_tag ;
+ Kokkos::Impl::QthreadsTeamPolicyMember::TaskTeam task_team_tag ;
// Initialize team size and rank with shepherd info
- Kokkos::Impl::QthreadTeamPolicyMember member( task_team_tag );
+ Kokkos::Impl::QthreadsTeamPolicyMember member( task_team_tag );
(*task->m_apply_team)( task , member );
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx executed by member(%d:%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
, member.team_rank()
, member.team_size()
);
fflush(stdout);
#endif
member.team_barrier();
if ( member.team_rank() == 0 ) task->closeout();
member.team_barrier();
}
else if ( task->m_apply_team && task->m_apply_single == reinterpret_cast<function_single_type>(1) ) {
// Team hard-wired to one, no cloning
- Kokkos::Impl::QthreadTeamPolicyMember member ;
+ Kokkos::Impl::QthreadsTeamPolicyMember member ;
(*task->m_apply_team)( task , member );
task->closeout();
}
else {
(*task->m_apply_single)( task );
task->closeout();
}
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx return\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
);
fflush(stdout);
#endif
return 0 ;
}
void Task::respawn()
{
// Change state from pure executing to ( waiting | executing )
// to avoid confusion with simply waiting.
Kokkos::atomic_compare_exchange_strong( & m_state
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
, int(Kokkos::Experimental::TASK_STATE_WAITING |
Kokkos::Experimental::TASK_STATE_EXECUTING)
);
}
void Task::schedule()
{
// Is waiting for execution
// Increment active task count before spawning.
Kokkos::atomic_increment( m_active_count );
- // spawn in qthread. must malloc the precondition array and give to qthread.
- // qthread will eventually free this allocation so memory will not be leaked.
+ // spawn in Qthreads. must malloc the precondition array and give to Qthreads.
+ // Qthreads will eventually free this allocation so memory will not be leaked.
// concern with thread safety of malloc, does this need to be guarded?
aligned_t ** qprecon = (aligned_t **) malloc( ( m_dep_size + 1 ) * sizeof(aligned_t *) );
qprecon[0] = reinterpret_cast<aligned_t *>( uintptr_t(m_dep_size) );
for ( int i = 0 ; i < m_dep_size ; ++i ) {
- qprecon[i+1] = & m_dep[i]->m_qfeb ; // Qthread precondition flag
+ qprecon[i+1] = & m_dep[i]->m_qfeb ; // Qthreads precondition flag
}
if ( m_apply_team && ! m_apply_single ) {
// If there is more than one shepherd, spawn on a shepherd other than this shepherd
const int num_shepherd = qthread_num_shepherds();
const int num_worker_per_shepherd = qthread_num_workers_local(NO_SHEPHERD);
const int this_shepherd = qthread_shep();
int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, spawn_shepherd
, num_worker_per_shepherd - 1
);
fflush(stdout);
#endif
qthread_spawn_cloneable
( & Task::qthread_func
, this
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, spawn_shepherd
, unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY )
, num_worker_per_shepherd - 1
);
}
else {
qthread_spawn( & Task::qthread_func /* function */
, this /* function argument */
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, NO_SHEPHERD
, QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
);
}
}
} // namespace Impl
} // namespace Experimental
} // namespace Kokkos
namespace Kokkos {
namespace Experimental {
-TaskPolicy< Kokkos::Qthread >::
+TaskPolicy< Kokkos::Qthreads >::
TaskPolicy
( const unsigned /* arg_task_max_count */
, const unsigned /* arg_task_max_size */
, const unsigned arg_task_default_dependence_capacity
, const unsigned arg_task_team_size
)
: m_default_dependence_capacity( arg_task_default_dependence_capacity )
, m_team_size( arg_task_team_size != 0 ? arg_task_team_size : unsigned(qthread_num_workers_local(NO_SHEPHERD)) )
, m_active_count_root(0)
, m_active_count( m_active_count_root )
{
const unsigned num_worker_per_shepherd = unsigned( qthread_num_workers_local(NO_SHEPHERD) );
if ( m_team_size != 1 && m_team_size != num_worker_per_shepherd ) {
std::ostringstream msg ;
- msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Qthread >( "
+ msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Qthreads >( "
<< "default_depedence = " << arg_task_default_dependence_capacity
<< " , team_size = " << arg_task_team_size
<< " ) ERROR, valid team_size arguments are { (omitted) , 1 , " << num_worker_per_shepherd << " }" ;
Kokkos::Impl::throw_runtime_exception(msg.str());
}
}
-TaskPolicy< Kokkos::Qthread >::member_type &
-TaskPolicy< Kokkos::Qthread >::member_single()
+TaskPolicy< Kokkos::Qthreads >::member_type &
+TaskPolicy< Kokkos::Qthreads >::member_single()
{
static member_type s ;
return s ;
}
-void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthread > & policy )
+void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthreads > & policy )
{
volatile int * const active_task_count = & policy.m_active_count ;
while ( *active_task_count ) qthread_yield();
}
} // namespace Experimental
} // namespace Kokkos
-#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #if defined( KOKKOS_ENABLE_QTHREAD ) */
-
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+#endif // #if defined( KOKKOS_ENABLE_QTHREADS )
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
similarity index 90%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
index 565dbf7e6..1e5a4dc59 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
@@ -1,664 +1,664 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
-#ifndef KOKKOS_QTHREAD_TASKSCHEDULER_HPP
-#define KOKKOS_QTHREAD_TASKSCHEDULER_HPP
+#ifndef KOKKOS_QTHREADS_TASKSCHEDULER_HPP
+#define KOKKOS_QTHREADS_TASKSCHEDULER_HPP
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
-// Defines to enable experimental Qthread functionality
+// Defines to enable experimental Qthreads functionality
#define QTHREAD_LOCAL_PRIORITY
#define CLONED_TASKS
#include <qthread.h>
#undef QTHREAD_LOCAL_PRIORITY
#undef CLONED_TASKS
//----------------------------------------------------------------------------
-#include <Kokkos_Qthread.hpp>
+#include <Kokkos_Qthreads.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_View.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template<>
-class TaskMember< Kokkos::Qthread , void , void >
+class TaskMember< Kokkos::Qthreads , void , void >
{
public:
typedef TaskMember * (* function_verify_type) ( TaskMember * );
typedef void (* function_single_type) ( TaskMember * );
- typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::QthreadTeamPolicyMember & );
+ typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::QthreadsTeamPolicyMember & );
typedef void (* function_dealloc_type)( TaskMember * );
private:
const function_dealloc_type m_dealloc ; ///< Deallocation
const function_verify_type m_verify ; ///< Result type verification
const function_single_type m_apply_single ; ///< Apply function
const function_team_type m_apply_team ; ///< Apply function
int volatile * const m_active_count ; ///< Count of active tasks on this policy
- aligned_t m_qfeb ; ///< Qthread full/empty bit
+ aligned_t m_qfeb ; ///< Qthreads full/empty bit
TaskMember ** const m_dep ; ///< Dependences
const int m_dep_capacity ; ///< Capacity of dependences
int m_dep_size ; ///< Actual count of dependences
int m_ref_count ; ///< Reference count
int m_state ; ///< State of the task
TaskMember() /* = delete */ ;
TaskMember( const TaskMember & ) /* = delete */ ;
TaskMember & operator = ( const TaskMember & ) /* = delete */ ;
static aligned_t qthread_func( void * arg );
static void * allocate( const unsigned arg_sizeof_derived , const unsigned arg_dependence_capacity );
static void deallocate( void * );
void throw_error_add_dependence() const ;
static void throw_error_verify_type();
template < class DerivedTaskType >
static
void deallocate( TaskMember * t )
{
DerivedTaskType * ptr = static_cast< DerivedTaskType * >(t);
ptr->~DerivedTaskType();
deallocate( (void *) ptr );
}
void schedule();
void closeout();
protected :
~TaskMember();
- // Used by TaskMember< Qthread , ResultType , void >
+ // Used by TaskMember< Qthreads , ResultType , void >
TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
- // Used for TaskMember< Qthread , void , void >
+ // Used for TaskMember< Qthreads , void , void >
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
public:
template< typename ResultType >
KOKKOS_FUNCTION static
TaskMember * verify_type( TaskMember * t )
{
enum { check_type = ! std::is_same< ResultType , void >::value };
if ( check_type && t != 0 ) {
// Verify that t->m_verify is this function
const function_verify_type self = & TaskMember::template verify_type< ResultType > ;
if ( t->m_verify != self ) {
t = 0 ;
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
throw_error_verify_type();
#endif
}
}
return t ;
}
//----------------------------------------
/* Inheritance Requirements on task types:
* typedef FunctorType::value_type value_type ;
* class DerivedTaskType
- * : public TaskMember< Qthread , value_type , FunctorType >
+ * : public TaskMember< Qthreads , value_type , FunctorType >
* { ... };
- * class TaskMember< Qthread , value_type , FunctorType >
- * : public TaskMember< Qthread , value_type , void >
+ * class TaskMember< Qthreads , value_type , FunctorType >
+ * : public TaskMember< Qthreads , value_type , void >
* , public Functor
* { ... };
* If value_type != void
- * class TaskMember< Qthread , value_type , void >
- * : public TaskMember< Qthread , void , void >
+ * class TaskMember< Qthreads , value_type , void >
+ * : public TaskMember< Qthreads , void , void >
*
* Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
*
*/
/** \brief Allocate and construct a single-thread task */
template< class DerivedTaskType >
static
TaskMember * create_single( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, & TaskMember::template apply_single< functor_type , value_type >
, 0
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
/** \brief Allocate and construct a team-thread task */
template< class DerivedTaskType >
static
TaskMember * create_team( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity
, const bool arg_is_team )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
const function_single_type flag = reinterpret_cast<function_single_type>( arg_is_team ? 0 : 1 );
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, flag
, & TaskMember::template apply_team< functor_type , value_type >
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
void respawn();
void spawn()
{
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
//----------------------------------------
typedef FutureValueTypeIsVoidError get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return get_result_type() ; }
KOKKOS_INLINE_FUNCTION
Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
//----------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false );
#else
KOKKOS_INLINE_FUNCTION static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false ) {}
#endif
KOKKOS_INLINE_FUNCTION
TaskMember * get_dependence( int i ) const
{ return ( Kokkos::Experimental::TASK_STATE_EXECUTING == m_state && 0 <= i && i < m_dep_size ) ? m_dep[i] : (TaskMember*) 0 ; }
KOKKOS_INLINE_FUNCTION
int get_dependence() const
{ return m_dep_size ; }
KOKKOS_INLINE_FUNCTION
void clear_dependence()
{
for ( int i = 0 ; i < m_dep_size ; ++i ) assign( m_dep + i , 0 );
m_dep_size = 0 ;
}
KOKKOS_INLINE_FUNCTION
void add_dependence( TaskMember * before )
{
if ( ( Kokkos::Experimental::TASK_STATE_CONSTRUCTING == m_state ||
Kokkos::Experimental::TASK_STATE_EXECUTING == m_state ) &&
m_dep_size < m_dep_capacity ) {
assign( m_dep + m_dep_size , before );
++m_dep_size ;
}
else {
throw_error_add_dependence();
}
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_single( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
- // TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Qthread , ResultType , void >
+ // TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ // : public TaskMember< Kokkos::Qthreads , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , ResultType & >::apply( (FunctorType &) m , & m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_single( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
- // TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Qthread , ResultType , void >
+ // TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ // : public TaskMember< Kokkos::Qthreads , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , void >::apply( (FunctorType &) m );
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_team( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t
- , Kokkos::Impl::QthreadTeamPolicyMember & member )
+ , Kokkos::Impl::QthreadsTeamPolicyMember & member )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member , m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_team( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t
- , Kokkos::Impl::QthreadTeamPolicyMember & member )
+ , Kokkos::Impl::QthreadsTeamPolicyMember & member )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member );
}
};
//----------------------------------------------------------------------------
-/** \brief Base class for tasks with a result value in the Qthread execution space.
+/** \brief Base class for tasks with a result value in the Qthreads execution space.
*
* The FunctorType must be void because this class is accessed by the
* Future class for the task and result value.
*
* Must be derived from TaskMember<S,void,void> 'root class' so the Future class
* can correctly static_cast from the 'root class' to this class.
*/
template < class ResultType >
-class TaskMember< Kokkos::Qthread , ResultType , void >
- : public TaskMember< Kokkos::Qthread , void , void >
+class TaskMember< Kokkos::Qthreads , ResultType , void >
+ : public TaskMember< Kokkos::Qthreads , void , void >
{
public:
ResultType m_result ;
typedef const ResultType & get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return m_result ; }
protected:
- typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
+ typedef TaskMember< Kokkos::Qthreads , void , void > task_root_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: task_root_type( & task_root_type::template verify_type< ResultType >
, arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, m_result()
{}
};
template< class ResultType , class FunctorType >
-class TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- : public TaskMember< Kokkos::Qthread , ResultType , void >
+class TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ : public TaskMember< Kokkos::Qthreads , ResultType , void >
, public FunctorType
{
public:
typedef FunctorType functor_type ;
- typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
- typedef TaskMember< Kokkos::Qthread , ResultType , void > task_base_type ;
+ typedef TaskMember< Kokkos::Qthreads , void , void > task_root_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , void > task_base_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
, const functor_type & arg_functor
)
: task_base_type( arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, functor_type( arg_functor )
{}
};
} /* namespace Impl */
} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
-void wait( TaskPolicy< Kokkos::Qthread > & );
+void wait( TaskPolicy< Kokkos::Qthreads > & );
template<>
-class TaskPolicy< Kokkos::Qthread >
+class TaskPolicy< Kokkos::Qthreads >
{
public:
- typedef Kokkos::Qthread execution_space ;
+ typedef Kokkos::Qthreads execution_space ;
typedef TaskPolicy execution_policy ;
- typedef Kokkos::Impl::QthreadTeamPolicyMember member_type ;
+ typedef Kokkos::Impl::QthreadsTeamPolicyMember member_type ;
private:
typedef Impl::TaskMember< execution_space , void , void > task_root_type ;
template< class FunctorType >
static inline
const task_root_type * get_task_root( const FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
}
template< class FunctorType >
static inline
task_root_type * get_task_root( FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< task_root_type * >( static_cast< task_type * >(f) );
}
unsigned m_default_dependence_capacity ;
unsigned m_team_size ;
volatile int m_active_count_root ;
volatile int & m_active_count ;
public:
TaskPolicy
( const unsigned arg_task_max_count
, const unsigned arg_task_max_size
, const unsigned arg_task_default_dependence_capacity = 4
, const unsigned arg_task_team_size = 0 /* choose default */
);
KOKKOS_FUNCTION TaskPolicy() = default ;
KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const { return m_active_count ; }
template< class ValueType >
const Future< ValueType , execution_space > &
spawn( const Future< ValueType , execution_space > & f
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
f.m_task->spawn();
#endif
return f ;
}
// Create single-thread task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_single< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
)
#endif
);
}
template< class FunctorType >
Future< typename FunctorType::value_type , execution_space >
proc_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create( functor , dependence_capacity ); }
// Create thread-team task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_team< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
, 1 < m_team_size
)
#endif
);
}
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
proc_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create_team( functor , dependence_capacity ); }
// Add dependence
template< class A1 , class A2 , class A3 , class A4 >
void add_dependence( const Future<A1,A2> & after
, const Future<A3,A4> & before
, typename std::enable_if
< std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
&&
std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
after.m_task->add_dependence( before.m_task );
#endif
}
//----------------------------------------
// Functions for an executing task functor to query dependences,
// set new dependences, and respawn itself.
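  //
  // Illustrative sketch (hypothetical usage, example names only): a task
  // functor that must wait on a child it spawns might use these hooks
  // roughly as follows; 'ChildWork' and the 'spawned' flag are illustrative.
  //
  //   struct ParentTask {
  //     typedef long value_type ;
  //     TaskPolicy< Kokkos::Qthreads >    policy ;
  //     Future< long , Kokkos::Qthreads > child ;
  //     bool                              spawned ;
  //
  //     void apply( value_type & result )
  //     {
  //       if ( ! spawned ) {
  //         spawned = true ;
  //         child = policy.task_create( ChildWork() );
  //         policy.spawn( child );
  //         policy.add_dependence( this , child );  // wait on 'child'
  //         policy.respawn( this );                 // run apply() again later
  //       }
  //       else {
  //         result = child.get();                   // 'child' has completed
  //       }
  //     }
  //   };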
template< class FunctorType >
Future< void , execution_space >
get_dependence( const FunctorType * task_functor , int i ) const
{
return Future<void,execution_space>(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->get_dependence(i)
#endif
);
}
template< class FunctorType >
int get_dependence( const FunctorType * task_functor ) const
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return get_task_root(task_functor)->get_dependence(); }
#else
{ return 0 ; }
#endif
template< class FunctorType >
void clear_dependence( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->clear_dependence();
#endif
}
template< class FunctorType , class A3 , class A4 >
void add_dependence( FunctorType * task_functor
, const Future<A3,A4> & before
, typename std::enable_if
< std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->add_dependence( before.m_task );
#endif
}
template< class FunctorType >
void respawn( FunctorType * task_functor
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
template< class FunctorType >
void respawn_needing_memory( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
static member_type & member_single();
- friend void wait( TaskPolicy< Kokkos::Qthread > & );
+ friend void wait( TaskPolicy< Kokkos::Qthreads > & );
};
} /* namespace Experimental */
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #define KOKKOS_QTHREAD_TASK_HPP */
+#endif /* #define KOKKOS_QTHREADS_TASK_HPP */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp
new file mode 100644
index 000000000..55235cd6d
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp
@@ -0,0 +1,319 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+/** \brief Manage task allocation, deallocation, and scheduling.
+ *
+ * Task execution is handled here directly for the Qthreads implementation.
+ */
+template<>
+class TaskQueue< Kokkos::Qthreads > {
+private:
+
+  using execution_space = Kokkos::Qthreads ;
+  using memory_space    = Kokkos::HostSpace ;
+ using device_type = Kokkos::Device< execution_space, memory_space > ;
+ using memory_pool = Kokkos::Experimental::MemoryPool< device_type > ;
+ using task_root_type = Kokkos::Impl::TaskBase< execution_space, void, void > ;
+
+ friend class Kokkos::TaskScheduler< execution_space > ;
+
+ struct Destroy {
+ TaskQueue * m_queue ;
+ void destroy_shared_allocation();
+ };
+
+ //----------------------------------------
+
+ enum : int { TASK_STATE_NULL = 0, ///< Does not exist
+ TASK_STATE_CONSTRUCTING = 1, ///< Is under construction
+ TASK_STATE_WAITING = 2, ///< Is waiting for execution
+ TASK_STATE_EXECUTING = 4, ///< Is executing
+ TASK_STATE_RESPAWN = 8, ///< Requested respawn
+ TASK_STATE_COMPLETE = 16 ///< Execution is complete
+ };
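+
+  // As used in these Qthreads task sources, the intended lifecycle appears to
+  // be: CONSTRUCTING -> WAITING (schedule) -> EXECUTING (qthread_func) ->
+  // COMPLETE, with RESPAWN routing an executing task back to WAITING rather
+  // than to COMPLETE.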
+
+ // Queue is organized as [ priority ][ type ]
+
+ memory_pool m_memory ;
+ unsigned m_team_size ; // Number of threads in a team
+ long m_accum_alloc ; // Accumulated number of allocations
+ int m_count_alloc ; // Current number of allocations
+ int m_max_alloc ; // Maximum number of allocations
+ int m_ready_count ; // Number of ready or executing
+
+ //----------------------------------------
+
+ ~TaskQueue();
+ TaskQueue() = delete ;
+ TaskQueue( TaskQueue && ) = delete ;
+ TaskQueue( TaskQueue const & ) = delete ;
+ TaskQueue & operator = ( TaskQueue && ) = delete ;
+ TaskQueue & operator = ( TaskQueue const & ) = delete ;
+
+ TaskQueue
+ ( const memory_space & arg_space,
+ unsigned const arg_memory_pool_capacity,
+ unsigned const arg_memory_pool_superblock_capacity_log2
+ );
+
+ // Schedule a task
+ // Precondition:
+ // task is not executing
+ // task->m_next is the dependence or zero
+ // Postcondition:
+ // task->m_next is linked list membership
+ KOKKOS_FUNCTION
+ void schedule( task_root_type * const );
+
+ // Reschedule a task
+ // Precondition:
+ // task is in Executing state
+ // task->m_next == LockTag
+ // Postcondition:
+ // task is in Executing-Respawn state
+ // task->m_next == 0 (no dependence)
+ KOKKOS_FUNCTION
+ void reschedule( task_root_type * );
+
+ // Complete a task
+ // Precondition:
+ // task is not executing
+ // task->m_next == LockTag => task is complete
+ // task->m_next != LockTag => task is respawn
+ // Postcondition:
+ // task->m_wait == LockTag => task is complete
+ // task->m_wait != LockTag => task is waiting
+ KOKKOS_FUNCTION
+ void complete( task_root_type * );
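+
+  // In this encoding m_next and m_wait double as lock words: LockTag in
+  // m_next marks an executing task that has not requested a respawn, while
+  // storing LockTag in m_wait both marks the task complete and closes its
+  // wait queue to further additions.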
+
+public:
+
+ // If and only if the execution space is a single thread
+ // then execute ready tasks.
+ KOKKOS_INLINE_FUNCTION
+ void iff_single_thread_recursive_execute()
+ {
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ specialization::iff_single_thread_recursive_execute( this );
+#endif
+ }
+
+ void execute() { specialization::execute( this ); }
+
+ template< typename FunctorType >
+ void proc_set_apply( typename task_root_type::function_type * ptr )
+ {
+ specialization::template proc_set_apply< FunctorType >( ptr );
+ }
+
+ // Assign task pointer with reference counting of assigned tasks
+ template< typename LV, typename RV >
+ KOKKOS_FUNCTION static
+ void assign( TaskBase< execution_space, LV, void > ** const lhs,
+ TaskBase< execution_space, RV, void > * const rhs )
+ {
+ using task_lhs = TaskBase< execution_space, LV, void > ;
+#if 0
+ {
+ printf( "assign( 0x%lx { 0x%lx %d %d }, 0x%lx { 0x%lx %d %d } )\n",
+ uintptr_t( lhs ? *lhs : 0 ),
+ uintptr_t( lhs && *lhs ? (*lhs)->m_next : 0 ),
+ int( lhs && *lhs ? (*lhs)->m_task_type : 0 ),
+ int( lhs && *lhs ? (*lhs)->m_ref_count : 0 ),
+ uintptr_t(rhs),
+ uintptr_t( rhs ? rhs->m_next : 0 ),
+ int( rhs ? rhs->m_task_type : 0 ),
+ int( rhs ? rhs->m_ref_count : 0 )
+ );
+ fflush( stdout );
+ }
+#endif
+
+ if ( *lhs )
+ {
+ const int count = Kokkos::atomic_fetch_add( &((*lhs)->m_ref_count), -1 );
+
+ if ( ( 1 == count ) && ( (*lhs)->m_state == TASK_STATE_COMPLETE ) ) {
+ // Reference count is zero and task is complete, deallocate.
+ (*lhs)->m_queue->deallocate( *lhs, (*lhs)->m_alloc_size );
+ }
+ else if ( count <= 1 ) {
+ Kokkos::abort("TaskScheduler task has negative reference count or is incomplete" );
+ }
+
+ // GEM: Should I check that there are no dependences here? Can the state
+ // be set to complete while there are still dependences?
+ }
+
+ if ( rhs ) { Kokkos::atomic_fetch_add( &(rhs->m_ref_count), 1 ); }
+
+ // Force write of *lhs
+
+ *static_cast< task_lhs * volatile * >(lhs) = rhs ;
+
+ Kokkos::memory_fence();
+ }
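+
+  // Illustrative (hypothetical) use of assign(): reference counts follow the
+  // pointer assignments, e.g.
+  //
+  //   task_root_type * ref = 0 ;
+  //   assign( & ref , task );  // task->m_ref_count incremented, ref == task
+  //   assign( & ref , 0 );     // count decremented; a completed task whose
+  //                            // last reference this was is deallocated,
+  //                            // while dropping the last reference to an
+  //                            // incomplete task aborts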
+
+ KOKKOS_FUNCTION
+ size_t allocate_block_size( size_t n ); ///< Actual block size allocated
+
+ KOKKOS_FUNCTION
+ void * allocate( size_t n ); ///< Allocate from the memory pool
+
+ KOKKOS_FUNCTION
+ void deallocate( void * p, size_t n ); ///< Deallocate to the memory pool
+};
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template<>
+class TaskBase< Kokkos::Qthreads, void, void >
+{
+public:
+
+ enum : int16_t { TaskTeam = TaskBase< void, void, void >::TaskTeam,
+ TaskSingle = TaskBase< void, void, void >::TaskSingle,
+ Aggregate = TaskBase< void, void, void >::Aggregate };
+
+ enum : uintptr_t { LockTag = TaskBase< void, void, void >::LockTag,
+ EndTag = TaskBase< void, void, void >::EndTag };
+
+  using execution_space = Kokkos::Qthreads ;
+ using queue_type = TaskQueue< execution_space > ;
+
+ template< typename > friend class Kokkos::TaskScheduler ;
+
+ typedef void (* function_type) ( TaskBase *, void * );
+
+ // sizeof(TaskBase) == 48
+
+ function_type m_apply ; ///< Apply function pointer
+ queue_type * m_queue ; ///< Queue in which this task resides
+ TaskBase * m_dep ; ///< Dependence
+ int32_t m_ref_count ; ///< Reference count
+ int32_t m_alloc_size ; ///< Allocation size
+ int32_t m_dep_count ; ///< Aggregate's number of dependences
+ int16_t m_task_type ; ///< Type of task
+ int16_t m_priority ; ///< Priority of runnable task
+ aligned_t m_qfeb ; ///< Qthread full/empty bit
+ int m_state ; ///< State of the task
+
+ TaskBase( TaskBase && ) = delete ;
+ TaskBase( const TaskBase & ) = delete ;
+ TaskBase & operator = ( TaskBase && ) = delete ;
+ TaskBase & operator = ( const TaskBase & ) = delete ;
+
+ KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+
+ KOKKOS_INLINE_FUNCTION
+  TaskBase() noexcept
+ : m_apply(0),
+ m_queue(0),
+ m_dep(0),
+ m_ref_count(0),
+ m_alloc_size(0),
+ m_dep_count(0),
+ m_task_type( TaskSingle ),
+ m_priority( 1 /* TaskRegularPriority */ ),
+ m_qfeb(0),
+ m_state( queue_type::TASK_STATE_CONSTRUCTING )
+ {
+ qthread_empty( & m_qfeb ); // Set to full when complete
+ }
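+
+  // The full/empty bit starts empty and (per the note above) is presumably
+  // filled on completion, so tasks spawned with &m_qfeb as a qthread_spawn
+  // precondition are held back until this task has completed.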
+
+ //----------------------------------------
+
+ static aligned_t qthread_func( void * arg );
+
+ KOKKOS_INLINE_FUNCTION
+ TaskBase ** aggregate_dependences()
+ { return reinterpret_cast<TaskBase**>( this + 1 ); }
+
+ KOKKOS_INLINE_FUNCTION
+  bool requested_respawn()
+ { return m_state == queue_type::TASK_STATE_RESPAWN; }
+
+ KOKKOS_INLINE_FUNCTION
+ void add_dependence( TaskBase* dep )
+ {
+ // Assign dependence to m_dep. It will be processed in the subsequent
+ // call to schedule. Error if the dependence is reset.
+ if ( 0 != Kokkos::atomic_exchange( & m_dep, dep ) ) {
+ Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
+ }
+
+ if ( 0 != dep ) {
+ // The future may be destroyed upon returning from this call
+ // so increment reference count to track this assignment.
+ Kokkos::atomic_fetch_add( &(dep->m_ref_count), 1 );
+ }
+ }
+
+ using get_return_type = void ;
+
+ KOKKOS_INLINE_FUNCTION
+ get_return_type get() const {}
+};
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp
new file mode 100644
index 000000000..4a9190c73
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp
@@ -0,0 +1,436 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+namespace Kokkos {
+namespace Impl {
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+void TaskQueue< ExecSpace >::Destroy::destroy_shared_allocation()
+{
+ m_queue->~TaskQueue();
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+TaskQueue< ExecSpace >::TaskQueue
+ ( const TaskQueue< ExecSpace >::memory_space & arg_space,
+ unsigned const arg_memory_pool_capacity,
+ unsigned const arg_memory_pool_superblock_capacity_log2 )
+ : m_memory( arg_space,
+ arg_memory_pool_capacity,
+              arg_memory_pool_superblock_capacity_log2 ),
+ m_team_size( unsigned( qthread_num_workers_local(NO_SHEPHERD) ) ),
+ m_accum_alloc(0),
+ m_count_alloc(0),
+ m_max_alloc(0),
+ m_ready_count(0)
+{}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+TaskQueue< ExecSpace >::~TaskQueue()
+{
+ // Verify that ready count is zero.
+ if ( 0 != m_ready_count ) {
+ Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready or executing tasks");
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+size_t TaskQueue< ExecSpace >::allocate_block_size( size_t n )
+{
+ return m_memory.allocate_block_size( n );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void * TaskQueue< ExecSpace >::allocate( size_t n )
+{
+ void * const p = m_memory.allocate(n);
+
+ if ( p ) {
+ Kokkos::atomic_increment( & m_accum_alloc );
+ Kokkos::atomic_increment( & m_count_alloc );
+
+ if ( m_max_alloc < m_count_alloc ) m_max_alloc = m_count_alloc ;
+ }
+
+ return p ;
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::deallocate( void * p, size_t n )
+{
+ m_memory.deallocate( p, n );
+ Kokkos::atomic_decrement( & m_count_alloc );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::schedule
+ ( TaskQueue< ExecSpace >::task_root_type * const task )
+{
+#if 0
+ printf( "schedule( 0x%lx { %d %d %d }\n",
+ uintptr_t(task),
+ task->m_task_type,
+ task->m_priority,
+ task->m_ref_count );
+#endif
+
+ // The task has been constructed and is waiting to be executed.
+ task->m_state = TASK_STATE_WAITING ;
+
+ if ( task->m_task_type != task_root_type::Aggregate ) {
+ // Scheduling a single or team task.
+
+ // Increment active task count before spawning.
+    Kokkos::atomic_increment( & m_ready_count );
+
+ if ( task->m_dep == 0 ) {
+ // Schedule a task with no dependences.
+
+ if ( task_root_type::TaskTeam == task->m_task_type && m_team_size > 1 ) {
+        // If more than one shepherd, spawn on a shepherd other than this one.
+ const int num_shepherd = qthread_num_shepherds();
+ const int this_shepherd = qthread_shep();
+ int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+                 reinterpret_cast<unsigned long>(task),
+ spawn_shepherd,
+ m_team_size - 1
+ );
+ fflush(stdout);
+#endif
+
+ qthread_spawn_cloneable(
+ & task_root_type::qthread_func,
+ task,
+ 0,
+ NULL,
+        0, // no dependences
+ 0, // dependences array
+ spawn_shepherd,
+ unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY ),
+ m_team_size - 1
+ );
+ }
+ else {
+ qthread_spawn(
+ & task_root_type::qthread_func,
+ task,
+ 0,
+ NULL,
+        0, // no dependences
+ 0, // dependences array
+ NO_SHEPHERD,
+ QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
+ );
+ }
+ }
+    else if ( task->m_dep->m_task_type != task_root_type::Aggregate ) {
+      // Malloc the precondition array to pass to qthread_spawn(). For
+      // non-aggregate tasks it is a single pointer, since such a task carries
+      // at most one dependence (m_dep). Qthreads will eventually free this
+      // allocation so memory will not be leaked. Is malloc thread-safe? Should
+      // this call be guarded? The memory can't be allocated from the pool
+      // allocator because Qthreads frees it using free().
+      const int dep_count = 1 ; // A non-aggregate task has a single dependence.
+
+      aligned_t ** qprecon = (aligned_t **) malloc( sizeof(aligned_t *) );
+
+      *qprecon = reinterpret_cast<aligned_t *>( uintptr_t(dep_count) );
+
+ if ( task->m_task_type == task_root_type::TaskTeam && m_team_size > 1) {
+        // If more than one shepherd, spawn on a shepherd other than this one.
+ const int num_shepherd = qthread_num_shepherds();
+ const int this_shepherd = qthread_shep();
+ int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+                 reinterpret_cast<unsigned long>(task),
+ spawn_shepherd,
+ m_team_size - 1
+ );
+ fflush(stdout);
+#endif
+
+ qthread_spawn_cloneable(
+        & task_root_type::qthread_func,
+        task,
+ 0,
+ NULL,
+        dep_count,
+ qprecon, /* dependences */
+ spawn_shepherd,
+ unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY ),
+ m_team_size - 1
+ );
+ }
+ else {
+ qthread_spawn(
+        & task_root_type::qthread_func, /* function */
+        task, /* function argument */
+ 0,
+ NULL,
+        dep_count,
+ qprecon, /* dependences */
+ NO_SHEPHERD,
+ QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
+ );
+ }
+    }
+  }
+  else {
+    // GEM: How do I handle an aggregate (when_all) task?
+  }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::reschedule( task_root_type * task )
+{
+ // Precondition:
+ // task is in Executing state
+ // task->m_next == LockTag
+ //
+ // Postcondition:
+ // task is in Executing-Respawn state
+ // task->m_next == 0 (no dependence)
+
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+
+ if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
+ Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::complete
+ ( TaskQueue< ExecSpace >::task_root_type * task )
+{
+ // Complete a runnable task that has finished executing
+  // or a when_all task when all of its dependences are complete.
+
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+ task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
+
+#if 0
+ printf( "complete( 0x%lx { 0x%lx 0x%lx %d %d %d }\n",
+ uintptr_t(task),
+ uintptr_t(task->m_wait),
+ uintptr_t(task->m_next),
+ task->m_task_type,
+ task->m_priority,
+ task->m_ref_count
+ );
+ fflush( stdout );
+#endif
+
+ const bool runnable = task_root_type::Aggregate != task->m_task_type ;
+
+ //----------------------------------------
+
+ if ( runnable && lock != task->m_next ) {
+    // A runnable task has finished executing and requested a respawn.
+ // Schedule the task for subsequent execution.
+
+ schedule( task );
+ }
+ //----------------------------------------
+ else {
+    // Either an aggregate or a runnable task that executed
+ // and did not respawn. Transition this task to complete.
+
+ // If 'task' is an aggregate then any of the runnable tasks that
+ // it depends upon may be attempting to complete this 'task'.
+ // Must only transition a task once to complete status.
+    // This is controlled by atomically locking the wait queue.
+
+ // Stop other tasks from adding themselves to this task's wait queue
+ // by locking the head of this task's wait queue.
+
+ task_root_type * x = Kokkos::atomic_exchange( & task->m_wait, lock );
+
+ if ( x != (task_root_type *) lock ) {
+
+ // This thread has transitioned this 'task' to complete.
+ // 'task' is no longer in a queue and is not executing
+ // so decrement the reference count from 'task's creation.
+ // If no other references to this 'task' then it will be deleted.
+
+ TaskQueue::assign( & task, zero );
+
+ // This thread has exclusive access to the wait list so
+ // the concurrency-safe pop_task function is not needed.
+ // Schedule the tasks that have been waiting on the input 'task',
+ // which may have been deleted.
+
+ while ( x != end ) {
+
+ // Set x->m_next = zero <= no dependence
+
+ task_root_type * const next =
+ (task_root_type *) Kokkos::atomic_exchange( & x->m_next, zero );
+
+ schedule( x );
+
+ x = next ;
+ }
+ }
+ }
+
+ if ( runnable ) {
+ // A runnable task was popped from a ready queue and executed.
+ // If respawned into a ready queue then the ready count was incremented
+ // so decrement whether respawned or not.
+ Kokkos::atomic_decrement( & m_ready_count );
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template<>
+aligned_t
+TaskBase< Kokkos::Qthreads, void, void >::qthread_func( void * arg )
+{
+ using execution_space = Kokkos::Qthreads ;
+ using task_root_type = TaskBase< execution_space , void , void > ;
+ using Member = Kokkos::Impl::QthreadsTeamPolicyMember;
+
+ task_root_type * const task = reinterpret_cast< task_root_type * >( arg );
+
+  // First member of the team changes state to executing.
+ // Use compare-exchange to avoid race condition with a respawn.
+ Kokkos::atomic_compare_exchange_strong( & task->m_state,
+ queue_type::TASK_STATE_WAITING,
+ queue_type::TASK_STATE_EXECUTING
+ );
+
+ if ( task_root_type::TaskTeam == task->m_task_type )
+ {
+ if ( 1 < task->m_queue->m_team_size ) {
+ // Team task with team size of more than 1.
+ Member::TaskTeam task_team_tag ;
+
+      // Initialize team size and rank with shepherd info
+ Member member( task_team_tag );
+
+ (*task->m_apply)( task , & member );
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx executed by member(%d:%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+ reinterpret_cast<unsigned long>(task),
+ member.team_rank(),
+ member.team_size()
+ );
+ fflush(stdout);
+#endif
+
+ member.team_barrier();
+ if ( member.team_rank() == 0 ) task->closeout();
+ member.team_barrier();
+ }
+ else {
+ // Team task with team size of 1.
+ Member member ;
+ (*task->m_apply)( task , & member );
+ task->closeout();
+ }
+ }
+ else {
+ (*task->m_apply)( task );
+ task->closeout();
+ }
+
+#if 0
+fprintf( stdout
+ , "worker(%d.%d) task 0x%.12lx return\n"
+ , qthread_shep()
+ , qthread_worker_local(NULL)
+ , reinterpret_cast<unsigned long>(task)
+ );
+fflush(stdout);
+#endif
+
+ return 0 ;
+}
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+
diff --git a/lib/kokkos/core/src/Qthread/README b/lib/kokkos/core/src/Qthreads/README
similarity index 99%
rename from lib/kokkos/core/src/Qthread/README
rename to lib/kokkos/core/src/Qthreads/README
index 6e6c86a9e..e35b1f698 100644
--- a/lib/kokkos/core/src/Qthread/README
+++ b/lib/kokkos/core/src/Qthreads/README
@@ -1,25 +1,24 @@
# This Qthreads back-end uses an experimental branch of the Qthreads repository with special #define options.
# Cloning repository and branch:
git clone git@github.com:Qthreads/qthreads.git qthreads
cd qthreads
# checkout branch with "cloned tasks"
git checkout dev-kokkos
# Configure/autogen
sh autogen.sh
# configure with 'hwloc' installation:
./configure CFLAGS="-DCLONED_TASKS -DQTHREAD_LOCAL_PRIORITY" --with-hwloc=${HWLOCDIR} --prefix=${INSTALLDIR}
# install
make install
-
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
index 0f69be9ed..b1f53489f 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
@@ -1,826 +1,826 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD )
#include <stdint.h>
#include <limits>
#include <utility>
#include <iostream>
#include <sstream>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
ThreadsExec s_threads_process ;
ThreadsExec * s_threads_exec[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
pthread_t s_threads_pid[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
std::pair<unsigned,unsigned> s_threads_coord[ ThreadsExec::MAX_THREAD_COUNT ];
int s_thread_pool_size[3] = { 0 , 0 , 0 };
unsigned s_current_reduce_size = 0 ;
unsigned s_current_shared_size = 0 ;
void (* volatile s_current_function)( ThreadsExec & , const void * );
const void * volatile s_current_function_arg = 0 ;
struct Sentinel {
Sentinel()
{
HostSpace::register_in_parallel( ThreadsExec::in_parallel );
}
~Sentinel()
{
if ( s_thread_pool_size[0] ||
s_thread_pool_size[1] ||
s_thread_pool_size[2] ||
s_current_reduce_size ||
s_current_shared_size ||
s_current_function ||
s_current_function_arg ||
s_threads_exec[0] ) {
std::cerr << "ERROR : Process exiting without calling Kokkos::Threads::terminate()" << std::endl ;
}
}
};
inline
unsigned fan_size( const unsigned rank , const unsigned size )
{
const unsigned rank_rev = size - ( rank + 1 );
unsigned count = 0 ;
for ( unsigned n = 1 ; ( rank_rev + n < size ) && ! ( rank_rev & n ) ; n <<= 1 ) { ++count ; }
return count ;
}
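// For example, with size == 8: rank 7 has rank_rev == 0 and fan_size == 3
// (its fan-in partners are the threads with reversed ranks 1, 2 and 4),
// while rank 0 has rank_rev == 7 and fan_size == 0, i.e. it is a leaf of
// the fan-in tree.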
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void execute_function_noop( ThreadsExec & , const void * ) {}
void ThreadsExec::driver(void)
{
ThreadsExec this_thread ;
while ( ThreadsExec::Active == this_thread.m_pool_state ) {
(*s_current_function)( this_thread , s_current_function_arg );
// Deactivate thread and wait for reactivation
this_thread.m_pool_state = ThreadsExec::Inactive ;
wait_yield( this_thread.m_pool_state , ThreadsExec::Inactive );
}
}
ThreadsExec::ThreadsExec()
: m_pool_base(0)
, m_scratch(0)
, m_scratch_reduce_end(0)
, m_scratch_thread_end(0)
, m_numa_rank(0)
, m_numa_core_rank(0)
, m_pool_rank(0)
, m_pool_size(0)
, m_pool_fan_size(0)
, m_pool_state( ThreadsExec::Terminating )
{
if ( & s_threads_process != this ) {
// A spawned thread
ThreadsExec * const nil = 0 ;
// Which entry in 's_threads_exec', possibly determined from hwloc binding
const int entry = ((size_t)s_current_function_arg) < size_t(s_thread_pool_size[0])
? ((size_t)s_current_function_arg)
: size_t(Kokkos::hwloc::bind_this_thread( s_thread_pool_size[0] , s_threads_coord ));
// Given a good entry set this thread in the 's_threads_exec' array
if ( entry < s_thread_pool_size[0] &&
nil == atomic_compare_exchange( s_threads_exec + entry , nil , this ) ) {
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
m_numa_rank = coord.first ;
m_numa_core_rank = coord.second ;
m_pool_base = s_threads_exec ;
m_pool_rank = s_thread_pool_size[0] - ( entry + 1 );
m_pool_rank_rev = s_thread_pool_size[0] - ( pool_rank() + 1 );
m_pool_size = s_thread_pool_size[0] ;
m_pool_fan_size = fan_size( m_pool_rank , m_pool_size );
m_pool_state = ThreadsExec::Active ;
s_threads_pid[ m_pool_rank ] = pthread_self();
// Inform spawning process that the threads_exec entry has been set.
s_threads_process.m_pool_state = ThreadsExec::Active ;
}
else {
// Inform spawning process that the threads_exec entry could not be set.
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
else {
    // Enables 'parallel_for' to execute on uninitialized Threads device
m_pool_rank = 0 ;
m_pool_size = 1 ;
m_pool_state = ThreadsExec::Inactive ;
s_threads_pid[ m_pool_rank ] = pthread_self();
}
}
ThreadsExec::~ThreadsExec()
{
const unsigned entry = m_pool_size - ( m_pool_rank + 1 );
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( m_scratch ) {
Record * const r = Record::get_record( m_scratch );
m_scratch = 0 ;
Record::decrement( r );
}
m_pool_base = 0 ;
m_scratch_reduce_end = 0 ;
m_scratch_thread_end = 0 ;
m_numa_rank = 0 ;
m_numa_core_rank = 0 ;
m_pool_rank = 0 ;
m_pool_size = 0 ;
m_pool_fan_size = 0 ;
m_pool_state = ThreadsExec::Terminating ;
if ( & s_threads_process != this && entry < MAX_THREAD_COUNT ) {
ThreadsExec * const nil = 0 ;
atomic_compare_exchange( s_threads_exec + entry , this , nil );
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
int ThreadsExec::get_thread_count()
{
return s_thread_pool_size[0] ;
}
ThreadsExec * ThreadsExec::get_thread( const int init_thread_rank )
{
ThreadsExec * const th =
init_thread_rank < s_thread_pool_size[0]
? s_threads_exec[ s_thread_pool_size[0] - ( init_thread_rank + 1 ) ] : 0 ;
if ( 0 == th || th->m_pool_rank != init_thread_rank ) {
std::ostringstream msg ;
msg << "Kokkos::Impl::ThreadsExec::get_thread ERROR : "
<< "thread " << init_thread_rank << " of " << s_thread_pool_size[0] ;
if ( 0 == th ) {
msg << " does not exist" ;
}
else {
msg << " has wrong thread_rank " << th->m_pool_rank ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return th ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_sleep( ThreadsExec & exec , const void * )
{
ThreadsExec::global_lock();
ThreadsExec::global_unlock();
const int n = exec.m_pool_fan_size ;
const int rank_rev = exec.m_pool_size - ( exec.m_pool_rank + 1 );
for ( int i = 0 ; i < n ; ++i ) {
- Impl::spinwait( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
exec.m_pool_state = ThreadsExec::Inactive ;
}
}
}
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void ThreadsExec::verify_is_process( const std::string & name , const bool initialized )
{
if ( ! is_process() ) {
std::string msg( name );
msg.append( " FAILED : Called by a worker thread, can only be called by the master process." );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( initialized && 0 == s_thread_pool_size[0] ) {
std::string msg( name );
msg.append( " FAILED : Threads not initialized." );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
int ThreadsExec::in_parallel()
{
// A thread function is in execution and
// the function argument is not the special threads process argument and
// the master process is a worker or is not the master process.
return s_current_function &&
( & s_threads_process != s_current_function_arg ) &&
( s_threads_process.m_pool_base || ! is_process() );
}
// Wait for root thread to become inactive
void ThreadsExec::fence()
{
if ( s_thread_pool_size[0] ) {
// Wait for the root thread to complete:
- Impl::spinwait( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
// Make sure function and arguments are cleared before
// potentially re-activating threads with a subsequent launch.
memory_fence();
}
/** \brief Begin execution of the asynchronous functor */
void ThreadsExec::start( void (*func)( ThreadsExec & , const void * ) , const void * arg )
{
verify_is_process("ThreadsExec::start" , true );
if ( s_current_function || s_current_function_arg ) {
Kokkos::Impl::throw_runtime_exception( std::string( "ThreadsExec::start() FAILED : already executing" ) );
}
s_current_function = func ;
s_current_function_arg = arg ;
// Make sure function and arguments are written before activating threads.
memory_fence();
// Activate threads:
for ( int i = s_thread_pool_size[0] ; 0 < i-- ; ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Active ;
}
if ( s_threads_process.m_pool_size ) {
// Master process is the root thread, run it:
(*func)( s_threads_process , arg );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
}
//----------------------------------------------------------------------------
bool ThreadsExec::sleep()
{
verify_is_process("ThreadsExec::sleep", true );
if ( & execute_sleep == s_current_function ) return false ;
fence();
ThreadsExec::global_lock();
s_current_function = & execute_sleep ;
// Activate threads:
for ( unsigned i = s_thread_pool_size[0] ; 0 < i ; ) {
s_threads_exec[--i]->m_pool_state = ThreadsExec::Active ;
}
return true ;
}
bool ThreadsExec::wake()
{
verify_is_process("ThreadsExec::wake", true );
if ( & execute_sleep != s_current_function ) return false ;
ThreadsExec::global_unlock();
if ( s_threads_process.m_pool_base ) {
execute_sleep( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
fence();
return true ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_serial( void (*func)( ThreadsExec & , const void * ) )
{
s_current_function = func ;
s_current_function_arg = & s_threads_process ;
// Make sure function and arguments are written before activating threads.
memory_fence();
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i ; ) {
ThreadsExec & th = * s_threads_exec[ --i ];
th.m_pool_state = ThreadsExec::Active ;
wait_yield( th.m_pool_state , ThreadsExec::Active );
}
if ( s_threads_process.m_pool_base ) {
s_threads_process.m_pool_state = ThreadsExec::Active ;
(*func)( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_current_function_arg = 0 ;
s_current_function = 0 ;
// Make sure function and arguments are cleared before proceeding.
memory_fence();
}
//----------------------------------------------------------------------------
void * ThreadsExec::root_reduce_scratch()
{
return s_threads_process.reduce_memory();
}
void ThreadsExec::execute_resize_scratch( ThreadsExec & exec , const void * )
{
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( exec.m_scratch ) {
Record * const r = Record::get_record( exec.m_scratch );
exec.m_scratch = 0 ;
Record::decrement( r );
}
exec.m_scratch_reduce_end = s_threads_process.m_scratch_reduce_end ;
exec.m_scratch_thread_end = s_threads_process.m_scratch_thread_end ;
if ( s_threads_process.m_scratch_thread_end ) {
// Allocate tracked memory:
{
Record * const r = Record::allocate( Kokkos::HostSpace() , "thread_scratch" , s_threads_process.m_scratch_thread_end );
Record::increment( r );
exec.m_scratch = r->data();
}
unsigned * ptr = reinterpret_cast<unsigned *>( exec.m_scratch );
unsigned * const end = ptr + s_threads_process.m_scratch_thread_end / sizeof(unsigned);
// touch on this thread
while ( ptr < end ) *ptr++ = 0 ;
}
}
void * ThreadsExec::resize_scratch( size_t reduce_size , size_t thread_size )
{
enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
fence();
const size_t old_reduce_size = s_threads_process.m_scratch_reduce_end ;
const size_t old_thread_size = s_threads_process.m_scratch_thread_end - s_threads_process.m_scratch_reduce_end ;
reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
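  // For example, if MEMORY_ALIGNMENT were 64, a 100-byte request would be
  // rounded up to 128: ( 100 + 63 ) & ~63 == 128.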
// Increase size or deallocate completely.
if ( ( old_reduce_size < reduce_size ) ||
( old_thread_size < thread_size ) ||
( ( reduce_size == 0 && thread_size == 0 ) &&
( old_reduce_size != 0 || old_thread_size != 0 ) ) ) {
verify_is_process( "ThreadsExec::resize_scratch" , true );
s_threads_process.m_scratch_reduce_end = reduce_size ;
s_threads_process.m_scratch_thread_end = reduce_size + thread_size ;
execute_serial( & execute_resize_scratch );
s_threads_process.m_scratch = s_threads_exec[0]->m_scratch ;
}
return s_threads_process.m_scratch ;
}
//----------------------------------------------------------------------------
void ThreadsExec::print_configuration( std::ostream & s , const bool detail )
{
verify_is_process("ThreadsExec::print_configuration",false);
fence();
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
// Forestall compiler warnings for unused variables.
(void) numa_count;
(void) cores_per_numa;
(void) threads_per_core;
s << "Kokkos::Threads" ;
#if defined( KOKKOS_ENABLE_PTHREAD )
s << " KOKKOS_ENABLE_PTHREAD" ;
#endif
#if defined( KOKKOS_ENABLE_HWLOC )
s << " hwloc[" << numa_count << "x" << cores_per_numa << "x" << threads_per_core << "]" ;
#endif
if ( s_thread_pool_size[0] ) {
s << " threads[" << s_thread_pool_size[0] << "]"
<< " threads_per_numa[" << s_thread_pool_size[1] << "]"
<< " threads_per_core[" << s_thread_pool_size[2] << "]"
;
if ( 0 == s_threads_process.m_pool_base ) { s << " Asynchronous" ; }
s << " ReduceScratch[" << s_current_reduce_size << "]"
<< " SharedScratch[" << s_current_shared_size << "]" ;
s << std::endl ;
if ( detail ) {
for ( int i = 0 ; i < s_thread_pool_size[0] ; ++i ) {
ThreadsExec * const th = s_threads_exec[i] ;
if ( th ) {
const int rank_rev = th->m_pool_size - ( th->m_pool_rank + 1 );
s << " Thread[ " << th->m_pool_rank << " : "
<< th->m_numa_rank << "." << th->m_numa_core_rank << " ]" ;
s << " Fan{" ;
for ( int j = 0 ; j < th->m_pool_fan_size ; ++j ) {
ThreadsExec * const thfan = th->m_pool_base[rank_rev+(1<<j)] ;
s << " [ " << thfan->m_pool_rank << " : "
<< thfan->m_numa_rank << "." << thfan->m_numa_core_rank << " ]" ;
}
s << " }" ;
if ( th == & s_threads_process ) {
s << " is_process" ;
}
}
s << std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
//----------------------------------------------------------------------------
int ThreadsExec::is_initialized()
{ return 0 != s_threads_exec[0] ; }
void ThreadsExec::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool )
{
static const Sentinel sentinel ;
const bool is_initialized = 0 != s_thread_pool_size[0] ;
unsigned thread_spawn_failed = 0 ;
for ( int i = 0; i < ThreadsExec::MAX_THREAD_COUNT ; i++)
s_threads_exec[i] = NULL;
if ( ! is_initialized ) {
// If thread_count, use_numa_count, or use_cores_per_numa are zero
// then they will be given default values based upon hwloc detection
// and allowed asynchronous execution.
const bool hwloc_avail = Kokkos::hwloc::available();
const bool hwloc_can_bind = hwloc_avail && Kokkos::hwloc::can_bind_threads();
if ( thread_count == 0 ) {
thread_count = hwloc_avail
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: 1 ;
}
const unsigned thread_spawn_begin =
hwloc::thread_mapping( "Kokkos::Threads::initialize" ,
allow_asynchronous_threadpool ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
s_threads_coord );
const std::pair<unsigned,unsigned> proc_coord = s_threads_coord[0] ;
if ( thread_spawn_begin ) {
// Synchronous with s_threads_coord[0] as the process core
// Claim entry #0 for binding the process core.
s_threads_coord[0] = std::pair<unsigned,unsigned>(~0u,~0u);
}
s_thread_pool_size[0] = thread_count ;
s_thread_pool_size[1] = s_thread_pool_size[0] / use_numa_count ;
s_thread_pool_size[2] = s_thread_pool_size[1] / use_cores_per_numa ;
s_current_function = & execute_function_noop ; // Initialization work function
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
// If hwloc available then spawned thread will
// choose its own entry in 's_threads_coord'
// otherwise specify the entry.
s_current_function_arg = (void*)static_cast<uintptr_t>( hwloc_can_bind ? ~0u : ith );
// Make sure all outstanding memory writes are complete
// before spawning the new thread.
memory_fence();
// Spawn thread executing the 'driver()' function.
// Wait until spawned thread has attempted to initialize.
      // If spawning and initialization are successful then
// an entry in 's_threads_exec' will be assigned.
if ( ThreadsExec::spawn() ) {
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
}
if ( s_threads_process.m_pool_state == ThreadsExec::Terminating ) break ;
}
// Wait for all spawned threads to deactivate before zeroing the function.
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
// Try to protect against cache coherency failure by casting to volatile.
ThreadsExec * const th = ((ThreadsExec * volatile *)s_threads_exec)[ith] ;
if ( th ) {
wait_yield( th->m_pool_state , ThreadsExec::Active );
}
else {
++thread_spawn_failed ;
}
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
memory_fence();
if ( ! thread_spawn_failed ) {
      // Bind process to the core on which it was located before spawning occurred
if (hwloc_can_bind) {
Kokkos::hwloc::bind_this_thread( proc_coord );
}
if ( thread_spawn_begin ) { // Include process in pool.
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
s_threads_exec[0] = & s_threads_process ;
s_threads_process.m_numa_rank = coord.first ;
s_threads_process.m_numa_core_rank = coord.second ;
s_threads_process.m_pool_base = s_threads_exec ;
s_threads_process.m_pool_rank = thread_count - 1 ; // Reversed for scan-compatible reductions
s_threads_process.m_pool_size = thread_count ;
s_threads_process.m_pool_fan_size = fan_size( s_threads_process.m_pool_rank , s_threads_process.m_pool_size );
s_threads_pid[ s_threads_process.m_pool_rank ] = pthread_self();
}
else {
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 0 ;
s_threads_process.m_pool_fan_size = 0 ;
}
// Initial allocations:
ThreadsExec::resize_scratch( 1024 , 1024 );
}
else {
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
}
}
if ( is_initialized || thread_spawn_failed ) {
std::ostringstream msg ;
msg << "Kokkos::Threads::initialize ERROR" ;
if ( is_initialized ) {
msg << " : already initialized" ;
}
if ( thread_spawn_failed ) {
msg << " : failed to spawn " << thread_spawn_failed << " threads" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void ThreadsExec::finalize()
{
verify_is_process("ThreadsExec::finalize",false);
fence();
resize_scratch(0,0);
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i-- ; ) {
if ( s_threads_exec[i] ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Terminating ;
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_threads_pid[i] = 0 ;
}
if ( s_threads_process.m_pool_base ) {
( & s_threads_process )->~ThreadsExec();
s_threads_exec[0] = 0 ;
}
if (Kokkos::hwloc::can_bind_threads() ) {
Kokkos::hwloc::unbind_this_thread();
}
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
// Reset master thread to run solo.
s_threads_process.m_numa_rank = 0 ;
s_threads_process.m_numa_core_rank = 0 ;
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 1 ;
s_threads_process.m_pool_fan_size = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
int Threads::concurrency() {
return thread_pool_size(0);
}
Threads & Threads::instance(int)
{
static Threads t ;
return t ;
}
int Threads::thread_pool_size( int depth )
{
return Impl::s_thread_pool_size[depth];
}
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
int Threads::thread_pool_rank()
{
const pthread_t pid = pthread_self();
int i = 0;
while ( ( i < Impl::s_thread_pool_size[0] ) && ( pid != Impl::s_threads_pid[i] ) ) { ++i ; }
return i ;
}
#endif
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD ) */
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
index 385dd492d..a6db02eba 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
@@ -1,631 +1,631 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADSEXEC_HPP
#define KOKKOS_THREADSEXEC_HPP
#include <stdio.h>
#include <utility>
#include <impl/Kokkos_spinwait.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
class ThreadsExec {
public:
// Fan array has log_2(NT) reduction threads plus 2 scan threads
// Currently limited to 16k threads.
enum { MAX_FAN_COUNT = 16 };
enum { MAX_THREAD_COUNT = 1 << ( MAX_FAN_COUNT - 2 ) };
enum { VECTOR_LENGTH = 8 };
/** \brief States of a worker thread */
enum { Terminating ///< Termination in progress
, Inactive ///< Exists, waiting for work
, Active ///< Exists, performing work
, Rendezvous ///< Exists, waiting in a barrier or reduce
, ScanCompleted
, ScanAvailable
, ReductionAvailable
};
private:
friend class Kokkos::Threads ;
// Fan-in operations' root is the highest ranking thread
// to place the 'scan' reduction intermediate values on
// the threads that need them.
// For a simple reduction the thread location is arbitrary.
ThreadsExec * const * m_pool_base ; ///< Base for pool fan-in
void * m_scratch ;
int m_scratch_reduce_end ;
int m_scratch_thread_end ;
int m_numa_rank ;
int m_numa_core_rank ;
int m_pool_rank ;
int m_pool_rank_rev ;
int m_pool_size ;
int m_pool_fan_size ;
int volatile m_pool_state ; ///< State for global synchronizations
// Members for dynamic scheduling
// Which thread am I stealing from currently
int m_current_steal_target;
// This thread's owned work_range
Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN(16);
// Team Offset if one thread determines work_range for others
long m_team_work_index;
// Is this thread stealing (i.e. its owned work_range is exhausted)
bool m_stealing;
static void global_lock();
static void global_unlock();
static bool spawn();
static void execute_resize_scratch( ThreadsExec & , const void * );
static void execute_sleep( ThreadsExec & , const void * );
ThreadsExec( const ThreadsExec & );
ThreadsExec & operator = ( const ThreadsExec & );
static void execute_serial( void (*)( ThreadsExec & , const void * ) );
public:
KOKKOS_INLINE_FUNCTION int pool_size() const { return m_pool_size ; }
KOKKOS_INLINE_FUNCTION int pool_rank() const { return m_pool_rank ; }
KOKKOS_INLINE_FUNCTION int numa_rank() const { return m_numa_rank ; }
KOKKOS_INLINE_FUNCTION int numa_core_rank() const { return m_numa_core_rank ; }
inline long team_work_index() const { return m_team_work_index ; }
static int get_thread_count();
static ThreadsExec * get_thread( const int init_thread_rank );
inline void * reduce_memory() const { return m_scratch ; }
KOKKOS_INLINE_FUNCTION void * scratch_memory() const
{ return reinterpret_cast<unsigned char *>(m_scratch) + m_scratch_reduce_end ; }
KOKKOS_INLINE_FUNCTION int volatile & state() { return m_pool_state ; }
KOKKOS_INLINE_FUNCTION ThreadsExec * const * pool_base() const { return m_pool_base ; }
static void driver(void);
~ThreadsExec();
ThreadsExec();
static void * resize_scratch( size_t reduce_size , size_t thread_size );
static void * root_reduce_scratch();
static bool is_process();
static void verify_is_process( const std::string & , const bool initialized );
static int is_initialized();
static void initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool );
static void finalize();
/* Given a requested team size, return a valid team size */
static unsigned team_size_valid( unsigned );
static void print_configuration( std::ostream & , const bool detail = false );
//------------------------------------
static void wait_yield( volatile int & , const int );
//------------------------------------
// All-thread functions:
inline
int all_reduce( const int value )
{
// Make sure there is enough scratch space:
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
*((volatile int*) reduce_memory()) = value ;
memory_fence();
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
int accum = 0 ;
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
accum += *((volatile int *) get_thread( rank )->reduce_memory());
}
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
*((volatile int *) get_thread( rank )->reduce_memory()) = accum ;
}
memory_fence();
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
get_thread( rank )->m_pool_state = ThreadsExec::Active ;
}
}
return *((volatile int*) reduce_memory());
}
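// Editorial note (illustrative, not part of the original source): in the fan-in loop
// above a thread with reversed rank r waits on partners at reversed ranks r+1, r+2, r+4,
// ... (m_pool_fan_size of them). For example, in an 8-thread pool the root is pool rank 7
// (reversed rank 0); with a fan size of 3 it waits on reversed ranks 1, 2 and 4, i.e. pool
// ranks 6, 5 and 3, before summing every thread's contribution and broadcasting the result.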
inline
void barrier( )
{
// Make sure there is enough scratch space:
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
memory_fence();
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
memory_fence();
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
get_thread( rank )->m_pool_state = ThreadsExec::Active ;
}
}
}
//------------------------------------
// All-thread functions:
template< class FunctorType , class ArgTag >
inline
void fan_in_reduce( const FunctorType & f ) const
{
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorFinal< FunctorType , ArgTag > Final ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + ( 1 << i ) ] ;
- Impl::spinwait( fan.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , reduce_memory() , fan.reduce_memory() );
}
if ( ! rev_rank ) {
Final::final( f , reduce_memory() );
}
}
inline
void fan_in() const
{
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
- Impl::spinwait( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
}
}
template< class FunctorType , class ArgTag >
inline
void scan_large( const FunctorType & f )
{
// Sequence of states:
// 0) Active : entry and exit state
// 1) ReductionAvailable : reduction value available
// 2) ScanAvailable : inclusive scan value available
// 3) Rendezvous : All threads inclusive scan value are available
// 4) ScanCompleted : exclusive scan value copied
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , ArgTag > Traits ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > Init ;
typedef typename Traits::value_type scalar_type ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
const unsigned count = Traits::value_count( f );
scalar_type * const work_value = (scalar_type *) reduce_memory();
//--------------------------------
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: Active -> ReductionAvailable (or ScanAvailable)
- Impl::spinwait( fan.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , work_value , fan.reduce_memory() );
}
// Copy reduction value to scan value before releasing from this phase.
for ( unsigned i = 0 ; i < count ; ++i ) { work_value[i+count] = work_value[i] ; }
if ( rev_rank ) {
// Set: Active -> ReductionAvailable
m_pool_state = ThreadsExec::ReductionAvailable ;
// Wait for contributing threads' scan value to be available.
if ( ( 1 << m_pool_fan_size ) < ( m_pool_rank + 1 ) ) {
ThreadsExec & th = *m_pool_base[ rev_rank + ( 1 << m_pool_fan_size ) ] ;
// Wait: Active -> ReductionAvailable
// Wait: ReductionAvailable -> ScanAvailable
- Impl::spinwait( th.m_pool_state , ThreadsExec::Active );
- Impl::spinwait( th.m_pool_state , ThreadsExec::ReductionAvailable );
+ Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::ReductionAvailable );
Join::join( f , work_value + count , ((scalar_type *)th.reduce_memory()) + count );
}
// This thread has completed inclusive scan
// Set: ReductionAvailable -> ScanAvailable
m_pool_state = ThreadsExec::ScanAvailable ;
// Wait for all threads to complete inclusive scan
// Wait: ScanAvailable -> Rendezvous
- Impl::spinwait( m_pool_state , ThreadsExec::ScanAvailable );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanAvailable );
}
//--------------------------------
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: ReductionAvailable -> ScanAvailable
- Impl::spinwait( fan.m_pool_state , ThreadsExec::ReductionAvailable );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::ReductionAvailable );
// Set: ScanAvailable -> Rendezvous
fan.m_pool_state = ThreadsExec::Rendezvous ;
}
// All threads have completed the inclusive scan.
// All non-root threads are in the Rendezvous state.
// Threads are free to overwrite their reduction value.
//--------------------------------
if ( ( rev_rank + 1 ) < m_pool_size ) {
// Exclusive scan: copy the previous thread's inclusive scan value
ThreadsExec & th = *m_pool_base[ rev_rank + 1 ] ; // Not the root thread
const scalar_type * const src_value = ((scalar_type *)th.reduce_memory()) + count ;
for ( unsigned j = 0 ; j < count ; ++j ) { work_value[j] = src_value[j]; }
}
else {
(void) Init::init( f , work_value );
}
//--------------------------------
// Wait for all threads to copy previous thread's inclusive scan value
// Wait for all threads: Rendezvous -> ScanCompleted
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
}
if ( rev_rank ) {
// Set: ScanAvailable -> ScanCompleted
m_pool_state = ThreadsExec::ScanCompleted ;
// Wait: ScanCompleted -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::ScanCompleted );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanCompleted );
}
// Set: ScanCompleted -> Active
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
m_pool_base[ rev_rank + (1<<i) ]->m_pool_state = ThreadsExec::Active ;
}
}
template< class FunctorType , class ArgTag >
inline
void scan_small( const FunctorType & f )
{
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , ArgTag > Traits ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > Init ;
typedef typename Traits::value_type scalar_type ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
const unsigned count = Traits::value_count( f );
scalar_type * const work_value = (scalar_type *) reduce_memory();
//--------------------------------
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
for ( unsigned i = 0 ; i < count ; ++i ) { work_value[i+count] = work_value[i]; }
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the thread-scan before releasing threads
scalar_type * ptr_prev = 0 ;
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
scalar_type * const ptr = (scalar_type *) get_thread( rank )->reduce_memory();
if ( rank ) {
for ( unsigned i = 0 ; i < count ; ++i ) { ptr[i] = ptr_prev[ i + count ]; }
Join::join( f , ptr + count , ptr );
}
else {
(void) Init::init( f , ptr );
}
ptr_prev = ptr ;
}
}
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
m_pool_base[ rev_rank + (1<<i) ]->m_pool_state = ThreadsExec::Active ;
}
}
//------------------------------------
/** \brief Wait for previous asynchronous functor to
* complete and release the Threads device.
* Acquire the Threads device and start this functor.
*/
static void start( void (*)( ThreadsExec & , const void * ) , const void * );
static int in_parallel();
static void fence();
static bool sleep();
static bool wake();
/* Dynamic Scheduling related functionality */
// Initialize the work range for this thread
inline void set_work_range(const long& begin, const long& end, const long& chunk_size) {
m_work_range.first = (begin+chunk_size-1)/chunk_size;
m_work_range.second = end>0?(end+chunk_size-1)/chunk_size:m_work_range.first;
}
// Claim an index from this thread's range from the beginning
inline long get_work_index_begin () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.first+=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second));
work_range_old = work_range_new;
work_range_new.first+=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.first;
else
return -1;
}
// Claim an index from this thread's range from the end
inline long get_work_index_end () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.second-=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second) );
work_range_old = work_range_new;
work_range_new.second-=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.second-1;
else
return -1;
}
// Reset the steal target
inline void reset_steal_target() {
m_current_steal_target = (m_pool_rank+1)%pool_size();
m_stealing = false;
}
// Reset the steal target
inline void reset_steal_target(int team_size) {
m_current_steal_target = (m_pool_rank_rev+team_size);
if(m_current_steal_target>=pool_size())
m_current_steal_target = 0;//pool_size()-1;
m_stealing = false;
}
// Get a steal target; start with my rank + 1 and go round robin until arriving back at this thread's rank
// Returns -1 if no active steal target is available
inline int get_steal_target() {
while(( m_pool_base[m_current_steal_target]->m_work_range.second <=
m_pool_base[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank) ) {
m_current_steal_target = (m_current_steal_target+1)%pool_size();
}
if(m_current_steal_target == m_pool_rank)
return -1;
else
return m_current_steal_target;
}
inline int get_steal_target(int team_size) {
while(( m_pool_base[m_current_steal_target]->m_work_range.second <=
m_pool_base[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank_rev) ) {
if(m_current_steal_target + team_size < pool_size())
m_current_steal_target = (m_current_steal_target+team_size);
else
m_current_steal_target = 0;
}
if(m_current_steal_target == m_pool_rank_rev)
return -1;
else
return m_current_steal_target;
}
inline long steal_work_index (int team_size = 0) {
long index = -1;
int steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
while ( (steal_target != -1) && (index == -1)) {
index = m_pool_base[steal_target]->get_work_index_end();
if(index == -1)
steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
}
return index;
}
// Get a work index. Claim from the owned range until it is exhausted, then steal from another thread
inline long get_work_index (int team_size = 0) {
long work_index = -1;
if(!m_stealing) work_index = get_work_index_begin();
if( work_index == -1) {
memory_fence();
m_stealing = true;
work_index = steal_work_index(team_size);
}
m_team_work_index = work_index;
memory_fence();
return work_index;
}
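// Editorial usage sketch (hypothetical driver pseudo-code, not part of the original
// source): a worker drains its own chunk range first and, once get_work_index() switches
// to stealing, takes chunks from the tail of other threads' ranges.
//
// exec.set_work_range( league_begin , league_end , chunk_size );
// exec.reset_steal_target();
// for ( long w = exec.get_work_index() ; w != -1 ; w = exec.get_work_index() ) {
//   /* process league chunk 'w', i.e. indices [ w*chunk_size , (w+1)*chunk_size ) */
// }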
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
inline int Threads::in_parallel()
{ return Impl::ThreadsExec::in_parallel(); }
inline int Threads::is_initialized()
{ return Impl::ThreadsExec::is_initialized(); }
inline void Threads::initialize(
unsigned threads_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool )
{
Impl::ThreadsExec::initialize( threads_count , use_numa_count , use_cores_per_numa , allow_asynchronous_threadpool );
}
inline void Threads::finalize()
{
Impl::ThreadsExec::finalize();
}
inline void Threads::print_configuration( std::ostream & s , const bool detail )
{
Impl::ThreadsExec::print_configuration( s , detail );
}
inline bool Threads::sleep()
{ return Impl::ThreadsExec::sleep() ; }
inline bool Threads::wake()
{ return Impl::ThreadsExec::wake() ; }
inline void Threads::fence()
{ Impl::ThreadsExec::fence() ; }
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #define KOKKOS_THREADSEXEC_HPP */
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
index b9edb6455..701495428 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
@@ -1,916 +1,920 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADSTEAM_HPP
#define KOKKOS_THREADSTEAM_HPP
#include <stdio.h>
#include <utility>
#include <impl/Kokkos_spinwait.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class > struct ThreadsExecAdapter ;
//----------------------------------------------------------------------------
class ThreadsExecTeamMember {
private:
enum { TEAM_REDUCE_SIZE = 512 };
typedef Kokkos::Threads execution_space ;
typedef execution_space::scratch_memory_space space ;
ThreadsExec * const m_exec ;
ThreadsExec * const * m_team_base ; ///< Base for team fan-in
space m_team_shared ;
int m_team_shared_size ;
int m_team_size ;
int m_team_rank ;
int m_team_rank_rev ;
int m_league_size ;
int m_league_end ;
int m_league_rank ;
int m_chunk_size;
int m_league_chunk_end;
int m_invalid_thread;
int m_team_alloc;
inline
void set_team_shared()
{ new( & m_team_shared ) space( ((char *) (*m_team_base)->scratch_memory()) + TEAM_REDUCE_SIZE , m_team_shared_size ); }
public:
// Fan-in and wait until the matching fan-out is called.
// The root thread which does not wait will return true.
// All other threads will return false during the fan-out.
KOKKOS_INLINE_FUNCTION bool team_fan_in() const
{
int n , j ;
// Wait for fan-in threads
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_team_base[j]->state() , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_team_base[j]->state() , ThreadsExec::Active );
}
// If not root then wait for release
if ( m_team_rank_rev ) {
m_exec->state() = ThreadsExec::Rendezvous ;
- Impl::spinwait( m_exec->state() , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_exec->state() , ThreadsExec::Rendezvous );
}
return ! m_team_rank_rev ;
}
KOKKOS_INLINE_FUNCTION void team_fan_out() const
{
int n , j ;
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
m_team_base[j]->state() = ThreadsExec::Active ;
}
}
public:
KOKKOS_INLINE_FUNCTION static int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
KOKKOS_INLINE_FUNCTION void team_barrier() const
{
team_fan_in();
team_fan_out();
}
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast(ValueType& value, const int& thread_id) const
{
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ }
#else
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
, ValueType , void >::type type ;
if ( m_team_base ) {
type * const local_value = ((type*) m_team_base[0]->scratch_memory());
if(team_rank() == thread_id) *local_value = value;
memory_fence();
team_barrier();
value = *local_value;
}
#endif
}
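// Editorial example (illustrative only): team_broadcast copies 'value' from the member
// whose team_rank() equals 'thread_id' into every member's copy, e.g.
// int seed = ( team_rank() == 0 ) ? compute_seed() : 0; // compute_seed() is hypothetical
// team_broadcast( seed , 0 );                           // afterwards every member holds rank 0's seed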
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return Type(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(Type) < TEAM_REDUCE_SIZE , Type , void >::type type ;
if ( 0 == m_exec ) return value ;
*((volatile type*) m_exec->scratch_memory() ) = value ;
memory_fence();
type & accum = *((type *) m_team_base[0]->scratch_memory() );
if ( team_fan_in() ) {
for ( int i = 1 ; i < m_team_size ; ++i ) {
accum += *((type *) m_team_base[i]->scratch_memory() );
}
memory_fence();
}
team_fan_out();
return accum ;
}
#endif
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION ValueType
team_reduce( const ValueType & value
, const JoinOp & op_in ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ValueType(); }
#else
{
typedef ValueType value_type;
const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
#endif
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
, value_type , void >::type type ;
if ( 0 == m_exec ) return value ;
type * const local_value = ((type*) m_exec->scratch_memory());
// Set this thread's contribution
*local_value = value ;
// Fence to make sure the base team member has access:
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
type * const team_value = ((type*) m_team_base[0]->scratch_memory());
// Join to the team value:
for ( int i = 1 ; i < m_team_size ; ++i ) {
op.join( *team_value , *((type*) m_team_base[i]->scratch_memory()) );
}
// Team base thread may "lap" member threads so copy out to their local value.
for ( int i = 1 ; i < m_team_size ; ++i ) {
*((type*) m_team_base[i]->scratch_memory()) = *team_value ;
}
// Fence to make sure all team members have access
memory_fence();
}
team_fan_out();
// Value was changed by the team base
return *((type volatile const *) local_value);
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ArgType(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
if ( 0 == m_exec ) return type(0);
volatile type * const work_value = ((type*) m_exec->scratch_memory());
*work_value = value ;
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
// m_team_base[0] == highest ranking team member
// m_team_base[ m_team_size - 1 ] == lowest ranking team member
//
// 1) copy from lower to higher rank, initialize lowest rank to zero
// 2) prefix sum from lowest to highest rank, skipping lowest rank
type accum = 0 ;
if ( global_accum ) {
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
accum += val ;
}
accum = atomic_fetch_add( global_accum , accum );
}
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
const type offset = accum ;
accum += val ;
val = offset ;
}
memory_fence();
}
team_fan_out();
return *work_value ;
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value ) const
{ return this-> template team_scan<ArgType>( value , 0 ); }
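// Editorial example (hypothetical values): with team ranks 0..3 contributing
// value = 2, 1, 4, 3 the exclusive scan returns 0, 2, 3, 7 respectively, and the highest
// rank recovers the reduction total as team_scan(value) + value = 7 + 3 = 10.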
//----------------------------------------
// Private for the driver
template< class ... Properties >
ThreadsExecTeamMember( Impl::ThreadsExec * exec
, const TeamPolicyInternal< Kokkos::Threads , Properties ... > & team
, const int shared_size )
: m_exec( exec )
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size( shared_size )
, m_team_size(team.team_size())
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(0)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size( team.chunk_size() )
, m_league_chunk_end(0)
, m_team_alloc( team.team_alloc())
{
if ( team.league_size() ) {
// Execution is using device-team interface:
const int pool_rank_rev = m_exec->pool_size() - ( m_exec->pool_rank() + 1 );
const int team_rank_rev = pool_rank_rev % team.team_alloc();
const size_t pool_league_size = m_exec->pool_size() / team.team_alloc() ;
const size_t pool_league_rank_rev = pool_rank_rev / team.team_alloc() ;
+ if(pool_league_rank_rev >= pool_league_size) {
+ m_invalid_thread = 1;
+ return;
+ }
const size_t pool_league_rank = pool_league_size - ( pool_league_rank_rev + 1 );
const int pool_num_teams = m_exec->pool_size()/team.team_alloc();
const int chunk_size = team.chunk_size()>0?team.chunk_size():team.team_iter();
const int chunks_per_team = ( team.league_size() + chunk_size*pool_num_teams-1 ) / (chunk_size*pool_num_teams);
int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * chunk_size;
int league_iter_begin = league_iter_end - chunks_per_team * chunk_size;
if (league_iter_begin < 0) league_iter_begin = 0;
if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
if ((team.team_alloc()>m_team_size)?
(team_rank_rev >= m_team_size):
(m_exec->pool_size() - pool_num_teams*m_team_size > m_exec->pool_rank())
)
m_invalid_thread = 1;
else
m_invalid_thread = 0;
// May be using fewer threads per team than a multiple of threads per core,
// so some threads will idle.
if ( team_rank_rev < team.team_size() && !m_invalid_thread) {
m_team_base = m_exec->pool_base() + team.team_alloc() * pool_league_rank_rev ;
m_team_size = team.team_size() ;
m_team_rank = team.team_size() - ( team_rank_rev + 1 );
m_team_rank_rev = team_rank_rev ;
m_league_size = team.league_size();
m_league_rank = ( team.league_size() * pool_league_rank ) / pool_league_size ;
m_league_end = ( team.league_size() * (pool_league_rank+1) ) / pool_league_size ;
set_team_shared();
}
if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
m_exec->set_work_range(m_league_rank,m_league_end,m_chunk_size);
m_exec->reset_steal_target(m_team_size);
}
if(std::is_same<typename TeamPolicyInternal<Kokkos::Threads, Properties ...>::schedule_type::type,Kokkos::Dynamic>::value) {
m_exec->barrier();
}
}
else
{ m_invalid_thread = 1; }
}
ThreadsExecTeamMember()
: m_exec(0)
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size(0)
, m_team_size(1)
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(1)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size(0)
, m_league_chunk_end(0)
, m_invalid_thread(0)
, m_team_alloc(0)
{}
inline
ThreadsExec & threads_exec_team_base() const { return m_team_base ? **m_team_base : *m_exec ; }
bool valid_static() const
{ return m_league_rank < m_league_end ; }
void next_static()
{
if ( m_league_rank < m_league_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
bool valid_dynamic() {
if(m_invalid_thread)
return false;
if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
return true;
}
if ( m_team_rank_rev == 0 ) {
m_team_base[0]->get_work_index(m_team_alloc);
}
team_barrier();
long work_index = m_team_base[0]->team_work_index();
m_league_rank = work_index * m_chunk_size;
m_league_chunk_end = (work_index +1 ) * m_chunk_size;
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
if((m_league_rank>=0) && (m_league_rank < m_league_chunk_end))
return true;
return false;
}
void next_dynamic() {
if(m_invalid_thread)
return;
if ( m_league_rank < m_league_chunk_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
void set_league_shmem( const int arg_league_rank
, const int arg_league_size
, const int arg_shmem_size
)
{
m_league_rank = arg_league_rank ;
m_league_size = arg_league_size ;
m_team_shared_size = arg_shmem_size ;
set_team_shared();
}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Threads , Properties ... >: public PolicyTraits<Properties ...>
{
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline
void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
- const int team_max = traits::execution_space::thread_pool_size(1);
+ const int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ const int team_max = pool_size<max_host_team_size?pool_size:max_host_team_size;
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
- int team_size_max( const FunctorType & )
- { return traits::execution_space::thread_pool_size(1); }
+ int team_size_max( const FunctorType & ) {
+ int pool_size = traits::execution_space::thread_pool_size(1);
+ int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ return pool_size<max_host_team_size?pool_size:max_host_team_size;
+ }
+
template< class FunctorType >
static int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
inline int team_size() const { return m_team_size ; }
inline int team_alloc() const { return m_team_alloc ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int team_size_ = -1 ) const {
if(team_size_ < 0)
team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
inline int team_iter() const { return m_team_iter ; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int vector_length_request = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); (void) vector_length_request; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
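// Editorial worked example (hypothetical numbers, not part of the original source):
// with a 16-thread pool, m_team_alloc == 4 (so concurrency == 4) and m_league_size == 10000,
// the first loop stops at 32 because 32*100*4 = 12800 >= 10000; since 32 < 128 the second
// loop re-grows the chunk until 64*40*4 = 10240 >= 10000, leaving m_chunk_size == 64.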
public:
typedef Impl::ThreadsExecTeamMember member_type ;
friend class Impl::ThreadsExecTeamMember ;
};
} /*namespace Impl */
} /* namespace Kokkos */
namespace Kokkos {
template< typename iType >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType& count )
{
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
Impl::ThreadsExecTeamMember>
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >
ThreadVectorRange(const Impl::ThreadsExecTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember> PerTeam(const Impl::ThreadsExecTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember> PerThread(const Impl::ThreadsExecTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
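// Editorial usage sketch (assumes an enclosing TeamPolicy<Kokkos::Threads> parallel region;
// 'team', 'n' and 'data' are hypothetical):
// Kokkos::parallel_for( Kokkos::TeamThreadRange( team , n ) ,
//   [&]( const int i ) { data[i] = 2 * data[i]; } );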
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
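// Editorial usage sketch (hypothetical names; data assumed non-negative so the
// default-initialized temporary is a valid identity for the max-join):
// double team_max = 0.0;
// Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team , n ) ,
//   [&]( const int i , double & val ) { if ( data[i] > val ) val = data[i]; } ,
//   [&]( double & dst , const double & src ) { if ( src > dst ) dst = src; } ,
//   team_max );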
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
+ lambda(i,result);
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
+ loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& result ) {
- ValueType result = init_result;
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
+ lambda(i,result);
}
- init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
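// Editorial usage sketch (hypothetical names): an exclusive prefix sum over a vector range;
// the lambda adds this iteration's contribution whether or not 'final' is set.
// Kokkos::parallel_scan( Kokkos::ThreadVectorRange( thread , n ) ,
//   [&]( const int i , long & partial , const bool final ) {
//     if ( final ) { offsets[i] = partial; }
//     partial += counts[i];
//   });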
} // namespace Kokkos
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
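// Editorial usage sketch (hypothetical names): execute a block once per team and broadcast
// its result to every member via the overload above.
// int token = 0;
// Kokkos::single( Kokkos::PerTeam( team ) ,
//   [&]( int & val ) { val = next_token(); } , // next_token() is hypothetical
//   token );                                   // every team member now sees the same token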
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #define KOKKOS_THREADSTEAM_HPP */
diff --git a/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp b/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp
new file mode 100644
index 000000000..c4db3e15e
--- /dev/null
+++ b/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp
@@ -0,0 +1,2356 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_HOST_EXP_ITERATE_TILE_HPP
+#define KOKKOS_HOST_EXP_ITERATE_TILE_HPP
+
+#include <iostream>
+#include <algorithm>
+#include <stdio.h>
+
+#include <Kokkos_Macros.hpp>
+
+#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION) && defined(KOKKOS_HAVE_PRAGMA_IVDEP) && !defined(__CUDA_ARCH__)
+#define KOKKOS_MDRANGE_IVDEP
+#endif
+
+
+#ifdef KOKKOS_MDRANGE_IVDEP
+ #define KOKKOS_ENABLE_IVDEP_MDRANGE _Pragma("ivdep")
+#else
+ #define KOKKOS_ENABLE_IVDEP_MDRANGE
+#endif
+
+
+
+namespace Kokkos { namespace Experimental { namespace Impl {
+
+// Temporary, for testing new loop macros
+#define KOKKOS_ENABLE_NEW_LOOP_MACROS 1
+
+
+#define LOOP_1L(type, tile) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0=0; i0<static_cast<type>(tile[0]); ++i0)
+
+#define LOOP_2L(type, tile) \
+ for( type i1=0; i1<static_cast<type>(tile[1]); ++i1) \
+ LOOP_1L(type, tile)
+
+#define LOOP_3L(type, tile) \
+ for( type i2=0; i2<static_cast<type>(tile[2]); ++i2) \
+ LOOP_2L(type, tile)
+
+#define LOOP_4L(type, tile) \
+ for( type i3=0; i3<static_cast<type>(tile[3]); ++i3) \
+ LOOP_3L(type, tile)
+
+#define LOOP_5L(type, tile) \
+ for( type i4=0; i4<static_cast<type>(tile[4]); ++i4) \
+ LOOP_4L(type, tile)
+
+#define LOOP_6L(type, tile) \
+ for( type i5=0; i5<static_cast<type>(tile[5]); ++i5) \
+ LOOP_5L(type, tile)
+
+#define LOOP_7L(type, tile) \
+ for( type i6=0; i6<static_cast<type>(tile[6]); ++i6) \
+ LOOP_6L(type, tile)
+
+#define LOOP_8L(type, tile) \
+ for( type i7=0; i7<static_cast<type>(tile[7]); ++i7) \
+ LOOP_7L(type, tile)
+
+
+#define LOOP_1R(type, tile) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for ( type i0=0; i0<static_cast<type>(tile[0]); ++i0 )
+
+#define LOOP_2R(type, tile) \
+ LOOP_1R(type, tile) \
+ for ( type i1=0; i1<static_cast<type>(tile[1]); ++i1 )
+
+#define LOOP_3R(type, tile) \
+ LOOP_2R(type, tile) \
+ for ( type i2=0; i2<static_cast<type>(tile[2]); ++i2 )
+
+#define LOOP_4R(type, tile) \
+ LOOP_3R(type, tile) \
+ for ( type i3=0; i3<static_cast<type>(tile[3]); ++i3 )
+
+#define LOOP_5R(type, tile) \
+ LOOP_4R(type, tile) \
+ for ( type i4=0; i4<static_cast<type>(tile[4]); ++i4 )
+
+#define LOOP_6R(type, tile) \
+ LOOP_5R(type, tile) \
+ for ( type i5=0; i5<static_cast<type>(tile[5]); ++i5 )
+
+#define LOOP_7R(type, tile) \
+ LOOP_6R(type, tile) \
+ for ( type i6=0; i6<static_cast<type>(tile[6]); ++i6 )
+
+#define LOOP_8R(type, tile) \
+ LOOP_7R(type, tile) \
+ for ( type i7=0; i7<static_cast<type>(tile[7]); ++i7 )
+
+
+#define LOOP_ARGS_1 i0 + m_offset[0]
+#define LOOP_ARGS_2 LOOP_ARGS_1, i1 + m_offset[1]
+#define LOOP_ARGS_3 LOOP_ARGS_2, i2 + m_offset[2]
+#define LOOP_ARGS_4 LOOP_ARGS_3, i3 + m_offset[3]
+#define LOOP_ARGS_5 LOOP_ARGS_4, i4 + m_offset[4]
+#define LOOP_ARGS_6 LOOP_ARGS_5, i5 + m_offset[5]
+#define LOOP_ARGS_7 LOOP_ARGS_6, i6 + m_offset[6]
+#define LOOP_ARGS_8 LOOP_ARGS_7, i7 + m_offset[7]
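+
+// LOOP_ARGS_<N> expands to the offset-shifted index list handed to the functor
+// in the legacy code path, e.g. LOOP_ARGS_3 roughly becomes
+//   i0 + m_offset[0], i1 + m_offset[1], i2 + m_offset[2]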
+
+
+
+// New Loop Macros...
+// parallel_for, non-tagged
+#define APPLY( func, ... ) \
+ func( __VA_ARGS__ );
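+
+// The new macros build the functor call by recursion over the variadic args:
+// LOOP_R_<N> iterates dimension d and recurses with d+1, appending
+// "i + m_offset[d]" to __VA_ARGS__, while LOOP_L_<N> recurses with d-1 and
+// prepends its index. For illustration, a rank-2 LayoutRight loop roughly
+// expands to
+//   for (i1 = 0; i1 < extent[0]; ++i1)
+//     for (i0 = 0; i0 < extent[1]; ++i0)
+//       func(i1 + m_offset[0], i0 + m_offset[1]);
+// so the functor always receives indices in dimension order, with the
+// layout-appropriate dimension varying fastest.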
+
+// LayoutRight
+// d = 0 to start
+#define LOOP_R_1( func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY( func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define LOOP_R_2( func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_R_1( func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define LOOP_R_3( func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_R_2( func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define LOOP_R_4( func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_R_3( func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define LOOP_R_5( func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_R_4( func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define LOOP_R_6( func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_R_5( func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define LOOP_R_7( func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_R_6( func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define LOOP_R_8( func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_R_7( func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define LOOP_L_1( func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY( func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_2( func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_L_1( func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_3( func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_L_2( func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_4( func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_L_3( func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_5( func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_L_4( func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_6( func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_L_5( func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_7( func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_L_6( func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_8( func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_L_7( func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+// TODO: rank does not need to be passed through; the values can be hardcoded
+#define LOOP_LAYOUT_1( func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ APPLY( func, i0 + m_offset[0] ) \
+ }
+
+#define LOOP_LAYOUT_2( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ LOOP_L_1( func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ LOOP_R_1( func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_3( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ LOOP_L_2( func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ LOOP_R_2( func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_4( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ LOOP_L_3( func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ LOOP_R_3( func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_5( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ LOOP_L_4( func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ LOOP_R_4( func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_6( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ LOOP_L_5( func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ LOOP_R_5( func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_7( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ LOOP_L_6( func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ LOOP_R_6( func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_8( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ LOOP_L_7( func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ LOOP_R_7( func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
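+
+// LOOP_LAYOUT_<N> selects the nesting order at run time: when is_left is true
+// the outermost loop runs over extent[rank-1] and the recursion proceeds
+// towards dimension 0 (LayoutLeft, i0 fastest); otherwise the outermost loop
+// runs over extent[0] and the recursion proceeds towards dimension rank-1
+// (LayoutRight).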
+
+// Partial vs Full Tile
+#define TILE_LOOP_1( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_1( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_1( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_2( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_2( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_2( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_3( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_3( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_3( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_4( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_4( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_4( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_5( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_5( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_5( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_6( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_6( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_6( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_7( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_7( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_7( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_8( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_8( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_8( func, type, is_left, m_offset, extent_partial, rank ) }
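+
+// TILE_LOOP_<N> picks the loop extents per tile: for a full interior tile
+// (cond true) it iterates over extent_full (the nominal tile dimensions),
+// otherwise over extent_partial (the clipped dimensions computed by
+// check_iteration_bounds further below).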
+
+
+// parallel_reduce, non-tagged
+// Reduction version
+#define APPLY_REDUX( val, func, ... ) \
+ func( __VA_ARGS__, val );
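+
+// The reduction variants thread the per-thread reduction value through the
+// same recursion and append it as the last argument, so the functor is
+// invoked roughly as func(index0, ..., indexN-1, val).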
+
+// LayoutRight
+// d = 0 to start
+#define LOOP_R_1_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY_REDUX( val, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define LOOP_R_2_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_R_1_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define LOOP_R_3_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_R_2_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define LOOP_R_4_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_R_3_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define LOOP_R_5_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_R_4_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define LOOP_R_6_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_R_5_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define LOOP_R_7_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_R_6_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define LOOP_R_8_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_R_7_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define LOOP_L_1_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY_REDUX( val, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_2_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_L_1_REDUX( val, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_3_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_L_2_REDUX( val, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_4_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_L_3_REDUX( val, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_5_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_L_4_REDUX( val, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_6_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_L_5_REDUX( val, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_7_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_L_6_REDUX( val, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_8_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_L_7_REDUX( val, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+#define LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ APPLY_REDUX( val, func, i0 + m_offset[0] ) \
+ }
+
+#define LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ LOOP_L_1_REDUX( val, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ LOOP_R_1_REDUX( val, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ LOOP_L_2_REDUX( val, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ LOOP_R_2_REDUX( val, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ LOOP_L_3_REDUX( val, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ LOOP_R_3_REDUX( val, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ LOOP_L_4_REDUX( val, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ LOOP_R_4_REDUX( val, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ LOOP_L_5_REDUX( val, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ LOOP_R_5_REDUX( val, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ LOOP_L_6_REDUX( val, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ LOOP_R_6_REDUX( val, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ LOOP_L_7_REDUX( val, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ LOOP_R_7_REDUX( val, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TILE_LOOP_1_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_2_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_3_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_4_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_5_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_6_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_7_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_8_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+// end New Loop Macros
+
+
+// tagged macros
+#define TAGGED_APPLY( tag, func, ... ) \
+ func( tag, __VA_ARGS__ );
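+
+// Tagged variants prepend the functor's work tag as the first argument, so a
+// tagged rank-N call is roughly func(tag, index0, ..., indexN-1).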
+
+// LayoutRight
+// d = 0 to start
+#define TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY( tag, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_8( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY( tag, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_8( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+// TODO: rank does not need to be passed through; the values can be hardcoded
+#define TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ TAGGED_APPLY( tag, func, i0 + m_offset[0] ) \
+ }
+
+#define TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TAGGED_TILE_LOOP_1( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_2( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_3( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_4( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_5( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_6( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_7( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_8( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+
+// parallel_reduce, tagged
+// Reduction version
+#define TAGGED_APPLY_REDUX( val, tag, func, ... ) \
+ func( tag, __VA_ARGS__, val );
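+
+// Tagged reduction variants combine both conventions: the work tag comes
+// first and the reduction value last, i.e. roughly func(tag, indices..., val).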
+
+// LayoutRight
+// d = 0 to start
+#define TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_8_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_8_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+#define TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, i0 + m_offset[0] ) \
+ }
+
+#define TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TAGGED_TILE_LOOP_1_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_2_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_3_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_4_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_5_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_6_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_7_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_8_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+// end tagged macros
+
+
+
+
+// Structs for calling loops
+template < int Rank, bool IsLeft, typename IType, typename Tagged, typename Enable = void >
+struct Tile_Loop_Type;
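+
+// Tile_Loop_Type dispatches from a run-time tile description to the rank- and
+// layout-specific macros above: Rank selects the macro arity, IsLeft the
+// layout, and a non-void Tagged type enables the tagged specializations below.
+// Each specialization provides two apply() overloads, one for parallel_for and
+// one (taking a reduction value) for parallel_reduce.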
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<1, IsLeft, IType, void, void >
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_1( func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_1_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<2, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_2( func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_2_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<3, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_3( func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_3_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<4, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_4( func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_4_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<5, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_5( func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_5_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<6, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_6( func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_6_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<7, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_7( func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_7_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<8, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_8( func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_8_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+};
+
+// tagged versions
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<1, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type >
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_1( Tagged(), func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_1_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<2, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_2( Tagged(), func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_2_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<3, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_3( Tagged(), func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_3_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<4, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_4( Tagged(), func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_4_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<5, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_5( Tagged(), func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_5_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<6, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_6( Tagged(), func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_6_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<7, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_7( Tagged(), func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_7_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<8, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_8( Tagged(), func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_8_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+};
+// end Structs for calling loops
+
+
+template <typename T>
+using is_void = std::is_same< T , void >;
+
+template < typename RP
+ , typename Functor
+ , typename Tag = void
+ , typename ValueType = void
+ , typename Enable = void
+ >
+struct HostIterateTile;
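+
+// HostIterateTile is the host-side driver: its operator() handles one tile,
+// converting the flat tile index into per-dimension offsets and forwarding to
+// Tile_Loop_Type. The enable_if on ValueType selects the specialization below
+// when ValueType is void, i.e. for parallel_for.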
+
+//For ParallelFor
+template < typename RP
+ , typename Functor
+ , typename Tag
+ , typename ValueType
+ >
+struct HostIterateTile < RP , Functor , Tag , ValueType , typename std::enable_if< is_void<ValueType >::value >::type >
+{
+ using index_type = typename RP::index_type;
+ using point_type = typename RP::point_type;
+
+ using value_type = ValueType;
+
+ inline
+ HostIterateTile( RP const& rp, Functor const& func )
+ : m_rp(rp)
+ , m_func(func)
+ {
+ }
+
+ inline
+ bool check_iteration_bounds( point_type& partial_tile , point_type& offset ) const {
+ bool is_full_tile = true;
+
+ for ( int i = 0; i < RP::rank; ++i ) {
+ if ((offset[i] + m_rp.m_tile[i]) <= m_rp.m_upper[i]) {
+ partial_tile[i] = m_rp.m_tile[i] ;
+ }
+ else {
+ is_full_tile = false ;
+ partial_tile[i] = (m_rp.m_upper[i] - 1 - offset[i]) == 0 ? 1
+ : (m_rp.m_upper[i] - m_rp.m_tile[i]) > 0 ? (m_rp.m_upper[i] - offset[i])
+ : (m_rp.m_upper[i] - m_rp.m_lower[i]) ; // when single tile encloses range
+ }
+ }
+
+ return is_full_tile ;
+ } // end check bounds
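+
+  // Example: with m_lower = 0, m_upper = 10, m_tile = 4 and offset = 8 the
+  // tile overruns the range, so the else branch clips it: (10 - 1 - 8) != 0
+  // and (10 - 4) > 0, giving partial_tile = 10 - 8 = 2 and is_full_tile = false.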
+
+
+ template <int Rank>
+ struct RankTag
+ {
+ typedef RankTag type;
+ enum { value = (int)Rank };
+ };
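+
+  // RankTag is a tag type for compile-time dispatch in the legacy (#else) code
+  // path: operator() forwards to the operator_impl overload whose RankTag<Rank>
+  // parameter matches RP::rank.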
+
+#if KOKKOS_ENABLE_NEW_LOOP_MACROS
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
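+    // Decode the flat tile index into per-dimension tile coordinates and turn
+    // each into a global starting offset: (tile_idx % m_tile_end[i]) is the
+    // tile coordinate in dimension i, scaled by the tile size and shifted by
+    // the lower bound. outer_direction controls whether dimension 0 or
+    // dimension rank-1 varies fastest across tiles.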
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ Tile_Loop_Type< RP::rank, (RP::inner_direction == RP::Left), index_type, Tag >::apply( m_func, full_tile, m_offset, m_rp.m_tile, m_tiledims );
+
+ }
+
+#else
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ { operator_impl( tile_idx , RankTag<RP::rank>() ); }
+  // Added because of a compiler error when using SFINAE to choose the operator based on rank with CUDA+Serial
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<2> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 2
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<3> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 3
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<4> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 4
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<5> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 5
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<6> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 6
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<7> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 7
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<8> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 8
+#endif
+
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func(args...);
+ }
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && !std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func( m_tag, args...);
+ }
+
+
+ RP const& m_rp;
+ Functor const& m_func;
+ typename std::conditional< std::is_same<Tag,void>::value,int,Tag>::type m_tag;
+// value_type & m_v;
+
+};
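+
+// Expected functor interface for the non-reducing HostIterateTile above
+// (illustrative sketch only, rank-2 case): apply() calls m_func(i0,i1) when
+// the policy has no work tag, or m_func(m_tag,i0,i1) when it has one.
+// A hypothetical untagged user functor would look like:
+//
+//   struct MyFunctor {
+//     KOKKOS_INLINE_FUNCTION
+//     void operator()( const int i , const int j ) const { /* ... */ }
+//   };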
+
+
+// ValueType: For reductions
+template < typename RP
+ , typename Functor
+ , typename Tag
+ , typename ValueType
+ >
+struct HostIterateTile < RP , Functor , Tag , ValueType , typename std::enable_if< !is_void<ValueType >::value >::type >
+{
+ using index_type = typename RP::index_type;
+ using point_type = typename RP::point_type;
+
+ using value_type = ValueType;
+
+ inline
+ HostIterateTile( RP const& rp, Functor const& func, value_type & v )
+ : m_rp(rp) //Cuda 7.0 does not like braces...
+ , m_func(func)
+ , m_v(v) // use with non-void ValueType struct
+ {
+// Errors due to braces rather than parenthesis for init (with cuda 7.0)
+// /home/ndellin/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp:1216:98: error: too many braces around initializer for ‘int’ [-fpermissive]
+// /home/ndellin/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp:1216:98: error: aggregate value used where an integer was expected
+ }
+
+ inline
+ bool check_iteration_bounds( point_type& partial_tile , point_type& offset ) const {
+ bool is_full_tile = true;
+
+ for ( int i = 0; i < RP::rank; ++i ) {
+ if ((offset[i] + m_rp.m_tile[i]) <= m_rp.m_upper[i]) {
+ partial_tile[i] = m_rp.m_tile[i] ;
+ }
+ else {
+ is_full_tile = false ;
+ partial_tile[i] = (m_rp.m_upper[i] - 1 - offset[i]) == 0 ? 1
+ : (m_rp.m_upper[i] - m_rp.m_tile[i]) > 0 ? (m_rp.m_upper[i] - offset[i])
+ : (m_rp.m_upper[i] - m_rp.m_lower[i]) ; // when single tile encloses range
+ }
+ }
+
+ return is_full_tile ;
+ } // end check bounds
+
+
+ template <int Rank>
+ struct RankTag
+ {
+ typedef RankTag type;
+ enum { value = (int)Rank };
+ };
+
+
+#if KOKKOS_ENABLE_NEW_LOOP_MACROS
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ Tile_Loop_Type< RP::rank, (RP::inner_direction == RP::Left), index_type, Tag >::apply( m_v, m_func, full_tile, m_offset, m_rp.m_tile, m_tiledims );
+
+ }
+
+#else
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ { operator_impl( tile_idx , RankTag<RP::rank>() ); }
+ // added due to a compiler error when using SFINAE to choose the operator based on rank
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<2> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 2
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<3> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 3
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<4> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 4
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<5> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 5
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<6> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 6
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<7> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 7
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<8> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 8
+#endif
+
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func(args... , m_v);
+ }
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && !std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func( m_tag, args... , m_v);
+ }
+
+
+ RP const& m_rp;
+ Functor const& m_func;
+ value_type & m_v;
+ typename std::conditional< std::is_same<Tag,void>::value,int,Tag>::type m_tag;
+
+};
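+
+// In the reduction specialization above the thread-local value m_v is passed
+// as the trailing argument of every call, so a rank-2 reduction functor is
+// expected to look roughly like this (illustrative sketch, hypothetical
+// user code):
+//
+//   struct MySum {
+//     KOKKOS_INLINE_FUNCTION
+//     void operator()( const int i , const int j , double & update ) const
+//     { update += 1.0 ; }
+//   };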
+
+
+// ------------------------------------------------------------------ //
+
+// MDFunctor - wraps the range_policy and functor to pass to IterateTile.
+// Used by the host backends (Serial, Threads, OpenMP);
+// Cuda uses DeviceIterateTile directly within md_parallel_for.
+// ParallelReduce
+template < typename MDRange, typename Functor, typename ValueType = void >
+struct MDFunctor
+{
+ using range_policy = MDRange;
+ using functor_type = Functor;
+ using value_type = ValueType;
+ using work_tag = typename range_policy::work_tag;
+ using index_type = typename range_policy::index_type;
+ using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRange
+ , Functor
+ , work_tag
+ , value_type
+ >;
+
+
+ inline
+ MDFunctor( MDRange const& range, Functor const& f, ValueType & v )
+ : m_range( range )
+ , m_func( f )
+ {}
+
+ inline
+ MDFunctor( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor( MDFunctor && ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor && ) = default;
+
+// KOKKOS_FORCEINLINE_FUNCTION //Caused cuda warning - __host__ warning
+ inline
+ void operator()(index_type t, value_type & v) const
+ {
+ iterate_type(m_range, m_func, v)(t);
+ }
+
+ MDRange m_range;
+ Functor m_func;
+};
+
+// ParallelFor
+template < typename MDRange, typename Functor >
+struct MDFunctor< MDRange, Functor, void >
+{
+ using range_policy = MDRange;
+ using functor_type = Functor;
+ using work_tag = typename range_policy::work_tag;
+ using index_type = typename range_policy::index_type;
+ using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRange
+ , Functor
+ , work_tag
+ , void
+ >;
+
+
+ inline
+ MDFunctor( MDRange const& range, Functor const& f )
+ : m_range( range )
+ , m_func( f )
+ {}
+
+ inline
+ MDFunctor( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor( MDFunctor && ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor && ) = default;
+
+ inline
+ void operator()(index_type t) const
+ {
+ iterate_type(m_range, m_func)(t);
+ }
+
+ MDRange m_range;
+ Functor m_func;
+};
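+
+// Minimal usage sketch for the wrappers above (illustration only; assumes
+// the experimental MDRangePolicy / md_parallel_for interface of this Kokkos
+// version, with N0, N1 and the tile sizes as placeholder values):
+//
+//   using policy_t =
+//     Kokkos::Experimental::MDRangePolicy< Kokkos::Experimental::Rank<2> >;
+//   policy_t policy( {{0,0}} , {{N0,N1}} , {{4,4}} );
+//   Kokkos::Experimental::md_parallel_for( policy ,
+//     KOKKOS_LAMBDA( const int i , const int j ) { /* body */ } );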
+
+#undef KOKKOS_ENABLE_NEW_LOOP_MACROS
+
+} } } //end namespace Kokkos::Experimental::Impl
+
+
+#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp b/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
index 0ffbc0548..7d7fd3d13 100644
--- a/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_BITOPS_HPP
#define KOKKOS_BITOPS_HPP
#include <Kokkos_Macros.hpp>
#include <stdint.h>
#include <climits>
namespace Kokkos {
namespace Impl {
KOKKOS_FORCEINLINE_FUNCTION
int bit_scan_forward( unsigned i )
{
#if defined( __CUDA_ARCH__ )
return __ffs(i) - 1;
-#elif defined( __GNUC__ ) || defined( __GNUG__ )
- return __builtin_ffs(i) - 1;
-#elif defined( __INTEL_COMPILER )
+#elif defined( KOKKOS_COMPILER_INTEL )
return _bit_scan_forward(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return __cnttz4(i);
+#elif defined( KOKKOS_COMPILER_GNU ) || defined( __GNUC__ ) || defined( __GNUG__ )
+ return __builtin_ffs(i) - 1;
#else
-
unsigned t = 1u;
int r = 0;
while ( i && ( ( i & t ) == 0 ) )
{
t = t << 1;
++r;
}
return r;
#endif
}
KOKKOS_FORCEINLINE_FUNCTION
int bit_scan_reverse( unsigned i )
{
enum { shift = static_cast<int>( sizeof(unsigned) * CHAR_BIT - 1 ) };
#if defined( __CUDA_ARCH__ )
return shift - __clz(i);
+#elif defined( KOKKOS_COMPILER_INTEL )
+ return _bit_scan_reverse(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return shift - __cntlz4(i);
#elif defined( __GNUC__ ) || defined( __GNUG__ )
return shift - __builtin_clz(i);
-#elif defined( __INTEL_COMPILER )
- return _bit_scan_reverse(i);
#else
unsigned t = 1u << shift;
int r = 0;
while ( i && ( ( i & t ) == 0 ) )
{
t = t >> 1;
++r;
}
return shift - r;
#endif
}
/// Count the number of bits set.
KOKKOS_FORCEINLINE_FUNCTION
int bit_count( unsigned i )
{
#if defined( __CUDA_ARCH__ )
return __popc(i);
-#elif defined( __GNUC__ ) || defined( __GNUG__ )
- return __builtin_popcount(i);
#elif defined ( __INTEL_COMPILER )
return _popcnt32(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return __popcnt4(i);
+#elif defined( __GNUC__ ) || defined( __GNUG__ )
+ return __builtin_popcount(i);
#else
// http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
i = i - ( ( i >> 1 ) & ~0u / 3u ); // temp
i = ( i & ~0u / 15u * 3u ) + ( ( i >> 2 ) & ~0u / 15u * 3u ); // temp
i = ( i + ( i >> 4 ) ) & ~0u / 255u * 15u; // temp
// count
return (int)( ( i * ( ~0u / 255u ) ) >> ( sizeof(unsigned) - 1 ) * CHAR_BIT );
#endif
}
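// Example values for the helpers above (illustration only): for i = 12,
// i.e. binary 1100,
//   bit_scan_forward(12) == 2   (index of the least-significant set bit)
//   bit_scan_reverse(12) == 3   (index of the most-significant set bit)
//   bit_count(12)        == 2   (number of set bits)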
} // namespace Impl
} // namespace Kokkos
#endif // KOKKOS_BITOPS_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Core.cpp b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
index cd38eaa9d..7c38430c4 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Core.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
@@ -1,453 +1,771 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <cctype>
#include <cstring>
#include <iostream>
#include <cstdlib>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
bool is_unsigned_int(const char* str)
{
const size_t len = strlen (str);
for (size_t i = 0; i < len; ++i) {
if (! isdigit (str[i])) {
return false;
}
}
return true;
}
void initialize_internal(const InitArguments& args)
{
// This is an experimental setting
// For KNL in Flat mode this variable should be set, so that
// memkind allocates high bandwidth memory correctly.
#ifdef KOKKOS_ENABLE_HBWSPACE
setenv("MEMKIND_HBW_NODES", "1", 0);
#endif
// Protect declarations, to prevent "unused variable" warnings.
#if defined( KOKKOS_ENABLE_OPENMP ) || defined( KOKKOS_ENABLE_PTHREAD )
const int num_threads = args.num_threads;
const int use_numa = args.num_numa;
#endif // defined( KOKKOS_ENABLE_OPENMP ) || defined( KOKKOS_ENABLE_PTHREAD )
#if defined( KOKKOS_ENABLE_CUDA )
const int use_gpu = args.device_id;
#endif // defined( KOKKOS_ENABLE_CUDA )
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::OpenMP::initialize(num_threads,use_numa);
}
else {
Kokkos::OpenMP::initialize(num_threads);
}
} else {
Kokkos::OpenMP::initialize();
}
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::Threads::initialize(num_threads,use_numa);
}
else {
Kokkos::Threads::initialize(num_threads);
}
} else {
Kokkos::Threads::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Pthread enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: Pthread enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
// Prevent "unused variable" warning for 'args' input struct. If
// Serial::initialize() ever needs to take arguments from the input
// struct, you may remove this line of code.
(void) args;
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::initialize();
}
#endif
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || 0 < use_gpu ) {
if (use_gpu > -1) {
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( use_gpu ) );
}
else {
Kokkos::Cuda::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Cuda enabled and initialized" << std::endl ;
}
#endif
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
void finalize_internal( const bool all_spaces = false )
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || all_spaces ) {
if(Kokkos::Cuda::is_initialized())
Kokkos::Cuda::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::OpenMP::is_initialized())
Kokkos::OpenMP::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Threads::is_initialized())
Kokkos::Threads::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Serial::is_initialized())
Kokkos::Serial::finalize();
}
#endif
}
void fence_internal()
{
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value ) {
Kokkos::Cuda::fence();
}
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::OpenMP::fence();
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Threads::fence();
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::fence();
}
#endif
}
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
void initialize(int& narg, char* arg[])
{
int num_threads = -1;
int numa = -1;
int device = -1;
int kokkos_threads_found = 0;
int kokkos_numa_found = 0;
int kokkos_device_found = 0;
int kokkos_ndevices_found = 0;
int iarg = 0;
while (iarg < narg) {
if ((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || (strncmp(arg[iarg],"--threads",9) == 0)) {
//Find the number of threads (expecting --threads=XX)
if (!((strncmp(arg[iarg],"--kokkos-threads=",17) == 0) || (strncmp(arg[iarg],"--threads=",10) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || !kokkos_threads_found)
num_threads = atoi(number);
//Remove the --kokkos-threads argument from the list but leave --threads
if(strncmp(arg[iarg],"--kokkos-threads",16) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_threads_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || (strncmp(arg[iarg],"--numa",6) == 0)) {
//Find the number of NUMA regions (expecting --numa=XX)
if (!((strncmp(arg[iarg],"--kokkos-numa=",14) == 0) || (strncmp(arg[iarg],"--numa=",7) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || !kokkos_numa_found)
numa = atoi(number);
//Remove the --kokkos-numa argument from the list but leave --numa
if(strncmp(arg[iarg],"--kokkos-numa",13) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_numa_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-device",15) == 0) || (strncmp(arg[iarg],"--device",8) == 0)) {
//Find the device id (expecting --device=XX)
if (!((strncmp(arg[iarg],"--kokkos-device=",16) == 0) || (strncmp(arg[iarg],"--device=",9) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-device",15) == 0) || !kokkos_device_found)
device = atoi(number);
//Remove the --kokkos-device argument from the list but leave --device
if(strncmp(arg[iarg],"--kokkos-device",15) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_device_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || (strncmp(arg[iarg],"--ndevices",10) == 0)) {
//Find the number of devices (expecting --ndevices=XX[,XX])
if (!((strncmp(arg[iarg],"--kokkos-ndevices=",18) == 0) || (strncmp(arg[iarg],"--ndevices=",11) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT[,INT]' after command line argument '--ndevices/--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
int ndevices=-1;
int skip_device = 9999;
char* num1 = strchr(arg[iarg],'=')+1;
char* num2 = strpbrk(num1,",");
int num1_len = num2==NULL?strlen(num1):num2-num1;
char* num1_only = new char[num1_len+1];
strncpy(num1_only,num1,num1_len);
num1_only[num1_len]=0;
if(!Impl::is_unsigned_int(num1_only) || (strlen(num1_only)==0)) {
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
ndevices = atoi(num1_only);
if( num2 != NULL ) {
if(( !Impl::is_unsigned_int(num2+1) ) || (strlen(num2)==1) )
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices=XX,'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
skip_device = atoi(num2+1);
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found) {
char *str;
//if ((str = getenv("SLURM_LOCALID"))) {
// int local_rank = atoi(str);
// device = local_rank % ndevices;
// if (device >= skip_device) device++;
//}
if ((str = getenv("MV2_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if ((str = getenv("OMPI_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if(device==-1) {
device = 0;
if (device >= skip_device) device++;
}
}
//Remove the --kokkos-ndevices argument from the list but leave --ndevices
if(strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_ndevices_found=1;
narg--;
} else {
iarg++;
}
} else if ((strcmp(arg[iarg],"--kokkos-help") == 0) || (strcmp(arg[iarg],"--help") == 0)) {
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "-------------Kokkos command line arguments--------------------------------------" << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "The following arguments exist also without prefix 'kokkos' (e.g. --help)." << std::endl;
std::cout << "The prefixed arguments will be removed from the list by Kokkos::initialize()," << std::endl;
std::cout << "the non-prefixed ones are not removed. Prefixed versions take precedence over " << std::endl;
std::cout << "non prefixed ones, and the last occurence of an argument overwrites prior" << std::endl;
std::cout << "settings." << std::endl;
std::cout << std::endl;
std::cout << "--kokkos-help : print this message" << std::endl;
std::cout << "--kokkos-threads=INT : specify total number of threads or" << std::endl;
std::cout << " number of threads per NUMA region if " << std::endl;
std::cout << " used in conjunction with '--numa' option. " << std::endl;
std::cout << "--kokkos-numa=INT : specify number of NUMA regions used by process." << std::endl;
std::cout << "--kokkos-device=INT : specify device id to be used by Kokkos. " << std::endl;
std::cout << "--kokkos-ndevices=INT[,INT] : used when running MPI jobs. Specify number of" << std::endl;
std::cout << " devices per node to be used. Process to device" << std::endl;
std::cout << " mapping happens by obtaining the local MPI rank" << std::endl;
std::cout << " and assigning devices round-robin. The optional" << std::endl;
std::cout << " second argument allows for an existing device" << std::endl;
std::cout << " to be ignored. This is most useful on workstations" << std::endl;
std::cout << " with multiple GPUs of which one is used to drive" << std::endl;
std::cout << " screen output." << std::endl;
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << std::endl;
//Remove the --kokkos-help argument from the list but leave --help
if(strcmp(arg[iarg],"--kokkos-help") == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
narg--;
} else {
iarg++;
}
} else
iarg++;
}
InitArguments arguments;
arguments.num_threads = num_threads;
arguments.num_numa = numa;
arguments.device_id = device;
Impl::initialize_internal(arguments);
}
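// Typical usage of the command-line aware overload above (illustration only;
// the flag values below are placeholders):
//
//   int main( int argc , char* argv[] ) {
//     Kokkos::initialize( argc , argv );   // consumes the --kokkos-* flags
//     { /* ... parallel kernels ... */ }
//     Kokkos::finalize();
//   }
//
// invoked e.g. as:   ./my_app --kokkos-threads=8 --kokkos-numa=2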
void initialize(const InitArguments& arguments) {
Impl::initialize_internal(arguments);
}
void finalize()
{
Impl::finalize_internal();
}
void finalize_all()
{
enum { all_spaces = true };
Impl::finalize_internal( all_spaces );
}
void fence()
{
Impl::fence_internal();
}
+void print_configuration( std::ostream & out , const bool detail )
+{
+ std::ostringstream msg;
+
+ msg << "Compiler:" << std::endl;
+#ifdef KOKKOS_COMPILER_APPLECC
+ msg << " KOKKOS_COMPILER_APPLECC: " << KOKKOS_COMPILER_APPLECC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_CLANG
+ msg << " KOKKOS_COMPILER_CLANG: " << KOKKOS_COMPILER_CLANG << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_CRAYC
+ msg << " KOKKOS_COMPILER_CRAYC: " << KOKKOS_COMPILER_CRAYC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_GNU
+ msg << " KOKKOS_COMPILER_GNU: " << KOKKOS_COMPILER_GNU << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_IBM
+ msg << " KOKKOS_COMPILER_IBM: " << KOKKOS_COMPILER_IBM << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_INTEL
+ msg << " KOKKOS_COMPILER_INTEL: " << KOKKOS_COMPILER_INTEL << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_NVCC
+ msg << " KOKKOS_COMPILER_NVCC: " << KOKKOS_COMPILER_NVCC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_PGI
+ msg << " KOKKOS_COMPILER_PGI: " << KOKKOS_COMPILER_PGI << std::endl;
+#endif
+
+
+ msg << "Architecture:" << std::endl;
+#ifdef KOKKOS_ENABLE_ISA_KNC
+ msg << " KOKKOS_ENABLE_ISA_KNC: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_KNC: no" << std::endl;
+#endif
+#ifdef KOKKOS_ENABLE_ISA_POWERPCLE
+ msg << " KOKKOS_ENABLE_ISA_POWERPCLE: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_POWERPCLE: no" << std::endl;
+#endif
+#ifdef KOKKOS_ENABLE_ISA_X86_64
+ msg << " KOKKOS_ENABLE_ISA_X86_64: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_X86_64: no" << std::endl;
+#endif
+
+
+ msg << "Devices:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA: ";
+#ifdef KOKKOS_ENABLE_CUDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_OPENMP: ";
+#ifdef KOKKOS_ENABLE_OPENMP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PTHREAD: ";
+#ifdef KOKKOS_ENABLE_PTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_STDTHREAD: ";
+#ifdef KOKKOS_ENABLE_STDTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_WINTHREAD: ";
+#ifdef KOKKOS_ENABLE_WINTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_QTHREADS: ";
+#ifdef KOKKOS_ENABLE_QTHREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_SERIAL: ";
+#ifdef KOKKOS_ENABLE_SERIAL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Default Device:" << std::endl;
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Atomics:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_CUDA_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_GNU_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_GNU_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_INTEL_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_INTEL_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_OPENMP_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_OPENMP_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_WINDOWS_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_WINDOWS_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Vectorization:" << std::endl;
+ msg << " KOKKOS_ENABLE_PRAGMA_IVDEP: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_SIMD: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_SIMD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_UNROLL: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_UNROLL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_VECTOR: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_VECTOR
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+ msg << "Memory:" << std::endl;
+ msg << " KOKKOS_ENABLE_HBWSPACE: ";
+#ifdef KOKKOS_ENABLE_HBWSPACE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_INTEL_MM_ALLOC: ";
+#ifdef KOKKOS_ENABLE_INTEL_MM_ALLOC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_POSIX_MEMALIGN: ";
+#ifdef KOKKOS_ENABLE_POSIX_MEMALIGN
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Options:" << std::endl;
+ msg << " KOKKOS_ENABLE_ASM: ";
+#ifdef KOKKOS_ENABLE_ASM
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CXX1Z: ";
+#ifdef KOKKOS_ENABLE_CXX1Z
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: ";
+#ifdef KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_HWLOC: ";
+#ifdef KOKKOS_ENABLE_HWLOC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_LIBRT: ";
+#ifdef KOKKOS_ENABLE_LIBRT
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_MPI: ";
+#ifdef KOKKOS_ENABLE_MPI
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PROFILING: ";
+#ifdef KOKKOS_ENABLE_PROFILING
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+#ifdef KOKKOS_ENABLE_CUDA
+ msg << "Cuda Options:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA_LAMBDA: ";
+#ifdef KOKKOS_ENABLE_CUDA_LAMBDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: ";
+#ifdef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: ";
+#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_UVM: ";
+#ifdef KOKKOS_ENABLE_CUDA_UVM
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUSPARSE: ";
+#ifdef KOKKOS_ENABLE_CUSPARSE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: ";
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+#endif
+
+ msg << "\nRuntime Configuration:" << std::endl;
+#ifdef KOKKOS_ENABLE_CUDA
+ Cuda::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_OPENMP
+ OpenMP::print_configuration(msg, detail);
+#endif
+#if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD )
+ Threads::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_QTHREADS
+ Qthreads::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_SERIAL
+ Serial::print_configuration(msg, detail);
+#endif
+
+ out << msg.str() << std::endl;
+}
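+
+// Example call for the function above (illustration only); the second
+// argument is the 'detail' flag forwarded to the per-backend
+// print_configuration calls:
+//
+//   Kokkos::print_configuration( std::cout , true );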
+
} // namespace Kokkos
diff --git a/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp b/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp
new file mode 100644
index 000000000..b425b3f19
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp
@@ -0,0 +1,653 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_FUNCTORANALYSIS_HPP
+#define KOKKOS_FUNCTORANALYSIS_HPP
+
+#include <cstddef>
+#include <Kokkos_Core_fwd.hpp>
+#include <impl/Kokkos_Traits.hpp>
+#include <impl/Kokkos_Tags.hpp>
+#include <impl/Kokkos_Reducer.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+struct FunctorPatternInterface {
+ struct FOR {};
+ struct REDUCE {};
+ struct SCAN {};
+};
+
+/** \brief Query Functor and execution policy argument tag for value type.
+ *
+ * If 'value_type' is not explicitly declared in the functor
+ * then attempt to deduce the type from FunctorType::operator()
+ * interface used by the pattern and policy.
+ *
+ * For the REDUCE pattern generate a Reducer and finalization function
+ * derived from what is available within the functor.
+ */
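+//
+// Illustration (hypothetical user functors, not part of this header): a
+// reduction functor that declares an explicit value_type,
+//
+//   struct A { using value_type = double ;
+//              KOKKOS_INLINE_FUNCTION
+//              void operator()( int i , double & update ) const ; };
+//
+// is handled by has_value_type below, whereas for
+//
+//   struct B { KOKKOS_INLINE_FUNCTION
+//              void operator()( int i , float & update ) const ; };
+//
+// deduce_value_type inspects operator() and deduces 'float' from its
+// reference argument.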
+template< typename PatternInterface , class Policy , class Functor >
+struct FunctorAnalysis {
+private:
+
+ using FOR = FunctorPatternInterface::FOR ;
+ using REDUCE = FunctorPatternInterface::REDUCE ;
+ using SCAN = FunctorPatternInterface::SCAN ;
+
+ //----------------------------------------
+
+ struct VOID {};
+
+ template< typename P = Policy , typename = std::false_type >
+ struct has_work_tag
+ {
+ using type = void ;
+ using wtag = VOID ;
+ };
+
+ template< typename P >
+ struct has_work_tag
+ < P , typename std::is_same< typename P::work_tag , void >::type >
+ {
+ using type = typename P::work_tag ;
+ using wtag = typename P::work_tag ;
+ };
+
+ using Tag = typename has_work_tag<>::type ;
+ using WTag = typename has_work_tag<>::wtag ;
+
+ //----------------------------------------
+ // Check for Functor::value_type, which is either a simple type T or T[]
+
+ template< typename F , typename = std::false_type >
+ struct has_value_type { using type = void ; };
+
+ template< typename F >
+ struct has_value_type
+ < F , typename std::is_same< typename F::value_type , void >::type >
+ {
+ using type = typename F::value_type ;
+
+ static_assert( ! std::is_reference< type >::value &&
+ std::rank< type >::value <= 1 &&
+ std::extent< type >::value == 0
+ , "Kokkos Functor::value_type is T or T[]" );
+ };
+
+ //----------------------------------------
+ // If Functor::value_type does not exist then evaluate operator(),
+ // depending upon the pattern and whether the policy has a work tag,
+ // to determine the reduction or scan value_type.
+
+ template< typename F
+ , typename P = PatternInterface
+ , typename V = typename has_value_type<F>::type
+ , bool T = std::is_same< Tag , void >::value
+ >
+ struct deduce_value_type { using type = V ; };
+
+ template< typename F >
+ struct deduce_value_type< F , REDUCE , void , true > {
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( M , A & ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , REDUCE , void , false > {
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag , M , A & ) const );
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag const & , M , A & ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , SCAN , void , true > {
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( M , A & , I ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , SCAN , void , false > {
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag , M , A & , I ) const );
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag const & , M , A & , I ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ //----------------------------------------
+
+ using candidate_type = typename deduce_value_type< Functor >::type ;
+
+ enum { candidate_is_void = std::is_same< candidate_type , void >::value
+ , candidate_is_array = std::rank< candidate_type >::value == 1 };
+
+ //----------------------------------------
+
+public:
+
+ using value_type = typename std::remove_extent< candidate_type >::type ;
+
+ static_assert( ! std::is_const< value_type >::value
+ , "Kokkos functor operator reduce argument cannot be const" );
+
+private:
+
+ // Stub to avoid defining a type 'void &'
+ using ValueType = typename
+ std::conditional< candidate_is_void , VOID , value_type >::type ;
+
+public:
+
+ using pointer_type = typename
+ std::conditional< candidate_is_void , void , ValueType * >::type ;
+
+ using reference_type = typename
+ std::conditional< candidate_is_array , ValueType * , typename
+ std::conditional< ! candidate_is_void , ValueType & , void >
+ ::type >::type ;
+
+private:
+
+ template< bool IsArray , class FF >
+ KOKKOS_INLINE_FUNCTION static
+ typename std::enable_if< IsArray , unsigned >::type
+ get_length( FF const & f ) { return f.value_count ; }
+
+ template< bool IsArray , class FF >
+ KOKKOS_INLINE_FUNCTION static
+ typename std::enable_if< ! IsArray , unsigned >::type
+ get_length( FF const & ) { return 1 ; }
+
+public:
+
+ enum { StaticValueSize = ! candidate_is_void &&
+ ! candidate_is_array
+ ? sizeof(ValueType) : 0 };
+
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_count( const Functor & f )
+ { return FunctorAnalysis::template get_length< candidate_is_array >(f); }
+
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_size( const Functor & f )
+ { return FunctorAnalysis::template get_length< candidate_is_array >(f) * sizeof(ValueType); }
+
+ //----------------------------------------
+
+ template< class Unknown >
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_count( const Unknown & )
+ { return 1 ; }
+
+ template< class Unknown >
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_size( const Unknown & )
+ { return sizeof(ValueType); }
+
+private:
+
+ enum INTERFACE : int
+ { DISABLE = 0
+ , NO_TAG_NOT_ARRAY = 1
+ , NO_TAG_IS_ARRAY = 2
+ , HAS_TAG_NOT_ARRAY = 3
+ , HAS_TAG_IS_ARRAY = 4
+ , DEDUCED =
+ ! std::is_same< PatternInterface , REDUCE >::value ? DISABLE : (
+ std::is_same<Tag,void>::value
+ ? (candidate_is_array ? NO_TAG_IS_ARRAY : NO_TAG_NOT_ARRAY)
+ : (candidate_is_array ? HAS_TAG_IS_ARRAY : HAS_TAG_NOT_ARRAY) )
+ };
+
+ //----------------------------------------
+ // parallel_reduce join operator
+
+ template< class F , INTERFACE >
+ struct has_join_function ;
+
+ template< class F >
+ struct has_join_function< F , NO_TAG_NOT_ARRAY >
+ {
+ typedef volatile ValueType & vref_type ;
+ typedef volatile const ValueType & cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( *dst , *src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , NO_TAG_IS_ARRAY >
+ {
+ typedef volatile ValueType * vref_type ;
+ typedef volatile const ValueType * cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( dst , src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ typedef volatile ValueType & vref_type ;
+ typedef volatile const ValueType & cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( WTag() , *dst , *src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , HAS_TAG_IS_ARRAY >
+ {
+ typedef volatile ValueType * vref_type ;
+ typedef volatile const ValueType * cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( WTag() , dst , src ); }
+ };
+
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceJoin
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ {
+ const int n = FunctorAnalysis::value_count( f );
+ for ( int i = 0 ; i < n ; ++i ) dst[i] += src[i];
+ }
+ };
+
+ template< class F >
+ struct DeduceJoin< F , DISABLE , void >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const &
+ , ValueType volatile *
+ , ValueType volatile const * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceJoin< F , I ,
+ decltype( has_join_function<F,I>::enable_if( & F::join ) ) >
+ : public has_join_function<F,I> {};
+
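+  // A minimal illustration (the functor shown is hypothetical): if the
+  // functor declares a join whose parameter types match one of the enable_if
+  // signatures above, e.g.
+  //
+  //   void join( double volatile & dst , double volatile const & src )
+  //     { if ( src < dst ) dst = src ; }
+  //
+  // the decltype-SFINAE specialization of DeduceJoin immediately above
+  // forwards to it; otherwise the primary DeduceJoin falls back to the
+  // element-wise 'dst[i] += src[i]' default.
+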
+ //----------------------------------------
+
+ template< class , INTERFACE >
+ struct has_init_function ;
+
+ template< class F >
+ struct has_init_function< F , NO_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( *dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , NO_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( WTag(), *dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , HAS_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( WTag(), dst ); }
+ };
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceInit
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & , ValueType * dst ) { new(dst) ValueType(); }
+ };
+
+ template< class F >
+ struct DeduceInit< F , DISABLE , void >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & , ValueType * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceInit< F , I ,
+ decltype( has_init_function<F,I>::enable_if( & F::init ) ) >
+ : public has_init_function<F,I> {};
+
+ //----------------------------------------
+
+public:
+
+ struct Reducer
+ {
+ private:
+
+ Functor const & m_functor ;
+ ValueType * const m_result ;
+ int const m_length ;
+
+ public:
+
+ using reducer = Reducer ;
+ using value_type = FunctorAnalysis::value_type ;
+ using memory_space = void ;
+ using reference_type = FunctorAnalysis::reference_type ;
+
+ KOKKOS_INLINE_FUNCTION
+ void join( ValueType volatile * dst
+ , ValueType volatile const * src ) const noexcept
+ { DeduceJoin<>::join( m_functor , dst , src ); }
+
+ KOKKOS_INLINE_FUNCTION
+ void init( ValueType * dst ) const noexcept
+ { DeduceInit<>::init( m_functor , dst ); }
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( Functor const & arg_functor
+ , ValueType * arg_value = 0
+ , int arg_length = 0 ) noexcept
+ : m_functor( arg_functor ), m_result(arg_value), m_length(arg_length) {}
+
+ KOKKOS_INLINE_FUNCTION
+ constexpr int length() const noexcept { return m_length ; }
+
+ KOKKOS_INLINE_FUNCTION
+ ValueType & operator[]( int i ) const noexcept
+ { return m_result[i]; }
+
+ private:
+
+ template< bool IsArray >
+ constexpr
+ typename std::enable_if< IsArray , ValueType * >::type
+ ref() const noexcept { return m_result ; }
+
+ template< bool IsArray >
+ constexpr
+ typename std::enable_if< ! IsArray , ValueType & >::type
+ ref() const noexcept { return *m_result ; }
+
+ public:
+
+ KOKKOS_INLINE_FUNCTION
+ auto result() const noexcept
+ -> decltype( Reducer::template ref< candidate_is_array >() )
+ { return Reducer::template ref< candidate_is_array >(); }
+ };
+
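+  // A minimal usage sketch ('analysis', 'functor', 'result', and
+  // 'contribution' are hypothetical placeholders; 'analysis' stands for an
+  // instantiation of this FunctorAnalysis class with the REDUCE pattern):
+  //
+  //   typename analysis::value_type result ;
+  //   typename analysis::Reducer    reducer( functor , & result , 1 );
+  //
+  //   reducer.init( & result );                  // functor.init() or default-construct
+  //   reducer.join( & result , & contribution ); // functor.join() or 'dst += src'
+  //   // reducer.result() yields 'result' by reference (by pointer for T[])
+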
+ //----------------------------------------
+
+private:
+
+ template< class , INTERFACE >
+ struct has_final_function ;
+
+ // No tag, not array
+ template< class F >
+ struct has_final_function< F , NO_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( *dst ); }
+ };
+
+ // No tag, is array
+ template< class F >
+ struct has_final_function< F , NO_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( dst ); }
+ };
+
+ // Has tag, not array
+ template< class F >
+ struct has_final_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( WTag(), *dst ); }
+ };
+
+ // Has tag, is array
+ template< class F >
+ struct has_final_function< F , HAS_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( WTag(), dst ); }
+ };
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceFinal
+ {
+ KOKKOS_INLINE_FUNCTION
+ static void final( F const & , ValueType * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceFinal< F , I ,
+ decltype( has_final_function<F,I>::enable_if( & F::final ) ) >
+    : public has_final_function<F,I> {};
+
+public:
+
+ static void final( Functor const & f , ValueType * result )
+ { DeduceFinal<>::final( f , result ); }
+
+};
+
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* KOKKOS_FUNCTORANALYSIS_HPP */
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
index 96d30d0c4..eb1f5ce96 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
@@ -1,399 +1,399 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <algorithm>
#include <Kokkos_HBWSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
#ifdef KOKKOS_ENABLE_HBWSPACE
#include <memkind.h>
#endif
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#ifdef KOKKOS_ENABLE_HBWSPACE
#define MEMKIND_TYPE MEMKIND_HBW //hbw_get_kind(HBW_PAGESIZE_4KB)
namespace Kokkos {
namespace Experimental {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HBWSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HBWSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
/* Default allocation mechanism */
HBWSpace::HBWSpace()
: m_alloc_mech(
HBWSpace::STD_MALLOC
)
{
printf("Init\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
}
/* Default allocation mechanism */
HBWSpace::HBWSpace( const HBWSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HBWSpace::STD_MALLOC )
{
printf("Init2\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HBWSpace::STD_MALLOC ;
}
}
void * HBWSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::power_of_two< Kokkos::Impl::MEMORY_ALIGNMENT >::value
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = memkind_malloc(MEMKIND_TYPE, size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::Experimental::HBWSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
- if ( ptr == NULL ) { msg << " NULL" ; }
+ if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
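/* Illustrative arithmetic for the STD_MALLOC path above (all values assumed):
   with alignment = 64 and memkind_malloc returning alloc_ptr = 0x1008,
   address = 0x1008 + 8 = 0x1010, rem = 0x10, offset = 0x30, and the returned
   ptr = 0x1040 is 64-byte aligned; alloc_ptr is stored in the preceding
   8 bytes (at 0x1038) so deallocate() can recover and free it. */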
void HBWSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
memkind_free(MEMKIND_TYPE, alloc_ptr );
- }
+ }
}
}
constexpr const char* HBWSpace::name() {
return m_name;
}
} // namespace Experimental
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::Experimental::HBWSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_alloc_label
+ , const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<Kokkos::Experimental::HBWSpace,Kokkos::Experimental::HBWSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > *
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
print_records( std::ostream & s , const Kokkos::Experimental::HBWSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HBWSpace" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
namespace {
const unsigned HBW_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HBW_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HBW_SPACE_ATOMIC_LOCKS[HBW_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_hbw_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HBW_SPACE_ATOMIC_MASK+1); i++)
HBW_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_hbw_space(void* ptr) {
return 0 == atomic_compare_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_hbw_space(void* ptr) {
atomic_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
}
#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
index 3cd603728..67be86c9a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
@@ -1,505 +1,505 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <algorithm>
#include <Kokkos_Macros.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
/*--------------------------------------------------------------------------*/
#if defined( __INTEL_COMPILER ) && ! defined ( KOKKOS_ENABLE_CUDA )
// Intel specialized allocator does not interoperate with CUDA memory allocation
#define KOKKOS_ENABLE_INTEL_MM_ALLOC
#endif
/*--------------------------------------------------------------------------*/
#if defined(KOKKOS_ENABLE_POSIX_MEMALIGN)
#include <unistd.h>
#include <sys/mman.h>
/* mmap flags for private anonymous memory allocation */
#if defined( MAP_ANONYMOUS ) && defined( MAP_PRIVATE )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
#elif defined( MAP_ANON ) && defined( MAP_PRIVATE )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANON)
#endif
// mmap flags for huge page tables
// the Cuda driver does not interoperate with MAP_HUGETLB
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
#if defined( MAP_HUGETLB ) && ! defined( KOKKOS_ENABLE_CUDA )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE (KOKKOS_IMPL_POSIX_MMAP_FLAGS | MAP_HUGETLB )
#else
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE KOKKOS_IMPL_POSIX_MMAP_FLAGS
#endif
#endif
#endif
/*--------------------------------------------------------------------------*/
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <Kokkos_HostSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HostSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HostSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/* Default allocation mechanism */
HostSpace::HostSpace()
: m_alloc_mech(
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
HostSpace::INTEL_MM_ALLOC
#elif defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
HostSpace::POSIX_MMAP
#elif defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
HostSpace::POSIX_MEMALIGN
#else
HostSpace::STD_MALLOC
#endif
)
{}
/* Default allocation mechanism */
HostSpace::HostSpace( const HostSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HostSpace::STD_MALLOC )
{
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HostSpace::STD_MALLOC ;
}
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) {
m_alloc_mech = HostSpace::INTEL_MM_ALLOC ;
}
#elif defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) {
m_alloc_mech = HostSpace::POSIX_MEMALIGN ;
}
#elif defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( arg_alloc_mech == HostSpace::POSIX_MMAP ) {
m_alloc_mech = HostSpace::POSIX_MMAP ;
}
#endif
else {
const char * const mech =
( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) ? "INTEL_MM_ALLOC" : (
( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) ? "POSIX_MEMALIGN" : (
( arg_alloc_mech == HostSpace::POSIX_MMAP ) ? "POSIX_MMAP" : "" ));
std::string msg ;
msg.append("Kokkos::HostSpace ");
msg.append(mech);
msg.append(" is not available" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void * HostSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::is_integral_power_of_two( Kokkos::Impl::MEMORY_ALIGNMENT )
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = malloc( size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
ptr = _mm_malloc( arg_alloc_size , alignment );
}
#endif
#if defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
posix_memalign( & ptr, alignment , arg_alloc_size );
}
#endif
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
constexpr size_t use_huge_pages = (1u << 27);
constexpr int prot = PROT_READ | PROT_WRITE ;
const int flags = arg_alloc_size < use_huge_pages
? KOKKOS_IMPL_POSIX_MMAP_FLAGS
: KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE ;
// read write access to private memory
ptr = mmap( NULL /* address hint, if NULL OS kernel chooses address */
, arg_alloc_size /* size in bytes */
, prot /* memory protection */
, flags /* visibility of updates */
, -1 /* file descriptor */
, 0 /* offset */
);
/* Associated reallocation:
ptr = mremap( old_ptr , old_size , new_size , MREMAP_MAYMOVE );
*/
}
#endif
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::HostSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
case POSIX_MEMALIGN: msg << "POSIX_MEMALIGN" ; break ;
case POSIX_MMAP: msg << "POSIX_MMAP" ; break ;
case INTEL_MM_ALLOC: msg << "INTEL_MM_ALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
- if ( ptr == NULL ) { msg << " NULL" ; }
+ if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
void HostSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
free( alloc_ptr );
- }
+ }
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
_mm_free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
munmap( arg_alloc_ptr , arg_alloc_size );
}
#endif
}
}
constexpr const char* HostSpace::name() {
return m_name;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::HostSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
allocate_tracked( const Kokkos::HostSpace & arg_space
- , const std::string & arg_alloc_label
+ , const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<HostSpace,HostSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::HostSpace , void > *
SharedAllocationRecord< Kokkos::HostSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::HostSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::HostSpace , void >::
print_records( std::ostream & s , const Kokkos::HostSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HostSpace" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
const unsigned HOST_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HOST_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HOST_SPACE_ATOMIC_LOCKS[HOST_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_host_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HOST_SPACE_ATOMIC_MASK+1); i++)
HOST_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_host_space(void* ptr) {
return 0 == atomic_compare_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_host_space(void* ptr) {
atomic_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
new file mode 100644
index 000000000..ac200209c
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
@@ -0,0 +1,463 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <limits>
+#include <Kokkos_Macros.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
+#include <impl/Kokkos_Error.hpp>
+#include <impl/Kokkos_spinwait.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+void HostThreadTeamData::organize_pool
+ ( HostThreadTeamData * members[] , const int size )
+{
+ bool ok = true ;
+
+ // Verify not already a member of a pool:
+ for ( int rank = 0 ; rank < size && ok ; ++rank ) {
+ ok = ( 0 != members[rank] ) && ( 0 == members[rank]->m_pool_scratch );
+ }
+
+ if ( ok ) {
+
+ int64_t * const root_scratch = members[0]->m_scratch ;
+
+ for ( int i = m_pool_rendezvous ; i < m_pool_reduce ; ++i ) {
+ root_scratch[i] = 0 ;
+ }
+
+ {
+ HostThreadTeamData ** const pool =
+ (HostThreadTeamData **) (root_scratch + m_pool_members);
+
+ // team size == 1, league size == pool_size
+
+ for ( int rank = 0 ; rank < size ; ++rank ) {
+ HostThreadTeamData * const mem = members[ rank ] ;
+ mem->m_pool_scratch = root_scratch ;
+ mem->m_team_scratch = mem->m_scratch ;
+ mem->m_pool_rank = rank ;
+ mem->m_pool_size = size ;
+ mem->m_team_base = rank ;
+ mem->m_team_rank = 0 ;
+ mem->m_team_size = 1 ;
+ mem->m_team_alloc = 1 ;
+ mem->m_league_rank = rank ;
+ mem->m_league_size = size ;
+ mem->m_pool_rendezvous_step = 0 ;
+ mem->m_team_rendezvous_step = 0 ;
+ pool[ rank ] = mem ;
+ }
+ }
+
+ Kokkos::memory_fence();
+ }
+ else {
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::HostThreadTeamData::organize_pool ERROR pool already exists");
+ }
+}
+
+void HostThreadTeamData::disband_pool()
+{
+ m_work_range.first = -1 ;
+ m_work_range.second = -1 ;
+ m_pool_scratch = 0 ;
+ m_team_scratch = 0 ;
+ m_pool_rank = 0 ;
+ m_pool_size = 1 ;
+ m_team_base = 0 ;
+ m_team_rank = 0 ;
+ m_team_size = 1 ;
+ m_team_alloc = 1 ;
+ m_league_rank = 0 ;
+ m_league_size = 1 ;
+ m_pool_rendezvous_step = 0 ;
+ m_team_rendezvous_step = 0 ;
+}
+
+int HostThreadTeamData::organize_team( const int team_size )
+{
+ // Pool is initialized
+ const bool ok_pool = 0 != m_pool_scratch ;
+
+ // Team is not set
+ const bool ok_team =
+ m_team_scratch == m_scratch &&
+ m_team_base == m_pool_rank &&
+ m_team_rank == 0 &&
+ m_team_size == 1 &&
+ m_team_alloc == 1 &&
+ m_league_rank == m_pool_rank &&
+ m_league_size == m_pool_size ;
+
+ if ( ok_pool && ok_team ) {
+
+ if ( team_size <= 0 ) return 0 ; // No teams to organize
+
+ if ( team_size == 1 ) return 1 ; // Already organized in teams of one
+
+ HostThreadTeamData * const * const pool =
+ (HostThreadTeamData **) (m_pool_scratch + m_pool_members);
+
+ // "league_size" in this context is the number of concurrent teams
+ // that the pool can accommodate. Excess threads are idle.
+ const int league_size = m_pool_size / team_size ;
+ const int team_alloc_size = m_pool_size / league_size ;
+ const int team_alloc_rank = m_pool_rank % team_alloc_size ;
+ const int league_rank = m_pool_rank / team_alloc_size ;
+ const int team_base_rank = league_rank * team_alloc_size ;
+
+ m_team_scratch = pool[ team_base_rank ]->m_scratch ;
+ m_team_base = team_base_rank ;
+    // This needs to check for overflow: if m_pool_size % team_alloc_size != 0
+    // there are two corner cases:
+    // (i)  if team_alloc_size == team_size there might be a non-full
+    //      "zombie" team around (for example m_pool_size = 5 and team_size = 2)
+    // (ii) if team_alloc > team_size then the last team might have fewer
+    //      threads than the others
+ m_team_rank = ( team_base_rank + team_size <= m_pool_size ) &&
+ ( team_alloc_rank < team_size ) ?
+ team_alloc_rank : -1;
+ m_team_size = team_size ;
+ m_team_alloc = team_alloc_size ;
+ m_league_rank = league_rank ;
+ m_league_size = league_size ;
+ m_team_rendezvous_step = 0 ;
+
+ if ( team_base_rank == m_pool_rank ) {
+ // Initialize team's rendezvous memory
+ for ( int i = m_team_rendezvous ; i < m_pool_reduce ; ++i ) {
+ m_scratch[i] = 0 ;
+ }
+ // Make sure team's rendezvous memory initialized
+ // is written before proceeding.
+ Kokkos::memory_fence();
+ }
+
+ // Organizing threads into a team performs a barrier across the
+    // entire pool to ensure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
+
+ if ( pool_rendezvous() ) {
+ pool_rendezvous_release();
+ }
+ }
+ else {
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::HostThreadTeamData::organize_team ERROR");
+ }
+
+ return 0 <= m_team_rank ;
+}
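+
+// Illustrative partitioning (assumed values): with m_pool_size = 5 and
+// team_size = 2, league_size = 2 and team_alloc_size = 2, so pool ranks
+// {0,1} form league rank 0, pool ranks {2,3} form league rank 1, and pool
+// rank 4 is assigned m_team_rank = -1 (idle, the "zombie" case noted above).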
+
+void HostThreadTeamData::disband_team()
+{
+ m_team_scratch = m_scratch ;
+ m_team_base = m_pool_rank ;
+ m_team_rank = 0 ;
+ m_team_size = 1 ;
+ m_team_alloc = 1 ;
+ m_league_rank = m_pool_rank ;
+ m_league_size = m_pool_size ;
+ m_team_rendezvous_step = 0 ;
+}
+
+//----------------------------------------------------------------------------
+/* pattern for rendezvous
+ *
+ * if ( rendezvous() ) {
+ * ... all other threads are still in team_rendezvous() ...
+ * rendezvous_release();
+ * ... all other threads are released from team_rendezvous() ...
+ * }
+ */
+
+int HostThreadTeamData::rendezvous( int64_t * const buffer
+ , int & rendezvous_step
+ , int const size
+ , int const rank ) noexcept
+{
+ enum : int { shift_byte = 3 };
+ enum : int { size_byte = ( 01 << shift_byte ) }; // == 8
+ enum : int { mask_byte = size_byte - 1 };
+
+ enum : int { shift_mem_cycle = 2 };
+ enum : int { size_mem_cycle = ( 01 << shift_mem_cycle ) }; // == 4
+ enum : int { mask_mem_cycle = size_mem_cycle - 1 };
+
+ // Cycle step values: 1 <= step <= size_val_cycle
+ // An odd multiple of memory cycle so that when a memory location
+ // is reused it has a different value.
+ // Must be representable within a single byte: size_val_cycle < 16
+
+ enum : int { size_val_cycle = 3 * size_mem_cycle };
+
+ // Requires:
+ // Called by rank = [ 0 .. size )
+ // buffer aligned to int64_t[4]
+
+ // A sequence of rendezvous uses four cycled locations in memory
+ // and non-equal cycled synchronization values to
+ // 1) prevent rendezvous from overtaking one another and
+ // 2) give each spin wait location an int64_t[4] span
+ // so that it has its own cache line.
+
+ const int step = ( rendezvous_step % size_val_cycle ) + 1 ;
+
+ rendezvous_step = step ;
+
+ // The leading int64_t[4] span is for thread 0 to write
+ // and all other threads to read spin-wait.
+ // sync_offset is the index into this array for this step.
+
+ const int sync_offset = ( step & mask_mem_cycle ) + size_mem_cycle ;
+
+ union {
+ int64_t full ;
+ int8_t byte[8] ;
+ } value ;
+
+ if ( rank ) {
+
+ const int group_begin = rank << shift_byte ; // == rank * size_byte
+
+ if ( group_begin < size ) {
+
+ // This thread waits for threads
+ // [ group_begin .. group_begin + 8 )
+ // [ rank*8 .. rank*8 + 8 )
+ // to write to their designated bytes.
+
+ const int end = group_begin + size_byte < size
+ ? size_byte : size - group_begin ;
+
+ value.full = 0 ;
+ for ( int i = 0 ; i < end ; ++i ) value.byte[i] = int8_t( step );
+
+ store_fence(); // This should not be needed but fixes #742
+
+ spinwait_until_equal( buffer[ (rank << shift_mem_cycle) + sync_offset ]
+ , value.full );
+ }
+
+ {
+ // This thread sets its designated byte.
+ // ( rank % size_byte ) +
+ // ( ( rank / size_byte ) * size_byte * size_mem_cycle ) +
+ // ( sync_offset * size_byte )
+ const int offset = ( rank & mask_byte )
+ + ( ( rank & ~mask_byte ) << shift_mem_cycle )
+ + ( sync_offset << shift_byte );
+
+ // All of this thread's previous memory stores must be complete before
+ // this thread stores the step value at this thread's designated byte
+ // in the shared synchronization array.
+
+ Kokkos::memory_fence();
+
+ ((volatile int8_t*) buffer)[ offset ] = int8_t( step );
+
+ // Memory fence to push the previous store out
+ Kokkos::memory_fence();
+ }
+
+ // Wait for thread 0 to release all other threads
+
+ spinwait_until_equal( buffer[ step & mask_mem_cycle ] , int64_t(step) );
+
+ }
+ else {
+ // Thread 0 waits for threads [1..7]
+ // to write to their designated bytes.
+
+ const int end = size_byte < size ? 8 : size ;
+
+ value.full = 0 ;
+ for ( int i = 1 ; i < end ; ++i ) value.byte[i] = int8_t( step );
+
+ spinwait_until_equal( buffer[ sync_offset ], value.full );
+ }
+
+ return rank ? 0 : 1 ;
+}
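+
+// Worked example of the indexing above (assumed values): size = 20 and
+// step = 5 give sync_offset = ( 5 & 3 ) + 4 = 5.  Rank 13 writes byte
+// 13 % 8 = 5 of word (13/8)*4 + 5 = 9; rank 1 spin-waits on word 1*4 + 5 = 9
+// until ranks 8..15 have written, rank 2 waits on word 13 for ranks 16..19,
+// and rank 0 waits on word 5 for ranks 1..7.  Finally every non-zero rank
+// waits on word step & 3 = 1 for the release value written by
+// rendezvous_release().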
+
+void HostThreadTeamData::
+ rendezvous_release( int64_t * const buffer
+ , int const rendezvous_step ) noexcept
+{
+ enum : int { shift_mem_cycle = 2 };
+ enum : int { size_mem_cycle = ( 01 << shift_mem_cycle ) }; // == 4
+ enum : int { mask_mem_cycle = size_mem_cycle - 1 };
+
+ // Requires:
+ // Called after team_rendezvous
+ // Called only by true == team_rendezvous(root)
+
+ // Memory fence to be sure all previous writes are complete:
+ Kokkos::memory_fence();
+
+ ((volatile int64_t*) buffer)[ rendezvous_step & mask_mem_cycle ] =
+ int64_t( rendezvous_step );
+
+ // Memory fence to push the store out
+ Kokkos::memory_fence();
+}
+
+//----------------------------------------------------------------------------
+
+int HostThreadTeamData::get_work_stealing() noexcept
+{
+ pair_int_t w( -1 , -1 );
+
+ if ( 1 == m_team_size || team_rendezvous() ) {
+
+ // Attempt first from beginning of my work range
+ for ( int attempt = m_work_range.first < m_work_range.second ; attempt ; ) {
+
+ // Query and attempt to update m_work_range
+ // from: [ w.first , w.second )
+ // to: [ w.first + 1 , w.second ) = w_new
+ //
+ // If w is invalid then is just a query.
+
+ const pair_int_t w_new( w.first + 1 , w.second );
+
+ w = Kokkos::atomic_compare_exchange( & m_work_range, w, w_new );
+
+ if ( w.first < w.second ) {
+ // m_work_range is viable
+
+ // If steal is successful then don't repeat attempt to steal
+ attempt = ! ( w_new.first == w.first + 1 &&
+ w_new.second == w.second );
+ }
+ else {
+ // m_work_range is not viable
+ w.first = -1 ;
+ w.second = -1 ;
+
+ attempt = 0 ;
+ }
+ }
+
+ if ( w.first == -1 && m_steal_rank != m_pool_rank ) {
+
+ HostThreadTeamData * const * const pool =
+ (HostThreadTeamData**)( m_pool_scratch + m_pool_members );
+
+      // Attempt from the beginning failed; try to steal from the end of a neighbor's range
+
+ pair_int_t volatile * steal_range =
+ & ( pool[ m_steal_rank ]->m_work_range );
+
+ for ( int attempt = true ; attempt ; ) {
+
+ // Query and attempt to update steal_work_range
+ // from: [ w.first , w.second )
+ // to: [ w.first , w.second - 1 ) = w_new
+ //
+ // If w is invalid then is just a query.
+
+ const pair_int_t w_new( w.first , w.second - 1 );
+
+ w = Kokkos::atomic_compare_exchange( steal_range, w, w_new );
+
+ if ( w.first < w.second ) {
+ // steal_work_range is viable
+
+ // If steal is successful then don't repeat attempt to steal
+ attempt = ! ( w_new.first == w.first &&
+ w_new.second == w.second - 1 );
+ }
+ else {
+ // steal_work_range is not viable, move to next member
+ w.first = -1 ;
+ w.second = -1 ;
+
+          // We need to figure out whether the next team is active.
+          // m_steal_rank + m_team_alloc could be the next base_rank to steal
+          // from, but only if there are at least m_team_size more threads
+          // available so that that base rank has a full team.
+ m_steal_rank = m_steal_rank + m_team_alloc + m_team_size <= m_pool_size ?
+ m_steal_rank + m_team_alloc : 0;
+
+ steal_range = & ( pool[ m_steal_rank ]->m_work_range );
+
+ // If tried all other members then don't repeat attempt to steal
+ attempt = m_steal_rank != m_pool_rank ;
+ }
+ }
+
+ if ( w.first != -1 ) w.first = w.second - 1 ;
+ }
+
+ if ( 1 < m_team_size ) {
+ // Must share the work index
+ *((int volatile *) team_reduce()) = w.first ;
+
+ team_rendezvous_release();
+ }
+ }
+ else if ( 1 < m_team_size ) {
+ w.first = *((int volatile *) team_reduce());
+ }
+
+ // May exit because successfully stole work and w is good.
+ // May exit because no work left to steal and w = (-1,-1).
+
+#if 0
+fprintf(stdout,"HostThreadTeamData::get_work_stealing() pool(%d of %d) %d\n"
+ , m_pool_rank , m_pool_size , w.first );
+fflush(stdout);
+#endif
+
+ return w.first ;
+}
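+
+// Sketch of the range semantics above (values assumed): a member owning
+// m_work_range = [3,10) claims index 3 by compare-exchanging the range to
+// [4,10); once its own range is exhausted it walks neighboring teams and
+// steals from the back, e.g. compare-exchanging a victim's [4,10) to [4,9)
+// and running index 9.  A failed compare-exchange simply re-reads the range
+// and retries, and -1 is returned when no work remains anywhere.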
+
+} // namespace Impl
+} // namespace Kokkos
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp
new file mode 100644
index 000000000..6b5918eae
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp
@@ -0,0 +1,1090 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_HOSTTHREADTEAM_HPP
+#define KOKKOS_IMPL_HOSTTHREADTEAM_HPP
+
+#include <Kokkos_Core_fwd.hpp>
+#include <Kokkos_Pair.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <Kokkos_ExecPolicy.hpp>
+#include <impl/Kokkos_FunctorAdapter.hpp>
+#include <impl/Kokkos_Reducer.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template< class HostExecSpace >
+class HostThreadTeamMember ;
+
+class HostThreadTeamData {
+public:
+
+ template< class > friend class HostThreadTeamMember ;
+
+ // Assume upper bounds on number of threads:
+ // pool size <= 1024 threads
+ // pool rendezvous <= ( 1024 / 8 ) * 4 + 4 = 2052
+ // team size <= 64 threads
+ // team rendezvous <= ( 64 / 8 ) * 4 + 4 = 36
+
+ enum : int { max_pool_members = 1024 };
+ enum : int { max_team_members = 64 };
+ enum : int { max_pool_rendezvous = ( max_pool_members / 8 ) * 4 + 4 };
+ enum : int { max_team_rendezvous = ( max_team_members / 8 ) * 4 + 4 };
+
+private:
+
+ // per-thread scratch memory buffer chunks:
+ //
+ // [ pool_members ] = [ m_pool_members .. m_pool_rendezvous )
+ // [ pool_rendezvous ] = [ m_pool_rendezvous .. m_team_rendezvous )
+ // [ team_rendezvous ] = [ m_team_rendezvous .. m_pool_reduce )
+ // [ pool_reduce ] = [ m_pool_reduce .. m_team_reduce )
+ // [ team_reduce ] = [ m_team_reduce .. m_team_shared )
+ // [ team_shared ] = [ m_team_shared .. m_thread_local )
+ // [ thread_local ] = [ m_thread_local .. m_scratch_size )
+
+ enum : int { m_pool_members = 0 };
+ enum : int { m_pool_rendezvous = m_pool_members + max_pool_members };
+ enum : int { m_team_rendezvous = m_pool_rendezvous + max_pool_rendezvous };
+ enum : int { m_pool_reduce = m_team_rendezvous + max_team_rendezvous };
+
+ using pair_int_t = Kokkos::pair<int,int> ;
+
+ pair_int_t m_work_range ;
+ int64_t m_work_end ;
+ int64_t * m_scratch ; // per-thread buffer
+ int64_t * m_pool_scratch ; // == pool[0]->m_scratch
+ int64_t * m_team_scratch ; // == pool[ 0 + m_team_base ]->m_scratch
+ int m_pool_rank ;
+ int m_pool_size ;
+ int m_team_reduce ;
+ int m_team_shared ;
+ int m_thread_local ;
+ int m_scratch_size ;
+ int m_team_base ;
+ int m_team_rank ;
+ int m_team_size ;
+ int m_team_alloc ;
+ int m_league_rank ;
+ int m_league_size ;
+ int m_work_chunk ;
+ int m_steal_rank ; // work stealing rank
+ int mutable m_pool_rendezvous_step ;
+ int mutable m_team_rendezvous_step ;
+
+ HostThreadTeamData * team_member( int r ) const noexcept
+ { return ((HostThreadTeamData**)(m_pool_scratch+m_pool_members))[m_team_base+r]; }
+
+ // Rendezvous pattern:
+ // if ( rendezvous(root) ) {
+ // ... only root thread here while all others wait ...
+ // rendezvous_release();
+ // }
+ // else {
+ // ... all other threads release here ...
+ // }
+ //
+ // Requires: buffer[ ( max_threads / 8 ) * 4 + 4 ]; 0 == max_threads % 8
+ //
+ static
+ int rendezvous( int64_t * const buffer
+ , int & rendezvous_step
+ , int const size
+ , int const rank ) noexcept ;
+
+ static
+ void rendezvous_release( int64_t * const buffer
+ , int const rendezvous_step ) noexcept ;
+
+public:
+
+ inline
+ int team_rendezvous( int const root ) const noexcept
+ {
+ return 1 == m_team_size ? 1 :
+ rendezvous( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step
+ , m_team_size
+ , ( m_team_rank + m_team_size - root ) % m_team_size );
+ }
+
+ inline
+ int team_rendezvous() const noexcept
+ {
+ return 1 == m_team_size ? 1 :
+ rendezvous( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step
+ , m_team_size
+ , m_team_rank );
+ }
+
+ inline
+ void team_rendezvous_release() const noexcept
+ {
+ if ( 1 < m_team_size ) {
+ rendezvous_release( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step );
+ }
+ }
+
+ inline
+ int pool_rendezvous() const noexcept
+ {
+ return 1 == m_pool_size ? 1 :
+ rendezvous( m_pool_scratch + m_pool_rendezvous
+ , m_pool_rendezvous_step
+ , m_pool_size
+ , m_pool_rank );
+ }
+
+ inline
+ void pool_rendezvous_release() const noexcept
+ {
+ if ( 1 < m_pool_size ) {
+ rendezvous_release( m_pool_scratch + m_pool_rendezvous
+ , m_pool_rendezvous_step );
+ }
+ }
+
+ //----------------------------------------
+
+ constexpr HostThreadTeamData() noexcept
+ : m_work_range(-1,-1)
+ , m_work_end(0)
+ , m_scratch(0)
+ , m_pool_scratch(0)
+ , m_team_scratch(0)
+ , m_pool_rank(0)
+ , m_pool_size(1)
+ , m_team_reduce(0)
+ , m_team_shared(0)
+ , m_thread_local(0)
+ , m_scratch_size(0)
+ , m_team_base(0)
+ , m_team_rank(0)
+ , m_team_size(1)
+ , m_team_alloc(1)
+ , m_league_rank(0)
+ , m_league_size(1)
+ , m_work_chunk(0)
+ , m_steal_rank(0)
+ , m_pool_rendezvous_step(0)
+ , m_team_rendezvous_step(0)
+ {}
+
+ //----------------------------------------
+ // Organize array of members into a pool.
+ // The 0th member is the root of the pool.
+ // Requires: members are not already in a pool.
+ // Requires: called by one thread.
+ // Pool members are ordered as "close" - sorted by NUMA and then CORE
+ // Each thread is its own team with team_size == 1.
+ static void organize_pool( HostThreadTeamData * members[]
+ , const int size );
+
+ // Called by each thread within the pool
+ void disband_pool();
+
+ //----------------------------------------
+ // Each thread within a pool organizes itself into a team.
+ // Must be called by all threads of the pool.
+ // Organizing threads into a team performs a barrier across the
+  // entire pool to ensure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
+ //
+ // Return true if a valid member of a team.
+ // Return false if not a member and thread should be idled.
+ int organize_team( const int team_size );
+
+ // Each thread within a pool disbands itself from current team.
+ // Each thread becomes its own team with team_size == 1.
+ // Must be called by all threads of the pool.
+ void disband_team();
+
+ //----------------------------------------
+
+ constexpr int pool_rank() const { return m_pool_rank ; }
+ constexpr int pool_size() const { return m_pool_size ; }
+
+ HostThreadTeamData * pool_member( int r ) const noexcept
+ { return ((HostThreadTeamData**)(m_pool_scratch+m_pool_members))[r]; }
+
+ //----------------------------------------
+
+private:
+
+ enum : int { mask_to_16 = 0x0f }; // align to 16 bytes
+ enum : int { shift_to_8 = 3 }; // size to 8 bytes
+
+public:
+
+ static constexpr int align_to_int64( int n )
+ { return ( ( n + mask_to_16 ) & ~mask_to_16 ) >> shift_to_8 ; }
+
+ constexpr int pool_reduce_bytes() const
+ { return m_scratch_size ? sizeof(int64_t) * ( m_team_reduce - m_pool_reduce ) : 0 ; }
+
+ constexpr int team_reduce_bytes() const
+ { return sizeof(int64_t) * ( m_team_shared - m_team_reduce ); }
+
+ constexpr int team_shared_bytes() const
+ { return sizeof(int64_t) * ( m_thread_local - m_team_shared ); }
+
+ constexpr int thread_local_bytes() const
+ { return sizeof(int64_t) * ( m_scratch_size - m_thread_local ); }
+
+ constexpr int scratch_bytes() const
+ { return sizeof(int64_t) * m_scratch_size ; }
+
+ // Memory chunks:
+
+ int64_t * scratch_buffer() const noexcept
+ { return m_scratch ; }
+
+ int64_t * pool_reduce() const noexcept
+ { return m_pool_scratch + m_pool_reduce ; }
+
+ int64_t * pool_reduce_local() const noexcept
+ { return m_scratch + m_pool_reduce ; }
+
+ int64_t * team_reduce() const noexcept
+ { return m_team_scratch + m_team_reduce ; }
+
+ int64_t * team_reduce_local() const noexcept
+ { return m_scratch + m_team_reduce ; }
+
+ int64_t * team_shared() const noexcept
+ { return m_team_scratch + m_team_shared ; }
+
+ int64_t * local_scratch() const noexcept
+ { return m_scratch + m_thread_local ; }
+
+ // Given:
+ // pool_reduce_size = number bytes for pool reduce
+ // team_reduce_size = number bytes for team reduce
+ // team_shared_size = number bytes for team shared memory
+ // thread_local_size = number bytes for thread local memory
+ // Return:
+ // total number of bytes that must be allocated
+ static
+ size_t scratch_size( int pool_reduce_size
+ , int team_reduce_size
+ , int team_shared_size
+ , int thread_local_size )
+ {
+ pool_reduce_size = align_to_int64( pool_reduce_size );
+ team_reduce_size = align_to_int64( team_reduce_size );
+ team_shared_size = align_to_int64( team_shared_size );
+ thread_local_size = align_to_int64( thread_local_size );
+
+ const size_t total_bytes = (
+ m_pool_reduce +
+ pool_reduce_size +
+ team_reduce_size +
+ team_shared_size +
+ thread_local_size ) * sizeof(int64_t);
+
+ return total_bytes ;
+ }
+
+ // Given:
+ // alloc_ptr = pointer to allocated memory
+ // alloc_size = number bytes of allocated memory
+ // pool_reduce_size = number bytes for pool reduce/scan operations
+ // team_reduce_size = number bytes for team reduce/scan operations
+ // team_shared_size = number bytes for team-shared memory
+ // thread_local_size = number bytes for thread-local memory
+  // Effect:
+  //   assigns this thread's scratch buffer pointer and the chunk
+  //   offsets within the allocated memory
+ void scratch_assign( void * const alloc_ptr
+ , size_t const alloc_size
+ , int pool_reduce_size
+ , int team_reduce_size
+ , int team_shared_size
+ , int /* thread_local_size */ )
+ {
+ pool_reduce_size = align_to_int64( pool_reduce_size );
+ team_reduce_size = align_to_int64( team_reduce_size );
+ team_shared_size = align_to_int64( team_shared_size );
+ // thread_local_size = align_to_int64( thread_local_size );
+
+ m_scratch = (int64_t *) alloc_ptr ;
+ m_team_reduce = m_pool_reduce + pool_reduce_size ;
+ m_team_shared = m_team_reduce + team_reduce_size ;
+ m_thread_local = m_team_shared + team_shared_size ;
+ m_scratch_size = align_to_int64( alloc_size );
+
+#if 0
+fprintf(stdout,"HostThreadTeamData::scratch_assign { %d %d %d %d %d %d %d }\n"
+ , int(m_pool_members)
+ , int(m_pool_rendezvous)
+ , int(m_pool_reduce)
+ , int(m_team_reduce)
+ , int(m_team_shared)
+ , int(m_thread_local)
+ , int(m_scratch_size)
+ );
+fflush(stdout);
+#endif
+
+ }
+
+ //----------------------------------------
+ // Get a work index within the range.
+  // First try to steal from the beginning of the team's own partition.
+  // If that fails, try to steal from the end of another team's partition.
+ int get_work_stealing() noexcept ;
+
+ //----------------------------------------
+ // Set the initial work partitioning of [ 0 .. length ) among the teams
+ // with granularity of chunk
+
+ void set_work_partition( int64_t const length
+ , int const chunk ) noexcept
+ {
+    // Minimum chunk size to ensure that
+ // m_work_end < std::numeric_limits<int>::max() * m_work_chunk
+
+ int const chunk_min = ( length + std::numeric_limits<int>::max() )
+ / std::numeric_limits<int>::max();
+
+ m_work_end = length ;
+ m_work_chunk = std::max( chunk , chunk_min );
+
+ // Number of work chunks and partitioning of that number:
+ int const num = ( m_work_end + m_work_chunk - 1 ) / m_work_chunk ;
+ int const part = ( num + m_league_size - 1 ) / m_league_size ;
+
+ m_work_range.first = part * m_league_rank ;
+ m_work_range.second = m_work_range.first + part ;
+
+ // Steal from next team, round robin
+ // The next team is offset by m_team_alloc if it fits in the pool.
+
+ m_steal_rank = m_team_base + m_team_alloc + m_team_size <= m_pool_size ?
+ m_team_base + m_team_alloc : 0 ;
+ }
+
+ std::pair<int64_t,int64_t> get_work_partition() noexcept
+ {
+ return std::pair<int64_t,int64_t>
+ ( m_work_range.first * m_work_chunk
+ , m_work_range.second * m_work_chunk < m_work_end
+ ? m_work_range.second * m_work_chunk : m_work_end );
+ }
+
+ std::pair<int64_t,int64_t> get_work_stealing_chunk() noexcept
+ {
+ std::pair<int64_t,int64_t> x(-1,-1);
+
+ const int i = get_work_stealing();
+
+ if ( 0 <= i ) {
+ x.first = m_work_chunk * i ;
+ x.second = x.first + m_work_chunk < m_work_end
+ ? x.first + m_work_chunk : m_work_end ;
+ }
+
+ return x ;
+ }
+};
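
scratch_size() above converts the four requested byte counts into 16-byte-aligned int64_t word counts and adds them to the fixed header (the pool member pointer table plus the pool and team rendezvous buffers), and scratch_assign() records the resulting chunk offsets into the members shown in the layout comment. A stand-alone sketch of the same sizing arithmetic, with hypothetical byte requests (the 512/256/4096/1024 values are illustrative, not defaults taken from this code):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Same arithmetic as HostThreadTeamData::align_to_int64:
    // bytes -> int64_t words, rounded up to 16-byte multiples.
    constexpr int align_to_int64(int n) { return ((n + 0x0f) & ~0x0f) >> 3; }

    int main() {
      // Header words that precede the reduce/shared chunks, per the class constants:
      const int max_pool_members    = 1024;
      const int max_pool_rendezvous = (max_pool_members / 8) * 4 + 4; // 516
      const int max_team_rendezvous = (64 / 8) * 4 + 4;               // 36
      const int pool_reduce_offset  =
          max_pool_members + max_pool_rendezvous + max_team_rendezvous; // 1576 words

      // Hypothetical byte requests:
      const int pool_reduce     = align_to_int64(512);
      const int team_reduce     = align_to_int64(256);
      const int team_shared     = align_to_int64(4096);
      const int thread_local_sz = align_to_int64(1024);

      const std::size_t total_bytes =
          std::size_t(pool_reduce_offset + pool_reduce + team_reduce +
                      team_shared + thread_local_sz) * sizeof(std::int64_t);

      std::printf("per-thread scratch = %zu bytes\n", total_bytes);
      return 0;
    }
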
+
+//----------------------------------------------------------------------------
+
+template< class HostExecSpace >
+class HostThreadTeamMember {
+public:
+
+ using scratch_memory_space = typename HostExecSpace::scratch_memory_space ;
+
+private:
+
+ scratch_memory_space m_scratch ;
+ HostThreadTeamData & m_data ;
+ int const m_league_rank ;
+ int const m_league_size ;
+
+public:
+
+ constexpr HostThreadTeamMember( HostThreadTeamData & arg_data ) noexcept
+ : m_scratch( arg_data.team_shared() , arg_data.team_shared_bytes() )
+ , m_data( arg_data )
+ , m_league_rank(0)
+ , m_league_size(1)
+ {}
+
+ constexpr HostThreadTeamMember( HostThreadTeamData & arg_data
+ , int const arg_league_rank
+ , int const arg_league_size
+ ) noexcept
+ : m_scratch( arg_data.team_shared()
+ , arg_data.team_shared_bytes()
+ , arg_data.team_shared()
+ , arg_data.team_shared_bytes() )
+ , m_data( arg_data )
+ , m_league_rank( arg_league_rank )
+ , m_league_size( arg_league_size )
+ {}
+
+ ~HostThreadTeamMember() = default ;
+ HostThreadTeamMember() = delete ;
+ HostThreadTeamMember( HostThreadTeamMember && ) = default ;
+ HostThreadTeamMember( HostThreadTeamMember const & ) = default ;
+ HostThreadTeamMember & operator = ( HostThreadTeamMember && ) = default ;
+ HostThreadTeamMember & operator = ( HostThreadTeamMember const & ) = default ;
+
+ //----------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ int team_rank() const noexcept { return m_data.m_team_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int team_size() const noexcept { return m_data.m_team_size ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int league_rank() const noexcept { return m_league_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int league_size() const noexcept { return m_league_size ; }
+
+ //----------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_shmem() const
+ { return m_scratch.set_team_thread_mode(0,1,0); }
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_scratch(int) const
+ { return m_scratch.set_team_thread_mode(0,1,0); }
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & thread_scratch(int) const
+ { return m_scratch.set_team_thread_mode(0,m_data.m_team_size,m_data.m_team_rank); }
+
+ //----------------------------------------
+ // Team collectives
+
+ KOKKOS_INLINE_FUNCTION void team_barrier() const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( m_data.team_rendezvous() ) m_data.team_rendezvous_release();
+ }
+#else
+ {}
+#endif
+
+ template< class Closure >
+ KOKKOS_INLINE_FUNCTION
+ void team_barrier( Closure const & f ) const noexcept
+ {
+ if ( m_data.team_rendezvous() ) {
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ f();
+
+ m_data.team_rendezvous_release();
+ }
+ }
+
+ //--------------------------------------------------------------------------
+
+ template< typename T >
+ KOKKOS_INLINE_FUNCTION
+ void team_broadcast( T & value , const int source_team_rank ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 1 < m_data.m_team_size ) {
+ T volatile * const shared_value = (T*) m_data.team_reduce();
+
+ // Don't overwrite shared memory until all threads arrive
+
+ if ( m_data.team_rendezvous( source_team_rank ) ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ *shared_value = value ;
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ value = *shared_value ;
+ }
+ }
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_broadcast\n"); }
+#endif
+
+ //--------------------------------------------------------------------------
+
+ template< class Closure , typename T >
+ KOKKOS_INLINE_FUNCTION
+ void team_broadcast( Closure const & f , T & value , const int source_team_rank) const noexcept
+ {
+ T volatile * const shared_value = (T*) m_data.team_reduce();
+
+ // Don't overwrite shared memory until all threads arrive
+
+ if ( m_data.team_rendezvous(source_team_rank) ) {
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ f( value );
+
+ if ( 1 < m_data.m_team_size ) { *shared_value = value ; }
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ value = *shared_value ;
+ }
+ }
+
+ //--------------------------------------------------------------------------
+ // team_reduce( Sum(result) );
+ // team_reduce( Min(result) );
+ // team_reduce( Max(result) );
+
+ template< typename ReducerType >
+ KOKKOS_INLINE_FUNCTION
+ typename std::enable_if< is_reducer< ReducerType >::value >::type
+ team_reduce( ReducerType const & reducer ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 1 < m_data.m_team_size ) {
+
+ using value_type = typename ReducerType::value_type ;
+
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ reducer.copy( (value_type*) m_data.team_reduce_local()
+ , reducer.data() );
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread sums contributed values
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ value_type * const src =
+ (value_type*) m_data.team_member(i)->team_reduce_local();
+
+ reducer.join( reducer.data() , src );
+ }
+
+ // Copy result to root member's buffer:
+ reducer.copy( (value_type*) m_data.team_reduce() , reducer.data() );
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ // Copy from root member's buffer:
+ reducer.copy( reducer.data() , (value_type*) m_data.team_reduce() );
+ }
+ }
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_reduce\n"); }
+#endif
+
+ //--------------------------------------------------------------------------
+
+ template< typename ValueType , class JoinOp >
+ KOKKOS_INLINE_FUNCTION
+ ValueType
+ team_reduce( ValueType const & value
+ , JoinOp const & join ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ *((ValueType*) m_data.team_reduce_local()) = value ;
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ const Impl::Reducer< ValueType , JoinOp > reducer( join );
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread sums contributed values
+
+ ValueType * const dst = (ValueType*) m_data.team_reduce_local();
+
+ *dst = value ;
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ ValueType * const src =
+ (ValueType*) m_data.team_member(i)->team_reduce_local();
+
+ reducer.join( dst , src );
+ }
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+
+ return *((ValueType*) m_data.team_reduce());
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_reduce\n"); return ValueType(); }
+#endif
+
+
+ template< typename T >
+ KOKKOS_INLINE_FUNCTION
+ T team_scan( T const & value , T * const global = 0 ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ ((T*) m_data.team_reduce_local())[1] = value ;
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread scans contributed values
+
+ {
+ T * prev = (T*) m_data.team_reduce_local();
+
+ prev[0] = 0 ;
+ prev[1] = value ;
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ T * const ptr = (T*) m_data.team_member(i)->team_reduce_local();
+
+ ptr[0] = prev[0] + prev[1] ;
+
+ prev = ptr ;
+ }
+ }
+
+ // If adding to global value then atomic_fetch_add to that value
+ // and sum previous value to every entry of the scan.
+ if ( global ) {
+ T * prev = (T*) m_data.team_reduce_local();
+
+ {
+ T * ptr = (T*) m_data.team_member( m_data.m_team_size - 1 )->team_reduce_local();
+ prev[0] = Kokkos::atomic_fetch_add( global , ptr[0] + ptr[1] );
+ }
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ T * ptr = (T*) m_data.team_member(i)->team_reduce_local();
+ ptr[0] += prev[0] ;
+ }
+ }
+
+ m_data.team_rendezvous_release();
+ }
+
+ return ((T*) m_data.team_reduce_local())[0];
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_scan\n"); return T(); }
+#endif
+
+};
+
+
+}} /* namespace Kokkos::Impl */
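
team_scan() above returns, to each team member, the exclusive prefix sum of the members' contributions in team-rank order; when the optional global pointer is given, the team total is atomically added to *global and the previous value of *global is added to every member's result. A serial sketch of that contract (plain C++, hypothetical contribution values):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<long> contrib = {3, 1, 4, 1, 5}; // one value per team member (rank order)
      long global = 100;                           // shared counter ("*global" above)

      // Exclusive prefix sum across the "team":
      std::vector<long> scan(contrib.size());
      long running = 0;
      for (std::size_t r = 0; r < contrib.size(); ++r) {
        scan[r]  = running;                        // excludes member r's own contribution
        running += contrib[r];
      }

      // With a global pointer: fetch the old value, add the team total to it,
      // and offset every member's result by the old value.
      const long offset = global;                  // what atomic_fetch_add would return
      global += running;
      for (auto & v : scan) v += offset;

      for (std::size_t r = 0; r < scan.size(); ++r)
        std::printf("rank %zu -> %ld\n", r, scan[r]);
      std::printf("global is now %ld\n", global);
      return 0;
    }
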
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+template<class Space,typename iType>
+KOKKOS_INLINE_FUNCTION
+Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+TeamThreadRange( Impl::HostThreadTeamMember<Space> const & member
+ , iType const & count )
+{
+ return
+ Impl::TeamThreadRangeBoundariesStruct
+ <iType,Impl::HostThreadTeamMember<Space> >(member,0,count);
+}
+
+template<class Space, typename iType1, typename iType2>
+KOKKOS_INLINE_FUNCTION
+Impl::TeamThreadRangeBoundariesStruct
+ < typename std::common_type< iType1, iType2 >::type
+ , Impl::HostThreadTeamMember<Space> >
+TeamThreadRange( Impl::HostThreadTeamMember<Space> const & member
+ , iType1 const & begin , iType2 const & end )
+{
+ return
+ Impl::TeamThreadRangeBoundariesStruct
+ < typename std::common_type< iType1, iType2 >::type
+ , Impl::HostThreadTeamMember<Space> >( member , begin , end );
+}
+
+template<class Space, typename iType>
+KOKKOS_INLINE_FUNCTION
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ThreadVectorRange
+ ( Impl::HostThreadTeamMember<Space> const & member
+ , const iType & count )
+{
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >(member,count);
+}
+
+//----------------------------------------------------------------------------
+/** \brief Inter-thread parallel_for.
+ *
+ * Executes lambda(iType i) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to all threads of the calling thread team.
+*/
+template<typename iType, class Space, class Closure>
+KOKKOS_INLINE_FUNCTION
+void parallel_for
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure (i);
+ }
+}
+
+template<typename iType, class Space, class Closure>
+KOKKOS_INLINE_FUNCTION
+void parallel_for
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+ #pragma ivdep
+ #endif
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure (i);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename iType, class Space, class Closure, class Reducer >
+KOKKOS_INLINE_FUNCTION
+typename std::enable_if< Kokkos::is_reducer< Reducer >::value >::type
+parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , Reducer const & reducer
+ )
+{
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+template< typename iType, class Space, typename Closure, typename ValueType >
+KOKKOS_INLINE_FUNCTION
+typename std::enable_if< ! Kokkos::is_reducer<ValueType>::value >::type
+parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , ValueType & result
+ )
+{
+ Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > > reducer( & result );
+
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+template< typename iType, class Space
+ , class Closure, class Joiner , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , Joiner const & joiner
+ , ValueType & result
+ )
+{
+ Impl::Reducer< ValueType , Joiner > reducer( joiner , & result );
+
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+//----------------------------------------------------------------------------
+/** \brief Intra-thread vector parallel_reduce.
+ *
+ * Executes lambda(iType i, ValueType & val) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to the vector lanes of the
+ * calling thread and a summation of val is
+ * performed and put into result.
+ */
+template< typename iType, class Space , class Lambda, typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >& loop_boundaries,
+ const Lambda & lambda,
+ ValueType& result)
+{
+ result = ValueType();
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start ;
+ i < loop_boundaries.end ;
+ i += loop_boundaries.increment) {
+ lambda(i,result);
+ }
+}
+
+/** \brief Intra-thread vector parallel_reduce.
+ *
+ * Executes lambda(iType i, ValueType & val) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to all vector lanes of the
+ * calling thread and a reduction of val is performed using
+ * JoinType(ValueType& val, const ValueType& update)
+ * and put into result.
+ * The input value of result is used as the initializer for
+ * temporary variables of ValueType.  Therefore the input
+ * value should be the neutral element with respect to the
+ * join operation (e.g. '0' for '+' or '1' for '*').
+ */
+template< typename iType, class Space
+ , class Lambda, class JoinType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >& loop_boundaries,
+ const Lambda & lambda,
+ const JoinType & join,
+ ValueType& result)
+{
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start ;
+ i < loop_boundaries.end ;
+ i += loop_boundaries.increment ) {
+ lambda(i,result);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename iType, class Space, class Closure >
+KOKKOS_INLINE_FUNCTION
+void parallel_scan
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ // Extract ValueType from the closure
+
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+
+ // Intra-member scan
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,accum,false);
+ }
+
+ // 'accum' output is the exclusive prefix sum
+ accum = loop_boundaries.thread.team_scan(accum);
+
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,accum,true);
+ }
+}
+
+
+template< typename iType, class Space, class ClosureType >
+KOKKOS_INLINE_FUNCTION
+void parallel_scan
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , ClosureType const & closure
+ )
+{
+ using value_type = typename
+ Kokkos::Impl::FunctorAnalysis
+ < Impl::FunctorPatternInterface::SCAN
+ , void
+ , ClosureType >::value_type ;
+
+ value_type scan_val = value_type();
+
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,scan_val,true);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< class Space >
+KOKKOS_INLINE_FUNCTION
+Impl::ThreadSingleStruct<Impl::HostThreadTeamMember<Space> >
+PerTeam(const Impl::HostThreadTeamMember<Space> & member )
+{
+ return Impl::ThreadSingleStruct<Impl::HostThreadTeamMember<Space> >(member);
+}
+
+template< class Space >
+KOKKOS_INLINE_FUNCTION
+Impl::VectorSingleStruct<Impl::HostThreadTeamMember<Space> >
+PerThread(const Impl::HostThreadTeamMember<Space> & member)
+{
+ return Impl::VectorSingleStruct<Impl::HostThreadTeamMember<Space> >(member);
+}
+
+template< class Space , class FunctorType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::ThreadSingleStruct< Impl::HostThreadTeamMember<Space> > & single , const FunctorType & functor )
+{
+ if ( single.team_member.team_rank() == 0 ) functor();
+ // 'single' does not perform a barrier.
+ // single.team_member.team_barrier( functor );
+}
+
+template< class Space , class FunctorType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::ThreadSingleStruct< Impl::HostThreadTeamMember<Space> > & single , const FunctorType & functor , ValueType & val )
+{
+ single.team_member.team_broadcast( functor , val , 0 );
+}
+
+template< class Space , class FunctorType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::VectorSingleStruct< Impl::HostThreadTeamMember<Space> > & , const FunctorType & functor )
+{
+ functor();
+}
+
+template< class Space , class FunctorType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::VectorSingleStruct< Impl::HostThreadTeamMember<Space> > & , const FunctorType & functor , ValueType & val )
+{
+ functor(val);
+}
+
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #ifndef KOKKOS_IMPL_HOSTTHREADTEAM_HPP */
+
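
The TeamThreadRange, ThreadVectorRange, PerTeam/PerThread and single() overloads defined above back the public hierarchical-parallelism interface for host execution spaces. A small usage sketch against the public Kokkos API (a sketch under the assumption of a standard Kokkos build; the league size, N, and Kokkos::AUTO team sizing are illustrative values, not taken from this patch):

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
      Kokkos::initialize(argc, argv);
      {
        using policy_t = Kokkos::TeamPolicy<>;
        using member_t = policy_t::member_type;

        const int league = 8;    // number of teams
        const int N      = 100;  // work items per team
        double total = 0.0;

        Kokkos::parallel_reduce( policy_t( league, Kokkos::AUTO ),
          KOKKOS_LAMBDA( const member_t & team, double & update ) {
            double team_sum = 0.0;
            // Inter-thread range: [0..N) is split across the team's threads,
            // then the partial sums are combined across the team.
            Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, N ),
              [&]( const int i, double & sum ) { sum += double(i); }, team_sum );
            // Only one thread per team contributes the team's result.
            Kokkos::single( Kokkos::PerTeam( team ),
              [&]() { update += team_sum; } );
          }, total );

        std::printf( "total = %g (expect %g)\n",
                     total, double(league) * double(N) * double(N - 1) / 2.0 );
      }
      Kokkos::finalize();
      return 0;
    }

Each nested parallel_reduce combines the per-thread partial sums through the team_reduce() path above, and single(PerTeam(...)) makes exactly one thread per team contribute to the league-wide result.
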
diff --git a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
index 84cf536bb..7489018ac 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
@@ -1,107 +1,111 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_MEMORY_FENCE_HPP )
#define KOKKOS_MEMORY_FENCE_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
KOKKOS_FORCEINLINE_FUNCTION
void memory_fence()
{
#if defined( __CUDA_ARCH__ )
__threadfence();
+#elif defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
+ asm volatile (
+ "mfence" ::: "memory"
+ );
#elif defined( KOKKOS_ENABLE_GNU_ATOMICS ) || \
( defined( KOKKOS_COMPILER_NVCC ) && defined( KOKKOS_ENABLE_INTEL_ATOMICS ) )
__sync_synchronize();
#elif defined( KOKKOS_ENABLE_INTEL_ATOMICS )
_mm_mfence();
#elif defined( KOKKOS_ENABLE_OPENMP_ATOMICS )
#pragma omp flush
#elif defined( KOKKOS_ENABLE_WINDOWS_ATOMICS )
MemoryBarrier();
#else
#error "Error: memory_fence() not defined"
#endif
}
//////////////////////////////////////////////////////
// store_fence()
//
// If possible use a store fence on the architecture, if not run a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void store_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
asm volatile (
- "sfence" ::: "memory"
- );
+ "sfence" ::: "memory"
+ );
#else
memory_fence();
#endif
}
//////////////////////////////////////////////////////
// load_fence()
//
// If possible use a load fence on the architecture, if not run a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void load_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
asm volatile (
- "lfence" ::: "memory"
- );
+ "lfence" ::: "memory"
+ );
#else
memory_fence();
#endif
}
} // namespace kokkos
#endif
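
memory_fence(), store_fence() and load_fence() map to a full, store-only or load-only hardware fence where the ISA provides one, and fall back to a full fence otherwise. A portable-C++ analogue (a sketch using std::atomic_thread_fence rather than the Kokkos functions) shows the publish/observe pairing these fences are meant to support:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int data = 0;
    std::atomic<int> flag{0};

    void producer() {
      data = 42;                                           // plain store
      std::atomic_thread_fence(std::memory_order_release); // "store fence"
      flag.store(1, std::memory_order_relaxed);
    }

    void consumer() {
      while (flag.load(std::memory_order_relaxed) == 0) {} // spin on the flag
      std::atomic_thread_fence(std::memory_order_acquire); // "load fence"
      std::printf("data = %d\n", data);                    // guaranteed to print 42
    }

    int main() {
      std::thread c(consumer), p(producer);
      p.join();
      c.join();
      return 0;
    }
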
diff --git a/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp b/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
index da95c943f..5852efb01 100644
--- a/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
@@ -1,447 +1,447 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_OLD_MACROS_HPP
#define KOKKOS_IMPL_OLD_MACROS_HPP
#ifdef KOKKOS_ATOMICS_USE_CUDA
#ifndef KOKKOS_ENABLE_CUDA_ATOMICS
#define KOKKOS_ENABLE_CUDA_ATOMICS KOKKOS_ATOMICS_USE_CUDA
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_GCC
#ifndef KOKKOS_ENABLE_GNU_ATOMICS
#define KOKKOS_ENABLE_GNU_ATOMICS KOKKOS_ATOMICS_USE_GCC
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_GNU
#ifndef KOKKOS_ENABLE_GNU_ATOMICS
#define KOKKOS_ENABLE_GNU_ATOMICS KOKKOS_ATOMICS_USE_GNU
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_INTEL
#ifndef KOKKOS_ENABLE_INTEL_ATOMICS
#define KOKKOS_ENABLE_INTEL_ATOMICS KOKKOS_ATOMICS_USE_INTEL
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_OMP31
#ifndef KOKKOS_ENABLE_OPENMP_ATOMICS
#define KOKKOS_ENABLE_OPENMP_ATOMICS KOKKOS_ATOMICS_USE_OMP31
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_OPENMP31
#ifndef KOKKOS_ENABLE_OPENMP_ATOMICS
#define KOKKOS_ENABLE_OPENMP_ATOMICS KOKKOS_ATOMICS_USE_OPENMP31
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_WINDOWS
#ifndef KOKKOS_ENABLE_WINDOWS_ATOMICS
#define KOKKOS_ENABLE_WINDOWS_ATOMICS KOKKOS_ATOMICS_USE_WINDOWS
#endif
#endif
#ifdef KOKKOS_CUDA_CLANG_WORKAROUND
#ifndef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
#define KOKKOS_IMPL_CUDA_CLANG_WORKAROUND KOKKOS_CUDA_CLANG_WORKAROUND
#endif
#endif
#ifdef KOKKOS_CUDA_USE_LAMBDA
#ifndef KOKKOS_ENABLE_CUDA_LAMBDA
#define KOKKOS_ENABLE_CUDA_LAMBDA KOKKOS_CUDA_USE_LAMBDA
#endif
#endif
#ifdef KOKKOS_CUDA_USE_LDG_INTRINSIC
#ifndef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
#define KOKKOS_ENABLE_CUDA_LDG_INTRINSIC KOKKOS_CUDA_USE_LDG_INTRINSIC
#endif
#endif
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
#define KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
#endif
#endif
#ifdef KOKKOS_CUDA_USE_UVM
#ifndef KOKKOS_ENABLE_CUDA_UVM
#define KOKKOS_ENABLE_CUDA_UVM KOKKOS_CUDA_USE_UVM
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA
#ifndef KOKKOS_ENABLE_CUDA
#define KOKKOS_ENABLE_CUDA KOKKOS_HAVE_CUDA
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA_LAMBDA
#ifndef KOKKOS_ENABLE_CUDA_LAMBDA
#define KOKKOS_ENABLE_CUDA_LAMBDA KOKKOS_HAVE_CUDA_LAMBDA
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA_RDC
-#ifndef KOKKOS_ENABLE_CUDA_RDC
-#define KOKKOS_ENABLE_CUDA_RDC KOKKOS_HAVE_CUDA_RDC
+#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
+#define KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE KOKKOS_HAVE_CUDA_RDC
#endif
#endif
#ifdef KOKKOS_HAVE_CUSPARSE
#ifndef KOKKOS_ENABLE_CUSPARSE
#define KOKKOS_ENABLE_CUSPARSE KOKKOS_HAVE_CUSPARSE
#endif
#endif
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#ifndef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
#define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#endif
#endif
#ifdef KOKKOS_HAVE_CXX1Z
#ifndef KOKKOS_ENABLE_CXX1Z
#define KOKKOS_ENABLE_CXX1Z KOKKOS_HAVE_CXX1Z
#endif
#endif
#ifdef KOKKOS_HAVE_DEBUG
#ifndef KOKKOS_DEBUG
#define KOKKOS_DEBUG KOKKOS_HAVE_DEBUG
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS
#endif
#endif
#ifdef KOKKOS_HAVE_HBWSPACE
#ifndef KOKKOS_ENABLE_HBWSPACE
#define KOKKOS_ENABLE_HBWSPACE KOKKOS_HAVE_HBWSPACE
#endif
#endif
#ifdef KOKKOS_HAVE_HWLOC
#ifndef KOKKOS_ENABLE_HWLOC
#define KOKKOS_ENABLE_HWLOC KOKKOS_HAVE_HWLOC
#endif
#endif
#ifdef KOKKOS_HAVE_MPI
#ifndef KOKKOS_ENABLE_MPI
#define KOKKOS_ENABLE_MPI KOKKOS_HAVE_MPI
#endif
#endif
#ifdef KOKKOS_HAVE_OPENMP
#ifndef KOKKOS_ENABLE_OPENMP
#define KOKKOS_ENABLE_OPENMP KOKKOS_HAVE_OPENMP
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#ifndef KOKKOS_ENABLE_PRAGMA_IVDEP
#define KOKKOS_ENABLE_PRAGMA_IVDEP KOKKOS_HAVE_PRAGMA_IVDEP
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_LOOPCOUNT
#ifndef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT KOKKOS_HAVE_PRAGMA_LOOPCOUNT
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_SIMD
#ifndef KOKKOS_ENABLE_PRAGMA_SIMD
#define KOKKOS_ENABLE_PRAGMA_SIMD KOKKOS_HAVE_PRAGMA_SIMD
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_UNROLL
#ifndef KOKKOS_ENABLE_PRAGMA_UNROLL
#define KOKKOS_ENABLE_PRAGMA_UNROLL KOKKOS_HAVE_PRAGMA_UNROLL
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_VECTOR
#ifndef KOKKOS_ENABLE_PRAGMA_VECTOR
#define KOKKOS_ENABLE_PRAGMA_VECTOR KOKKOS_HAVE_PRAGMA_VECTOR
#endif
#endif
#ifdef KOKKOS_HAVE_PTHREAD
#ifndef KOKKOS_ENABLE_PTHREAD
#define KOKKOS_ENABLE_PTHREAD KOKKOS_HAVE_PTHREAD
#endif
#endif
-#ifdef KOKKOS_HAVE_QTHREAD
-#ifndef KOKKOS_ENABLE_QTHREAD
-#define KOKKOS_ENABLE_QTHREAD KOKKOS_HAVE_QTHREAD
+#ifdef KOKKOS_HAVE_QTHREADS
+#ifndef KOKKOS_ENABLE_QTHREADS
+#define KOKKOS_ENABLE_QTHREADS KOKKOS_HAVE_QTHREADS
#endif
#endif
#ifdef KOKKOS_HAVE_SERIAL
#ifndef KOKKOS_ENABLE_SERIAL
#define KOKKOS_ENABLE_SERIAL KOKKOS_HAVE_SERIAL
#endif
#endif
#ifdef KOKKOS_HAVE_TYPE
#ifndef KOKKOS_IMPL_HAS_TYPE
#define KOKKOS_IMPL_HAS_TYPE KOKKOS_HAVE_TYPE
#endif
#endif
#ifdef KOKKOS_HAVE_WINTHREAD
#ifndef KOKKOS_ENABLE_WINTHREAD
#define KOKKOS_ENABLE_WINTHREAD KOKKOS_HAVE_WINTHREAD
#endif
#endif
#ifdef KOKKOS_HAVE_Winthread
#ifndef KOKKOS_ENABLE_WINTHREAD
#define KOKKOS_ENABLE_WINTHREAD KOKKOS_HAVE_Winthread
#endif
#endif
#ifdef KOKKOS_INTEL_MM_ALLOC_AVAILABLE
#ifndef KOKKOS_ENABLE_INTEL_MM_ALLOC
#define KOKKOS_ENABLE_INTEL_MM_ALLOC KOKKOS_INTEL_MM_ALLOC_AVAILABLE
#endif
#endif
#ifdef KOKKOS_MACRO_IMPL_TO_STRING
#ifndef KOKKOS_IMPL_MACRO_TO_STRING
#define KOKKOS_IMPL_MACRO_TO_STRING KOKKOS_MACRO_IMPL_TO_STRING
#endif
#endif
#ifdef KOKKOS_MACRO_TO_STRING
#ifndef KOKKOS_MACRO_TO_STRING
#define KOKKOS_MACRO_TO_STRING KOKKOS_MACRO_TO_STRING
#endif
#endif
#ifdef KOKKOS_MAY_ALIAS
#ifndef KOKKOS_IMPL_MAY_ALIAS
#define KOKKOS_IMPL_MAY_ALIAS KOKKOS_MAY_ALIAS
#endif
#endif
#ifdef KOKKOS_MDRANGE_IVDEP
#ifndef KOKKOS_IMPL_MDRANGE_IVDEP
#define KOKKOS_IMPL_MDRANGE_IVDEP KOKKOS_MDRANGE_IVDEP
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINTERR
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#define KOKKOS_ENABLE_MEMPOOL_PRINTERR KOKKOS_MEMPOOL_PRINTERR
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#define KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_INFO KOKKOS_MEMPOOL_PRINT_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#endif
#endif
#ifdef KOKKOS_POSIX_MEMALIGN_AVAILABLE
#ifndef KOKKOS_ENABLE_POSIX_MEMALIGN
#define KOKKOS_ENABLE_POSIX_MEMALIGN KOKKOS_POSIX_MEMALIGN_AVAILABLE
#endif
#endif
#ifdef KOKKOS_POSIX_MMAP_FLAGS
#ifndef KOKKOS_IMPL_POSIX_MMAP_FLAGS
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS KOKKOS_POSIX_MMAP_FLAGS
#endif
#endif
#ifdef KOKKOS_POSIX_MMAP_FLAGS_HUGE
#ifndef KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE KOKKOS_POSIX_MMAP_FLAGS_HUGE
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_DECREMENT
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_DECREMENT KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_ENABLED
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_ENABLED KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_INCREMENT
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_INCREMENT KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
#endif
#endif
#ifdef KOKKOS_USE_CUDA_UVM
#ifndef KOKKOS_ENABLE_CUDA_UVM
#define KOKKOS_ENABLE_CUDA_UVM KOKKOS_USE_CUDA_UVM
#endif
#endif
#ifdef KOKKOS_USE_ISA_KNC
#ifndef KOKKOS_ENABLE_ISA_KNC
#define KOKKOS_ENABLE_ISA_KNC KOKKOS_USE_ISA_KNC
#endif
#endif
#ifdef KOKKOS_USE_ISA_POWERPCLE
#ifndef KOKKOS_ENABLE_ISA_POWERPCLE
#define KOKKOS_ENABLE_ISA_POWERPCLE KOKKOS_USE_ISA_POWERPCLE
#endif
#endif
#ifdef KOKKOS_USE_ISA_X86_64
#ifndef KOKKOS_ENABLE_ISA_X86_64
#define KOKKOS_ENABLE_ISA_X86_64 KOKKOS_USE_ISA_X86_64
#endif
#endif
#ifdef KOKKOS_USE_LIBRT
#ifndef KOKKOS_ENABLE_LIBRT
#define KOKKOS_ENABLE_LIBRT KOKKOS_USE_LIBRT
#endif
#endif
#ifdef KOKKOS_VIEW_OPERATOR_VERIFY
#ifndef KOKKOS_IMPL_VIEW_OPERATOR_VERIFY
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY KOKKOS_VIEW_OPERATOR_VERIFY
#endif
#endif
//------------------------------------------------------------------------------
// Deprecated macros
//------------------------------------------------------------------------------
#ifdef KOKKOS_HAVE_CXX11
#undef KOKKOS_HAVE_CXX11
#endif
#ifdef KOKKOS_ENABLE_CXX11
#undef KOKKOS_ENABLE_CXX11
#endif
#ifdef KOKKOS_USING_EXP_VIEW
#undef KOKKOS_USING_EXP_VIEW
#endif
#ifdef KOKKOS_USING_EXPERIMENTAL_VIEW
#undef KOKKOS_USING_EXPERIMENTAL_VIEW
#endif
#define KOKKOS_HAVE_CXX11 1
#define KOKKOS_ENABLE_CXX11 1
#define KOKKOS_USING_EXP_VIEW 1
#define KOKKOS_USING_EXPERIMENTAL_VIEW 1
#endif //KOKKOS_IMPL_OLD_MACROS_HPP
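
This header only maps old spellings onto new ones: when a legacy configuration defines a pre-rename macro such as KOKKOS_HAVE_OPENMP, the matching KOKKOS_ENABLE_* name is defined for it, so code written against the new names keeps working. A hypothetical translation unit (the explicit #define and the include path are illustrative assumptions, not recommended usage):

    #define KOKKOS_HAVE_OPENMP 1          // legacy spelling from an old build system
    #include <impl/Kokkos_OldMacros.hpp>  // assumed include path, matching the other impl/ headers

    #if defined(KOKKOS_ENABLE_OPENMP)
    // New-style code sees the backend even though only the old macro was defined.
    #endif

    int main() { return 0; }
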
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
index 99c5df4db..0c006a8c0 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
@@ -1,237 +1,237 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <impl/Kokkos_Profiling_Interface.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <string.h>
namespace Kokkos {
namespace Profiling {
SpaceHandle::SpaceHandle(const char* space_name) {
strncpy(name,space_name,64);
}
bool profileLibraryLoaded() {
return (NULL != initProfileLibrary);
}
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginForCallee) {
Kokkos::fence();
(*beginForCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelFor(const uint64_t kernelID) {
if(NULL != endForCallee) {
Kokkos::fence();
(*endForCallee)(kernelID);
}
}
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginScanCallee) {
Kokkos::fence();
(*beginScanCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelScan(const uint64_t kernelID) {
if(NULL != endScanCallee) {
Kokkos::fence();
(*endScanCallee)(kernelID);
}
}
-
+
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginReduceCallee) {
Kokkos::fence();
(*beginReduceCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
-
+
void endParallelReduce(const uint64_t kernelID) {
if(NULL != endReduceCallee) {
Kokkos::fence();
(*endReduceCallee)(kernelID);
}
}
-
+
void pushRegion(const std::string& kName) {
if( NULL != pushRegionCallee ) {
Kokkos::fence();
(*pushRegionCallee)(kName.c_str());
}
}
void popRegion() {
if( NULL != popRegionCallee ) {
Kokkos::fence();
(*popRegionCallee)();
}
}
void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
if(NULL != allocateDataCallee) {
(*allocateDataCallee)(space,label.c_str(),ptr,size);
}
}
void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
if(NULL != allocateDataCallee) {
(*deallocateDataCallee)(space,label.c_str(),ptr,size);
}
}
void initialize() {
// Make sure initialize calls happens only once
static int is_initialized = 0;
if(is_initialized) return;
is_initialized = 1;
void* firstProfileLibrary;
char* envProfileLibrary = getenv("KOKKOS_PROFILE_LIBRARY");
// If we do not find a profiling library in the environment then exit
// early.
if( NULL == envProfileLibrary ) {
return ;
}
char* envProfileCopy = (char*) malloc(sizeof(char) * (strlen(envProfileLibrary) + 1));
sprintf(envProfileCopy, "%s", envProfileLibrary);
char* profileLibraryName = strtok(envProfileCopy, ";");
if( (NULL != profileLibraryName) && (strcmp(profileLibraryName, "") != 0) ) {
firstProfileLibrary = dlopen(profileLibraryName, RTLD_NOW | RTLD_GLOBAL);
if(NULL == firstProfileLibrary) {
std::cerr << "Error: Unable to load KokkosP library: " <<
profileLibraryName << std::endl;
} else {
std::cout << "KokkosP: Library Loaded: " << profileLibraryName << std::endl;
// dlsym returns a pointer to an object, while we want to assign to pointer to function
// A direct cast will give warnings hence, we have to workaround the issue by casting pointer to pointers.
auto p1 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_for");
beginForCallee = *((beginFunction*) &p1);
auto p2 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_scan");
beginScanCallee = *((beginFunction*) &p2);
auto p3 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_reduce");
beginReduceCallee = *((beginFunction*) &p3);
auto p4 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_scan");
endScanCallee = *((endFunction*) &p4);
auto p5 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_for");
endForCallee = *((endFunction*) &p5);
auto p6 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_reduce");
endReduceCallee = *((endFunction*) &p6);
auto p7 = dlsym(firstProfileLibrary, "kokkosp_init_library");
initProfileLibrary = *((initFunction*) &p7);
auto p8 = dlsym(firstProfileLibrary, "kokkosp_finalize_library");
finalizeProfileLibrary = *((finalizeFunction*) &p8);
auto p9 = dlsym(firstProfileLibrary, "kokkosp_push_profile_region");
pushRegionCallee = *((pushFunction*) &p9);
auto p10 = dlsym(firstProfileLibrary, "kokkosp_pop_profile_region");
popRegionCallee = *((popFunction*) &p10);
auto p11 = dlsym(firstProfileLibrary, "kokkosp_allocate_data");
allocateDataCallee = *((allocateDataFunction*) &p11);
auto p12 = dlsym(firstProfileLibrary, "kokkosp_deallocate_data");
deallocateDataCallee = *((deallocateDataFunction*) &p12);
}
}
if(NULL != initProfileLibrary) {
(*initProfileLibrary)(0,
(uint64_t) KOKKOSP_INTERFACE_VERSION,
(uint32_t) 0,
NULL);
}
free(envProfileCopy);
}
void finalize() {
// Make sure finalize calls happens only once
static int is_finalized = 0;
if(is_finalized) return;
is_finalized = 1;
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
initProfileLibrary = NULL;
finalizeProfileLibrary = NULL;
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
pushRegionCallee = NULL;
popRegionCallee = NULL;
allocateDataCallee = NULL;
deallocateDataCallee = NULL;
}
}
}
}
#endif
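
initialize() above dlopen()s the shared library named by the KOKKOS_PROFILE_LIBRARY environment variable and resolves the kokkosp_* entry points by name. A minimal sketch of such a tool, implementing only the init/finalize and parallel_for hooks (file and library names are hypothetical; the last parameter of kokkosp_init_library is taken as void* here, standing in for KokkosPDeviceInfo*, to keep the sketch self-contained):

    // simple_tool.cpp -- hypothetical KokkosP tool; build as a shared library.
    #include <cstdint>
    #include <cstdio>

    extern "C" void kokkosp_init_library(const int loadSeq,
                                         const std::uint64_t interfaceVer,
                                         const std::uint32_t devInfoCount,
                                         void* deviceInfo /* KokkosPDeviceInfo* */) {
      (void) loadSeq; (void) devInfoCount; (void) deviceInfo;
      std::printf("tool: init, interface version %llu\n",
                  (unsigned long long) interfaceVer);
    }

    extern "C" void kokkosp_finalize_library() {
      std::printf("tool: finalize\n");
    }

    extern "C" void kokkosp_begin_parallel_for(const char* name,
                                               const std::uint32_t devID,
                                               std::uint64_t* kernelID) {
      (void) devID;
      static std::uint64_t next = 0;
      *kernelID = next++;
      std::printf("tool: begin parallel_for '%s' id=%llu\n",
                  name, (unsigned long long) *kernelID);
    }

    extern "C" void kokkosp_end_parallel_for(const std::uint64_t kernelID) {
      std::printf("tool: end parallel_for id=%llu\n",
                  (unsigned long long) kernelID);
    }

Built as a shared library (for example g++ -shared -fPIC -o libsimple_tool.so simple_tool.cpp) and exported through KOKKOS_PROFILE_LIBRARY, the begin/end hooks fire around each Kokkos::parallel_for when profiling is enabled.
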
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
index 3d6a38925..139a20d8f 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
@@ -1,151 +1,151 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOSP_INTERFACE_HPP
#define KOKKOSP_INTERFACE_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Macros.hpp>
#include <string>
#include <cinttypes>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_DeviceInfo.hpp>
#include <dlfcn.h>
#include <iostream>
#include <stdlib.h>
#endif
#define KOKKOSP_INTERFACE_VERSION 20150628
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
namespace Kokkos {
namespace Profiling {
struct SpaceHandle {
SpaceHandle(const char* space_name);
char name[64];
};
typedef void (*initFunction)(const int,
const uint64_t,
const uint32_t,
KokkosPDeviceInfo*);
typedef void (*finalizeFunction)();
typedef void (*beginFunction)(const char*, const uint32_t, uint64_t*);
typedef void (*endFunction)(uint64_t);
typedef void (*pushFunction)(const char*);
typedef void (*popFunction)();
typedef void (*allocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
typedef void (*deallocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
static initFunction initProfileLibrary = NULL;
static finalizeFunction finalizeProfileLibrary = NULL;
static beginFunction beginForCallee = NULL;
static beginFunction beginScanCallee = NULL;
static beginFunction beginReduceCallee = NULL;
static endFunction endForCallee = NULL;
static endFunction endScanCallee = NULL;
static endFunction endReduceCallee = NULL;
static pushFunction pushRegionCallee = NULL;
static popFunction popRegionCallee = NULL;
static allocateDataFunction allocateDataCallee = NULL;
static deallocateDataFunction deallocateDataCallee = NULL;
bool profileLibraryLoaded();
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelFor(const uint64_t kernelID);
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelScan(const uint64_t kernelID);
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelReduce(const uint64_t kernelID);
void pushRegion(const std::string& kName);
void popRegion();
void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
void initialize();
void finalize();
//Define finalize_fake inline to get rid of warnings for unused static variables
inline void finalize_fake() {
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
allocateDataCallee = NULL;
deallocateDataCallee = NULL;
initProfileLibrary = NULL;
finalizeProfileLibrary = NULL;
pushRegionCallee = NULL;
popRegionCallee = NULL;
}
}
}
}
#endif
#endif
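
The loading code earlier in this patch resolves each hook with dlsym() and casts it to one of the typedefs above, so a profiling tool is just a shared library exporting C symbols with matching signatures. The sketch below is not part of the patch: it shows a hypothetical minimal tool for the four hooks whose names appear in this hunk (push/pop region and allocate/deallocate); the environment variable used to point Kokkos at the library is assumed here to be KOKKOS_PROFILE_LIBRARY.

// minimal_tool.cpp -- hypothetical KokkosP tool; the signatures mirror the
// typedefs in Kokkos_Profiling_Interface.hpp above.
#include <cstdint>
#include <cstdio>

// Mirrors Kokkos::Profiling::SpaceHandle from the header above.
struct SpaceHandle { char name[64]; };

extern "C" void kokkosp_push_profile_region(const char* name) {
  std::printf("push region: %s\n", name);
}

extern "C" void kokkosp_pop_profile_region() {
  std::printf("pop region\n");
}

extern "C" void kokkosp_allocate_data(const SpaceHandle space, const char* label,
                                      const void* ptr, const uint64_t size) {
  std::printf("alloc %s in %s: %llu bytes at %p\n",
              label, space.name, (unsigned long long) size, ptr);
}

extern "C" void kokkosp_deallocate_data(const SpaceHandle space, const char* label,
                                        const void* ptr, const uint64_t size) {
  std::printf("free  %s in %s: %llu bytes at %p\n",
              label, space.name, (unsigned long long) size, ptr);
}

Built as a shared object (e.g. g++ -shared -fPIC minimal_tool.cpp -o minimal_tool.so), these are the symbols the dlsym() lookups above would find; any hook a tool does not export simply stays NULL and is never called.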
diff --git a/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp b/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp
new file mode 100644
index 000000000..b3ed5f151
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp
@@ -0,0 +1,317 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_REDUCER_HPP
+#define KOKKOS_IMPL_REDUCER_HPP
+
+#include <impl/Kokkos_Traits.hpp>
+
+//----------------------------------------------------------------------------
+/* Reducer abstraction:
+ * 1) Provides 'join' operation
+ * 2) Provides 'init' operation
+ * 3) Provides 'copy' operation
+ * 4) Optionally provides result value in a memory space
+ *
+ * Created from:
+ * 1) Functor::operator()( destination , source )
+ * 2) Functor::{ join , init }
+ */
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template< typename value_type >
+struct ReduceSum
+{
+ KOKKOS_INLINE_FUNCTION static
+ void copy( value_type & dest
+ , value_type const & src ) noexcept
+ { dest = src ; }
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( value_type & dest ) noexcept
+ { new( &dest ) value_type(); }
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( value_type volatile & dest
+ , value_type const volatile & src ) noexcept
+ { dest += src ; }
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( value_type & dest
+ , value_type const & src ) noexcept
+ { dest += src ; }
+};
+
+template< typename T
+ , class ReduceOp = ReduceSum< T >
+ , typename MemorySpace = void >
+struct Reducer
+ : private ReduceOp
+ , private integral_nonzero_constant
+ < int , ( std::rank<T>::value == 1 ? std::extent<T>::value : 1 )>
+{
+private:
+
+ // Determine if T is simple array
+
+ enum : int { rank = std::rank<T>::value };
+
+ static_assert( rank <= 1 , "Kokkos::Impl::Reducer type is at most rank-one" );
+
+ using length_t =
+ integral_nonzero_constant<int,( rank == 1 ? std::extent<T>::value : 1 )> ;
+
+public:
+
+ using reducer = Reducer ;
+ using memory_space = MemorySpace ;
+ using value_type = typename std::remove_extent<T>::type ;
+ using reference_type =
+ typename std::conditional< ( rank != 0 )
+ , value_type *
+ , value_type &
+ >::type ;
+private:
+
+ //--------------------------------------------------------------------------
+ // Determine what functions 'ReduceOp' provides:
+ // copy( destination , source )
+ // init( destination )
+ //
+ // operator()( destination , source )
+ // join( destination , source )
+ //
+ // Provide defaults for missing optional operations
+
+ template< class R , typename = void>
+ struct COPY {
+ KOKKOS_INLINE_FUNCTION static
+ void copy( R const &
+ , value_type * dst
+ , value_type const * src ) { *dst = *src ; }
+ };
+
+ template< class R >
+ struct COPY< R , decltype( ((R*)0)->copy( *((value_type*)0)
+ , *((value_type const *)0) ) ) >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void copy( R const & r
+ , value_type * dst
+ , value_type const * src ) { r.copy( *dst , *src ); }
+ };
+
+ template< class R , typename = void >
+ struct INIT {
+ KOKKOS_INLINE_FUNCTION static
+ void init( R const & , value_type * dst ) { new(dst) value_type(); }
+ };
+
+ template< class R >
+ struct INIT< R , decltype( ((R*)0)->init( *((value_type*)0 ) ) ) >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( R const & r , value_type * dst ) { r.init( *dst ); }
+ };
+
+ template< class R , typename V , typename = void > struct JOIN
+ {
+ // If no join function then try operator()
+ KOKKOS_INLINE_FUNCTION static
+ void join( R const & r , V * dst , V const * src )
+ { r.operator()(*dst,*src); }
+ };
+
+ template< class R , typename V >
+ struct JOIN< R , V , decltype( ((R*)0)->join ( *((V *)0) , *((V const *)0) ) ) >
+ {
+ // If has join function use it
+ KOKKOS_INLINE_FUNCTION static
+ void join( R const & r , V * dst , V const * src )
+ { r.join(*dst,*src); }
+ };
+
+ //--------------------------------------------------------------------------
+
+ value_type * const m_result ;
+
+ template< int Rank >
+ KOKKOS_INLINE_FUNCTION
+ static constexpr
+ typename std::enable_if< ( 0 != Rank ) , reference_type >::type
+ ref( value_type * p ) noexcept { return p ; }
+
+ template< int Rank >
+ KOKKOS_INLINE_FUNCTION
+ static constexpr
+ typename std::enable_if< ( 0 == Rank ) , reference_type >::type
+ ref( value_type * p ) noexcept { return *p ; }
+
+public:
+
+ //--------------------------------------------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ constexpr int length() const noexcept
+ { return length_t::value ; }
+
+ KOKKOS_INLINE_FUNCTION
+ value_type * data() const noexcept
+ { return m_result ; }
+
+ KOKKOS_INLINE_FUNCTION
+ reference_type reference() const noexcept
+ { return Reducer::template ref< rank >( m_result ); }
+
+ //--------------------------------------------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ void copy( value_type * const dest
+ , value_type const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template COPY<ReduceOp>::copy( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void init( value_type * dest ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template INIT<ReduceOp>::init( (ReduceOp &) *this , dest + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void join( value_type * const dest
+ , value_type const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template JOIN<ReduceOp,value_type>::join( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void join( value_type volatile * const dest
+ , value_type volatile const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template JOIN<ReduceOp,value_type volatile>::join( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ //--------------------------------------------------------------------------
+
+ template< typename ArgT >
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer
+ ( ArgT * arg_value
+ , typename std::enable_if
+ < std::is_same<ArgT,value_type>::value &&
+ std::is_default_constructible< ReduceOp >::value
+ , int >::type arg_length = 1
+ ) noexcept
+ : ReduceOp(), length_t( arg_length ), m_result( arg_value ) {}
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( ReduceOp const & arg_op
+ , value_type * arg_value = 0
+ , int arg_length = 1 ) noexcept
+ : ReduceOp( arg_op ), length_t( arg_length ), m_result( arg_value ) {}
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( ReduceOp && arg_op
+ , value_type * arg_value = 0
+ , int arg_length = 1 ) noexcept
+ : ReduceOp( arg_op ), length_t( arg_length ), m_result( arg_value ) {}
+
+ Reducer( Reducer const & ) = default ;
+ Reducer( Reducer && ) = default ;
+ Reducer & operator = ( Reducer const & ) = default ;
+ Reducer & operator = ( Reducer && ) = default ;
+};
+
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+template< typename ValueType >
+constexpr
+Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >
+Sum( ValueType & arg_value )
+{
+ static_assert( std::is_trivial<ValueType>::value
+ , "Kokkos reducer requires trivial value type" );
+ return Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >( & arg_value );
+}
+
+template< typename ValueType >
+constexpr
+Impl::Reducer< ValueType[] , Impl::ReduceSum< ValueType > >
+Sum( ValueType * arg_value , int arg_length )
+{
+ static_assert( std::is_trivial<ValueType>::value
+ , "Kokkos reducer requires trivial value type" );
+ return Impl::Reducer< ValueType[] , Impl::ReduceSum< ValueType > >( arg_value , arg_length );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ValueType , class JoinType >
+Impl::Reducer< ValueType , JoinType >
+reducer( ValueType & value , JoinType const & lambda )
+{
+ return Impl::Reducer< ValueType , JoinType >( lambda , & value );
+}
+
+} // namespace Kokkos
+
+#endif /* #ifndef KOKKOS_IMPL_REDUCER_HPP */
+
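
The new header expresses a reduction through three operations (join, init, copy) supplied by a ReduceOp such as ReduceSum, with Sum() building the default sum reducer. The following standalone sketch only illustrates that join/init pattern; it uses hypothetical names and none of the Kokkos machinery above.

// reducer_sketch.cpp -- standalone illustration of the join/init pattern
// wrapped by Kokkos::Impl::Reducer above; nothing here is Kokkos API.
#include <cstdio>

template <typename T>
struct SumOp {
  static void init(T& dst) { dst = T(); }            // identity element
  static void join(T& dst, const T& src) { dst += src; }
};

template <typename T, class Op = SumOp<T>>
T reduce(const T* values, int n) {
  T result;
  Op::init(result);                 // start from the identity
  for (int i = 0; i < n; ++i)
    Op::join(result, values[i]);    // fold each contribution in
  return result;
}

int main() {
  const double v[4] = {1.0, 2.5, 3.0, 4.5};
  std::printf("sum = %g\n", reduce(v, 4));            // prints sum = 11
  return 0;
}

In the header above, Kokkos::Sum(value) plays the role of SumOp here, and the static_assert on std::is_trivial keeps the reduced value bit-copyable between threads.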
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
index 76161c10f..794961330 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
@@ -1,119 +1,182 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdlib.h>
#include <sstream>
#include <Kokkos_Serial.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Error.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
-namespace SerialImpl {
+namespace {
-Sentinel::Sentinel() : m_scratch(0), m_reduce_end(0), m_shared_end(0) {}
+HostThreadTeamData g_serial_thread_team_data ;
-Sentinel::~Sentinel()
-{
- if ( m_scratch ) { free( m_scratch ); }
- m_scratch = 0 ;
- m_reduce_end = 0 ;
- m_shared_end = 0 ;
}
-Sentinel & Sentinel::singleton()
+// Resize thread team data scratch memory
+void serial_resize_thread_team_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes )
{
- static Sentinel s ; return s ;
+ if ( pool_reduce_bytes < 512 ) pool_reduce_bytes = 512 ;
+ if ( team_reduce_bytes < 512 ) team_reduce_bytes = 512 ;
+
+ const size_t old_pool_reduce = g_serial_thread_team_data.pool_reduce_bytes();
+ const size_t old_team_reduce = g_serial_thread_team_data.team_reduce_bytes();
+ const size_t old_team_shared = g_serial_thread_team_data.team_shared_bytes();
+ const size_t old_thread_local = g_serial_thread_team_data.thread_local_bytes();
+ const size_t old_alloc_bytes = g_serial_thread_team_data.scratch_bytes();
+
+ // Allocate if any of the old allocations is too small:
+
+ const bool allocate = ( old_pool_reduce < pool_reduce_bytes ) ||
+ ( old_team_reduce < team_reduce_bytes ) ||
+ ( old_team_shared < team_shared_bytes ) ||
+ ( old_thread_local < thread_local_bytes );
+
+ if ( allocate ) {
+
+ Kokkos::HostSpace space ;
+
+ if ( old_alloc_bytes ) {
+ g_serial_thread_team_data.disband_team();
+ g_serial_thread_team_data.disband_pool();
+
+ space.deallocate( g_serial_thread_team_data.scratch_buffer()
+ , g_serial_thread_team_data.scratch_bytes() );
+ }
+
+ if ( pool_reduce_bytes < old_pool_reduce ) { pool_reduce_bytes = old_pool_reduce ; }
+ if ( team_reduce_bytes < old_team_reduce ) { team_reduce_bytes = old_team_reduce ; }
+ if ( team_shared_bytes < old_team_shared ) { team_shared_bytes = old_team_shared ; }
+ if ( thread_local_bytes < old_thread_local ) { thread_local_bytes = old_thread_local ; }
+
+ const size_t alloc_bytes =
+ HostThreadTeamData::scratch_size( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ void * const ptr = space.allocate( alloc_bytes );
+
+ g_serial_thread_team_data.
+ scratch_assign( ((char *)ptr)
+ , alloc_bytes
+ , pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ HostThreadTeamData * pool[1] = { & g_serial_thread_team_data };
+
+ g_serial_thread_team_data.organize_pool( pool , 1 );
+ g_serial_thread_team_data.organize_team(1);
+ }
}
-inline
-unsigned align( unsigned n )
+// Get the thread team data structure for the serial execution space
+HostThreadTeamData * serial_get_thread_team_data()
{
- enum { ALIGN = 0x0100 /* 256 */ , MASK = ALIGN - 1 };
- return ( n + MASK ) & ~MASK ;
+ return & g_serial_thread_team_data ;
}
-} // namespace
+} // namespace Impl
+} // namespace Kokkos
-SerialTeamMember::SerialTeamMember( int arg_league_rank
- , int arg_league_size
- , int arg_shared_size
- )
- : m_space( ((char *) SerialImpl::Sentinel::singleton().m_scratch) + SerialImpl::Sentinel::singleton().m_reduce_end
- , arg_shared_size )
- , m_league_rank( arg_league_rank )
- , m_league_size( arg_league_size )
-{}
+/*--------------------------------------------------------------------------*/
-} // namespace Impl
+namespace Kokkos {
-void * Serial::scratch_memory_resize( unsigned reduce_size , unsigned shared_size )
+int Serial::is_initialized()
{
- static Impl::SerialImpl::Sentinel & s = Impl::SerialImpl::Sentinel::singleton();
+ return 1 ;
+}
- reduce_size = Impl::SerialImpl::align( reduce_size );
- shared_size = Impl::SerialImpl::align( shared_size );
+void Serial::initialize( unsigned threads_count
+ , unsigned use_numa_count
+ , unsigned use_cores_per_numa
+ , bool allow_asynchronous_threadpool )
+{
+ (void) threads_count;
+ (void) use_numa_count;
+ (void) use_cores_per_numa;
+ (void) allow_asynchronous_threadpool;
+
+ // Init the array of locks used for arbitrarily sized atomics
+ Impl::init_lock_array_host_space();
+ #if defined(KOKKOS_ENABLE_PROFILING)
+ Kokkos::Profiling::initialize();
+ #endif
+}
- if ( ( s.m_reduce_end < reduce_size ) ||
- ( s.m_shared_end < s.m_reduce_end + shared_size ) ) {
+void Serial::finalize()
+{
+ if ( Impl::g_serial_thread_team_data.scratch_buffer() ) {
+ Impl::g_serial_thread_team_data.disband_team();
+ Impl::g_serial_thread_team_data.disband_pool();
- if ( s.m_scratch ) { free( s.m_scratch ); }
+ Kokkos::HostSpace space ;
- if ( s.m_reduce_end < reduce_size ) s.m_reduce_end = reduce_size ;
- if ( s.m_shared_end < s.m_reduce_end + shared_size ) s.m_shared_end = s.m_reduce_end + shared_size ;
+ space.deallocate( Impl::g_serial_thread_team_data.scratch_buffer()
+ , Impl::g_serial_thread_team_data.scratch_bytes() );
- s.m_scratch = malloc( s.m_shared_end );
+ Impl::g_serial_thread_team_data.scratch_assign( (void*) 0, 0, 0, 0, 0, 0 );
}
- return s.m_scratch ;
+ #if defined(KOKKOS_ENABLE_PROFILING)
+ Kokkos::Profiling::finalize();
+ #endif
}
} // namespace Kokkos
#endif // defined( KOKKOS_ENABLE_SERIAL )
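
This rewrite replaces the old Sentinel scratch singleton with HostThreadTeamData and a grow-only resize: requested sizes are clamped to a 512-byte floor, compared against the current allocation, and the buffer is reallocated only when some request exceeds what is already held. A minimal sketch of that policy, using hypothetical names and plain malloc rather than Kokkos::HostSpace:

// grow_only_scratch.cpp -- illustrates the reallocate-only-when-larger
// policy used by serial_resize_thread_team_data() above.
#include <cstdlib>
#include <algorithm>

struct Scratch {
  void*  ptr   = nullptr;
  size_t bytes = 0;

  // Grow to hold at least 'requested' bytes; never shrink.
  void resize(size_t requested) {
    requested = std::max<size_t>(requested, 512);  // clamp to a floor, as above
    if (requested <= bytes) return;                // current buffer already suffices
    std::free(ptr);                                // release the old buffer
    ptr   = std::malloc(requested);                // allocate the larger one
    bytes = requested;
  }

  ~Scratch() { std::free(ptr); }
};

int main() {
  Scratch s;
  s.resize(100);    // rounded up to the 512-byte floor
  s.resize(1024);   // grows
  s.resize(600);    // no-op: existing buffer is already large enough
  return 0;
}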
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
index 19f3abe71..d22d604fb 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
@@ -1,148 +1,152 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_Serial_Task.hpp>
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Serial > ;
void TaskQueueSpecialization< Kokkos::Serial >::execute
( TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member exec ;
+ Impl::HostThreadTeamData * const data = Impl::serial_get_thread_team_data();
+
+ Member exec( *data );
// Loop until all queues are empty
while ( 0 < queue->m_ready_count ) {
task_root_type * task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end != task ) {
- // pop_task resulted in lock == task->m_next
+ // pop_ready_task resulted in lock == task->m_next
// In the executing state
(*task->m_apply)( task , & exec );
#if 0
printf( "TaskQueue<Serial>::executed: 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
// If a respawn then re-enqueue otherwise the task is complete
// and all tasks waiting on this task are updated.
queue->complete( task );
}
else if ( 0 != queue->m_ready_count ) {
Kokkos::abort("TaskQueue<Serial>::execute ERROR: ready_count");
}
}
}
void TaskQueueSpecialization< Kokkos::Serial > ::
iff_single_thread_recursive_execute(
TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member exec ;
+ Impl::HostThreadTeamData * const data = Impl::serial_get_thread_team_data();
+
+ Member exec( *data );
// Loop until no runnable task
task_root_type * task = end ;
do {
task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & exec );
queue->complete( task );
} while(1);
}
}} /* namespace Kokkos::Impl */
#endif /* #if defined( KOKKOS_ENABLE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG ) */
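
execute() above drains the queue by scanning NumQueue priority levels, each holding two ready queues, popping the next runnable task with pop_ready_task(), running it through its apply function, and calling complete() until m_ready_count drops to zero. The sketch below is a heavily simplified, Kokkos-free version of that draining loop; the task type and queue layout are illustrative only.

// drain_sketch.cpp -- simplified analogue of the ready-queue draining loop
// in TaskQueueSpecialization<Kokkos::Serial>::execute() above.
#include <array>
#include <deque>
#include <functional>

using Task = std::function<void()>;
constexpr int NumQueue = 3;                           // priority levels

// Two ready queues per priority level, as in the real queue.
using ReadyQueues = std::array<std::array<std::deque<Task>, 2>, NumQueue>;

void drain(ReadyQueues& ready, int& ready_count) {
  while (ready_count > 0) {
    Task task;                                        // empty task == "end" sentinel
    for (int i = 0; i < NumQueue && !task; ++i)
      for (int j = 0; j < 2 && !task; ++j)
        if (!ready[i][j].empty()) {                   // pop_ready_task() analogue
          task = std::move(ready[i][j].front());
          ready[i][j].pop_front();
        }
    if (!task) break;                                 // nothing runnable
    task();                                           // (*task->m_apply)(task, &exec) analogue
    --ready_count;                                    // stand-in for queue->complete(task)
  }
}

int main() {
  ReadyQueues ready{};
  int count = 0;
  ready[1][0].push_back([] { /* work */ });  ++count;
  ready[2][1].push_back([] { /* work */ });  ++count;
  drain(ready, count);
  return 0;
}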
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
index 178305c5d..ac7f17c0e 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
@@ -1,308 +1,91 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_SERIAL_TASK_HPP
#define KOKKOS_IMPL_SERIAL_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template<>
class TaskQueueSpecialization< Kokkos::Serial >
{
public:
using execution_space = Kokkos::Serial ;
using memory_space = Kokkos::HostSpace ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
+ using member_type = Kokkos::Impl::HostThreadTeamMember< execution_space > ;
static
void iff_single_thread_recursive_execute( queue_type * const );
static
void execute( queue_type * const );
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( task_base_type::function_type * ptr )
- {
- using TaskType = TaskBase< Kokkos::Serial
- , typename FunctorType::value_type
- , FunctorType
- > ;
- *ptr = TaskType::apply ;
- }
+ typename TaskType::function_type
+ get_function_pointer() { return TaskType::apply ; }
};
extern template class TaskQueue< Kokkos::Serial > ;
-//----------------------------------------------------------------------------
-
-template<>
-class TaskExec< Kokkos::Serial >
-{
-public:
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
- KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
-};
-
-template<typename iType>
-struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
-{
- typedef iType index_type;
- const iType start ;
- const iType end ;
- enum {increment = 1};
- //const TaskExec< Kokkos::Serial > & thread;
- TaskExec< Kokkos::Serial > & thread;
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct
- //( const TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- ( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- : start(0)
- , end(arg_count)
- , thread(arg_thread)
- {}
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct
- //( const TaskExec< Kokkos::Serial > & arg_thread
- ( TaskExec< Kokkos::Serial > & arg_thread
- , const iType& arg_start
- , const iType & arg_end
- )
- : start( arg_start )
- , end( arg_end)
- , thread( arg_thread )
- {}
-};
-
-//----------------------------------------------------------------------------
-
-template<typename iType>
-struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
-{
- typedef iType index_type;
- const iType start ;
- const iType end ;
- enum {increment = 1};
- TaskExec< Kokkos::Serial > & thread;
-
- KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct
- ( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- : start( 0 )
- , end(arg_count)
- , thread(arg_thread)
- {}
-};
-
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-// OMP version needs non-const TaskExec
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >( thread, count );
-}
-
-// OMP version needs non-const TaskExec
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType1 & start, const iType2 & end )
-{
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >(
- thread, iType(start), iType(end) );
-}
-
-// OMP version needs non-const TaskExec
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >
-ThreadVectorRange
- ( Impl::TaskExec< Kokkos::Serial > & thread
- , const iType & count )
-{
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >(thread,count);
-}
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
-
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i, result);
-
- initialized_result = result;
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i, result);
-
- initialized_result = result;
-}
-
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
- initialized_result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- initialized_result+=tmp;
- }
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- ValueType result = initialized_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- initialized_result = result;
-}
-
-template< typename ValueType, typename iType, class Lambda >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda)
-{
- ValueType accum = 0 ;
- ValueType val, local_total;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda)
-{
-}
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_SERIAL_TASK_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp
deleted file mode 100644
index b2aea14df..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp
+++ /dev/null
@@ -1,693 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_HPP
-#define KOKKOS_SYNCHRONIC_HPP
-
-#include <impl/Kokkos_Synchronic_Config.hpp>
-
-#include <atomic>
-#include <chrono>
-#include <thread>
-#include <functional>
-#include <algorithm>
-
-namespace Kokkos {
-namespace Impl {
-
-enum notify_hint {
- notify_all,
- notify_one,
- notify_none
-};
-enum expect_hint {
- expect_urgent,
- expect_delay
-};
-
-namespace Details {
-
-template <class S, class T>
-bool __synchronic_spin_wait_for_update(S const& arg, T const& nval, int attempts) noexcept {
- int i = 0;
- for(;i < __SYNCHRONIC_SPIN_RELAX(attempts); ++i)
- if(__builtin_expect(arg.load(std::memory_order_relaxed) != nval,1))
- return true;
- else
- __synchronic_relax();
- for(;i < attempts; ++i)
- if(__builtin_expect(arg.load(std::memory_order_relaxed) != nval,1))
- return true;
- else
- __synchronic_yield();
- return false;
-}
-
-struct __exponential_backoff {
- __exponential_backoff(int arg_maximum=512) : maximum(arg_maximum), microseconds(8), x(123456789), y(362436069), z(521288629) {
- }
- static inline void sleep_for(std::chrono::microseconds const& time) {
- auto t = time.count();
- if(__builtin_expect(t > 75,0)) {
- portable_sleep(time);
- }
- else if(__builtin_expect(t > 25,0))
- __synchronic_yield();
- else
- __synchronic_relax();
- }
- void sleep_for_step() {
- sleep_for(step());
- }
- std::chrono::microseconds step() {
- float const f = ranfu();
- int const t = int(microseconds * f);
- if(__builtin_expect(f >= 0.95f,0))
- microseconds = 8;
- else
- microseconds = (std::min)(microseconds>>1,maximum);
- return std::chrono::microseconds(t);
- }
-private :
- int maximum, microseconds, x, y, z;
- int xorshf96() {
- int t;
- x ^= x << 16; x ^= x >> 5; x ^= x << 1;
- t = x; x = y; y = z; z = t ^ x ^ y;
- return z;
- }
- float ranfu() {
- return (float)(xorshf96()&(~0UL>>1)) / (float)(~0UL>>1);
- }
-};
-
-template <class T, class Enable = void>
-struct __synchronic_base {
-
-protected:
- std::atomic<T> atom;
-
- void notify(notify_hint = notify_all) noexcept {
- }
- void notify(notify_hint = notify_all) volatile noexcept {
- }
-
-public :
- __synchronic_base() noexcept = default;
- constexpr __synchronic_base(T v) noexcept : atom(v) { }
- __synchronic_base(const __synchronic_base&) = delete;
- ~__synchronic_base() { }
- __synchronic_base& operator=(const __synchronic_base&) = delete;
- __synchronic_base& operator=(const __synchronic_base&) volatile = delete;
-
- void expect_update(T val, expect_hint = expect_urgent) const noexcept {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- while(atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- }
- }
- void expect_update(T val, expect_hint = expect_urgent) const volatile noexcept {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- while(atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- }
- }
-
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const volatile {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
-};
-
-#ifdef __SYNCHRONIC_COMPATIBLE
-template <class T>
-struct __synchronic_base<T, typename std::enable_if<__SYNCHRONIC_COMPATIBLE(T)>::type> {
-
-public:
- std::atomic<T> atom;
-
- void notify(notify_hint hint = notify_all) noexcept {
- if(__builtin_expect(hint == notify_none,1))
- return;
- auto const x = count.fetch_add(0,std::memory_order_acq_rel);
- if(__builtin_expect(x,0)) {
- if(__builtin_expect(hint == notify_all,1))
- __synchronic_wake_all(&atom);
- else
- __synchronic_wake_one(&atom);
- }
- }
- void notify(notify_hint hint = notify_all) volatile noexcept {
- if(__builtin_expect(hint == notify_none,1))
- return;
- auto const x = count.fetch_add(0,std::memory_order_acq_rel);
- if(__builtin_expect(x,0)) {
- if(__builtin_expect(hint == notify_all,1))
- __synchronic_wake_all_volatile(&atom);
- else
- __synchronic_wake_one_volatile(&atom);
- }
- }
-
-public :
- __synchronic_base() noexcept : count(0) { }
- constexpr __synchronic_base(T v) noexcept : atom(v), count(0) { }
- __synchronic_base(const __synchronic_base&) = delete;
- ~__synchronic_base() { }
- __synchronic_base& operator=(const __synchronic_base&) = delete;
- __synchronic_base& operator=(const __synchronic_base&) volatile = delete;
-
- void expect_update(T val, expect_hint = expect_urgent) const noexcept {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- while(__builtin_expect(atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait(&atom,val);
- count.fetch_add(-1,std::memory_order_acquire);
- }
- }
- void expect_update(T val, expect_hint = expect_urgent) const volatile noexcept {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- while(__builtin_expect(atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_volatile(&atom,val);
- count.fetch_add(-1,std::memory_order_acquire);
- }
- }
-
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(__builtin_expect(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_timed(&atom,val,remains);
- count.fetch_add(-1,std::memory_order_acquire);
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const volatile {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(__builtin_expect(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_timed_volatile(&atom,val,remains);
- count.fetch_add(-1,std::memory_order_acquire);
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
-private:
- mutable std::atomic<int> count;
-};
-#endif
-
-template <class T, class Enable = void>
-struct __synchronic : public __synchronic_base<T> {
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T v) noexcept : __synchronic_base<T>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-};
-
-template <class T>
-struct __synchronic<T,typename std::enable_if<std::is_integral<T>::value>::type> : public __synchronic_base<T> {
-
- T fetch_add(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T fetch_add(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T fetch_sub(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T fetch_sub(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T fetch_and(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_and(v,m);
- this->notify(n);
- return t;
- }
- T fetch_and(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_and(v,m);
- this->notify(n);
- return t;
- }
- T fetch_or(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_or(v,m);
- this->notify(n);
- return t;
- }
- T fetch_or(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_or(v,m);
- this->notify(n);
- return t;
- }
- T fetch_xor(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_xor(v,m);
- this->notify(n);
- return t;
- }
- T fetch_xor(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_xor(v,m);
- this->notify(n);
- return t;
- }
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T v) noexcept : __synchronic_base<T>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-
- T operator=(T v) volatile noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T operator=(T v) noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T operator++(int) volatile noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T operator++(int) noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T operator--(int) volatile noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T operator--(int) noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T operator++() volatile noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T operator++() noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T operator--() volatile noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T operator--() noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T operator+=(T v) volatile noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T operator+=(T v) noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T operator-=(T v) volatile noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T operator-=(T v) noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T operator&=(T v) volatile noexcept {
- auto const t = this->atom &= v;
- this->notify();
- return t;
- }
- T operator&=(T v) noexcept {
- auto const t = this->atom &= v;
- this->notify();
- return t;
- }
- T operator|=(T v) volatile noexcept {
- auto const t = this->atom |= v;
- this->notify();
- return t;
- }
- T operator|=(T v) noexcept {
- auto const t = this->atom |= v;
- this->notify();
- return t;
- }
- T operator^=(T v) volatile noexcept {
- auto const t = this->atom ^= v;
- this->notify();
- return t;
- }
- T operator^=(T v) noexcept {
- auto const t = this->atom ^= v;
- this->notify();
- return t;
- }
-};
-
-template <class T>
-struct __synchronic<T*> : public __synchronic_base<T*> {
-
- T* fetch_add(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_add(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_sub(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_sub(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T* v) noexcept : __synchronic_base<T*>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-
- T* operator=(T* v) volatile noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T* operator=(T* v) noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T* operator++(int) volatile noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T* operator++(int) noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T* operator--(int) volatile noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T* operator--(int) noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T* operator++() volatile noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T* operator++() noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T* operator--() volatile noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T* operator--() noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T* operator+=(ptrdiff_t v) volatile noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T* operator+=(ptrdiff_t v) noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T* operator-=(ptrdiff_t v) volatile noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T* operator-=(ptrdiff_t v) noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
-};
-
-} //namespace Details
-
-template <class T>
-struct synchronic : public Details::__synchronic<T> {
-
- bool is_lock_free() const volatile noexcept { return this->atom.is_lock_free(); }
- bool is_lock_free() const noexcept { return this->atom.is_lock_free(); }
- void store(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- this->atom.store(v,m);
- this->notify(n);
- }
- void store(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- this->atom.store(v,m);
- this->notify(n);
- }
- T load(std::memory_order m = std::memory_order_seq_cst) const volatile noexcept { return this->atom.load(m); }
- T load(std::memory_order m = std::memory_order_seq_cst) const noexcept { return this->atom.load(m); }
-
- operator T() const volatile noexcept { return (T)this->atom; }
- operator T() const noexcept { return (T)this->atom; }
-
- T exchange(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.exchange(v,m);
- this->notify(n);
- return t;
- }
- T exchange(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.exchange(v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m1, m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m);
- this->notify(n);
- return t;
- }
-
- synchronic() noexcept = default;
- constexpr synchronic(T val) noexcept : Details::__synchronic<T>(val) { }
- synchronic(const synchronic&) = delete;
- ~synchronic() { }
- synchronic& operator=(const synchronic&) = delete;
- synchronic& operator=(const synchronic&) volatile = delete;
- T operator=(T val) noexcept {
- return Details::__synchronic<T>::operator=(val);
- }
- T operator=(T val) volatile noexcept {
- return Details::__synchronic<T>::operator=(val);
- }
-
- T load_when_not_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const noexcept {
- Details::__synchronic<T>::expect_update(val,h);
- return load(order);
- }
- T load_when_not_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const volatile noexcept {
- Details::__synchronic<T>::expect_update(val,h);
- return load(order);
- }
- T load_when_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const noexcept {
- for(T nval = load(std::memory_order_relaxed); nval != val; nval = load(std::memory_order_relaxed))
- Details::__synchronic<T>::expect_update(nval,h);
- return load(order);
- }
- T load_when_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const volatile noexcept {
- for(T nval = load(std::memory_order_relaxed); nval != val; nval = load(std::memory_order_relaxed))
- expect_update(nval,h);
- return load(order);
- }
- template <class Rep, class Period>
- void expect_update_for(T val, std::chrono::duration<Rep,Period> const& delta, expect_hint h = expect_urgent) const {
- Details::__synchronic<T>::expect_update_until(val, std::chrono::high_resolution_clock::now() + delta,h);
- }
- template < class Rep, class Period>
- void expect_update_for(T val, std::chrono::duration<Rep,Period> const& delta, expect_hint h = expect_urgent) const volatile {
- Details::__synchronic<T>::expect_update_until(val, std::chrono::high_resolution_clock::now() + delta,h);
- }
-};
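As a reading aid (not part of this change set): a minimal usage sketch of the synchronic<T> interface removed above. It assumes the class lives in Kokkos::Impl, as in the companion headers, and that the header is still reachable under the include path used by Kokkos_Synchronic_n3998.hpp; the thread bodies are illustrative only. The consumer blocks in load_when_not_equal() until the producer publishes a new value, instead of spinning on load().

#include <impl/Kokkos_Synchronic.hpp>
#include <atomic>
#include <thread>
#include <cstdio>

int main() {
  Kokkos::Impl::synchronic<int> value(0);
  std::thread consumer([&] {
    // Spins briefly, then yields, then falls back to an OS wait while the value is still 0.
    int v = value.load_when_not_equal(0, std::memory_order_acquire);
    std::printf("saw %d\n", v);
  });
  std::thread producer([&] {
    value.store(42, std::memory_order_release);  // store() also notifies waiters
  });
  consumer.join();
  producer.join();
  return 0;
}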
-
-#include <inttypes.h>
-
-typedef synchronic<char> synchronic_char;
-typedef synchronic<char> synchronic_schar;
-typedef synchronic<unsigned char> synchronic_uchar;
-typedef synchronic<short> synchronic_short;
-typedef synchronic<unsigned short> synchronic_ushort;
-typedef synchronic<int> synchronic_int;
-typedef synchronic<unsigned int> synchronic_uint;
-typedef synchronic<long> synchronic_long;
-typedef synchronic<unsigned long> synchronic_ulong;
-typedef synchronic<long long> synchronic_llong;
-typedef synchronic<unsigned long long> synchronic_ullong;
-//typedef synchronic<char16_t> synchronic_char16_t;
-//typedef synchronic<char32_t> synchronic_char32_t;
-typedef synchronic<wchar_t> synchronic_wchar_t;
-
-typedef synchronic<int_least8_t> synchronic_int_least8_t;
-typedef synchronic<uint_least8_t> synchronic_uint_least8_t;
-typedef synchronic<int_least16_t> synchronic_int_least16_t;
-typedef synchronic<uint_least16_t> synchronic_uint_least16_t;
-typedef synchronic<int_least32_t> synchronic_int_least32_t;
-typedef synchronic<uint_least32_t> synchronic_uint_least32_t;
-//typedef synchronic<int_least_64_t> synchronic_int_least_64_t;
-typedef synchronic<uint_least64_t> synchronic_uint_least64_t;
-typedef synchronic<int_fast8_t> synchronic_int_fast8_t;
-typedef synchronic<uint_fast8_t> synchronic_uint_fast8_t;
-typedef synchronic<int_fast16_t> synchronic_int_fast16_t;
-typedef synchronic<uint_fast16_t> synchronic_uint_fast16_t;
-typedef synchronic<int_fast32_t> synchronic_int_fast32_t;
-typedef synchronic<uint_fast32_t> synchronic_uint_fast32_t;
-typedef synchronic<int_fast64_t> synchronic_int_fast64_t;
-typedef synchronic<uint_fast64_t> synchronic_uint_fast64_t;
-typedef synchronic<intptr_t> synchronic_intptr_t;
-typedef synchronic<uintptr_t> synchronic_uintptr_t;
-typedef synchronic<size_t> synchronic_size_t;
-typedef synchronic<ptrdiff_t> synchronic_ptrdiff_t;
-typedef synchronic<intmax_t> synchronic_intmax_t;
-typedef synchronic<uintmax_t> synchronic_uintmax_t;
-
-}
-}
-
-#endif //__SYNCHRONIC_H
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp
deleted file mode 100644
index 0a6dd6e71..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp
+++ /dev/null
@@ -1,169 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_CONFIG_H
-#define KOKKOS_SYNCHRONIC_CONFIG_H
-
-#include <thread>
-#include <chrono>
-
-namespace Kokkos {
-namespace Impl {
-
-//the default yield function used inside the implementation is the Standard one
-#define __synchronic_yield std::this_thread::yield
-#define __synchronic_relax __synchronic_yield
-
-#if defined(_MSC_VER)
- //this is a handy GCC optimization that I use inside the implementation
- #define __builtin_expect(condition,common) condition
- #if _MSC_VER <= 1800
- //using certain keywords that VC++ temporarily doesn't support
- #define _ALLOW_KEYWORD_MACROS
- #define noexcept
- #define constexpr
- #endif
- //yes, I define multiple assignment operators
- #pragma warning(disable:4522)
- //I don't understand how Windows is so bad at timing functions, but is OK
- //with straight-up yield loops
- #define __do_backoff(b) __synchronic_yield()
-#else
-#define __do_backoff(b) b.sleep_for_step()
-#endif
-
-//certain platforms have efficient support for spin-waiting built into the operating system
-#if defined(__linux__) || (defined(_WIN32_WINNT) && _WIN32_WINNT >= 0x0602)
-#if defined(_WIN32_WINNT)
-#include <winsock2.h>
-#include <Windows.h>
- //the combination of WaitOnAddress and WakeByAddressAll is supported on Windows 8.1+
- #define __synchronic_wait(x,v) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),-1)
- #define __synchronic_wait_timed(x,v,t) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),std::chrono::duration_cast<std::chrono::milliseconds>(t).count())
- #define __synchronic_wake_one(x) WakeByAddressSingle((PVOID)x)
- #define __synchronic_wake_all(x) WakeByAddressAll((PVOID)x)
- #define __synchronic_wait_volatile(x,v) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),-1)
- #define __synchronic_wait_timed_volatile(x,v,t) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),std::chrono::duration_cast<std::chrono::milliseconds>(t).count())
- #define __synchronic_wake_one_volatile(x) WakeByAddressSingle((PVOID)x)
- #define __synchronic_wake_all_volatile(x) WakeByAddressAll((PVOID)x)
- #define __SYNCHRONIC_COMPATIBLE(x) (std::is_pod<x>::value && (sizeof(x) <= 8))
-
- inline void native_sleep(unsigned long microseconds)
- {
- // What to do if microseconds is < 1000?
- Sleep(microseconds / 1000);
- }
-
- inline void native_yield()
- {
- SwitchToThread();
- }
-#elif defined(__linux__)
- #include <chrono>
- #include <time.h>
- #include <unistd.h>
- #include <pthread.h>
- #include <linux/futex.h>
- #include <sys/syscall.h>
- #include <climits>
- #include <cassert>
- template < class Rep, class Period>
- inline timespec to_timespec(std::chrono::duration<Rep,Period> const& delta) {
- struct timespec ts;
- ts.tv_sec = static_cast<long>(std::chrono::duration_cast<std::chrono::seconds>(delta).count());
- assert(!ts.tv_sec);
- ts.tv_nsec = static_cast<long>(std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count());
- return ts;
- }
- inline long futex(void const* addr1, int op, int val1) {
- return syscall(SYS_futex, addr1, op, val1, 0, 0, 0);
- }
- inline long futex(void const* addr1, int op, int val1, struct timespec timeout) {
- return syscall(SYS_futex, addr1, op, val1, &timeout, 0, 0);
- }
- inline void native_sleep(unsigned long microseconds)
- {
- usleep(microseconds);
- }
- inline void native_yield()
- {
- pthread_yield();
- }
-
- //the combination of SYS_futex(WAIT) and SYS_futex(WAKE) is supported on all recent Linux distributions
- #define __synchronic_wait(x,v) futex(x, FUTEX_WAIT_PRIVATE, v)
- #define __synchronic_wait_timed(x,v,t) futex(x, FUTEX_WAIT_PRIVATE, v, to_timespec(t))
- #define __synchronic_wake_one(x) futex(x, FUTEX_WAKE_PRIVATE, 1)
- #define __synchronic_wake_all(x) futex(x, FUTEX_WAKE_PRIVATE, INT_MAX)
- #define __synchronic_wait_volatile(x,v) futex(x, FUTEX_WAIT, v)
- #define __synchronic_wait_volatile_timed(x,v,t) futex(x, FUTEX_WAIT, v, to_timespec(t))
- #define __synchronic_wake_one_volatile(x) futex(x, FUTEX_WAKE, 1)
- #define __synchronic_wake_all_volatile(x) futex(x, FUTEX_WAKE, INT_MAX)
- #define __SYNCHRONIC_COMPATIBLE(x) (std::is_integral<x>::value && (sizeof(x) <= 4))
-
- //the yield function on Linux is better replaced by sched_yield, which is tuned for spin-waiting
- #undef __synchronic_yield
- #define __synchronic_yield sched_yield
-
- //for extremely short wait times, just let another hyper-thread run
- #undef __synchronic_relax
- #define __synchronic_relax() asm volatile("rep; nop" ::: "memory")
-
-#endif
-#endif
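For readers unfamiliar with the futex wrappers above, here is a hedged, Linux-only sketch of the bare wait/wake pattern they build on; it is a standalone demo with illustrative names (flag, futex_call), not Kokkos code.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <climits>
#include <atomic>
#include <thread>
#include <cstdio>

static std::atomic<int> flag(0);

static long futex_call(void* addr, int op, int val) {
  return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

int main() {
  std::thread waiter([] {
    // Re-check after every wake-up: futex wake-ups may be spurious.
    while (flag.load(std::memory_order_acquire) == 0)
      futex_call(&flag, FUTEX_WAIT_PRIVATE, 0);     // sleep only while still 0
    std::printf("woken\n");
  });
  std::thread setter([] {
    flag.store(1, std::memory_order_release);
    futex_call(&flag, FUTEX_WAKE_PRIVATE, INT_MAX); // wake any waiters
  });
  waiter.join();
  setter.join();
  return 0;
}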
-
-#ifdef _GLIBCXX_USE_NANOSLEEP
-inline void portable_sleep(std::chrono::microseconds const& time)
-{ std::this_thread::sleep_for(time); }
-#else
-inline void portable_sleep(std::chrono::microseconds const& time)
-{ native_sleep(time.count()); }
-#endif
-
-#ifdef _GLIBCXX_USE_SCHED_YIELD
-inline void portable_yield()
-{ std::this_thread::yield(); }
-#else
-inline void portable_yield()
-{ native_yield(); }
-#endif
-
-//this is the number of times we initially spin, on the first wait attempt
-#define __SYNCHRONIC_SPIN_COUNT_A 16
-
-//this is how we decide to yield instead of just spinning, 'c' is the current trip count
-//#define __SYNCHRONIC_SPIN_YIELD(c) true
-#define __SYNCHRONIC_SPIN_RELAX(c) (c>>3)
-
-//this is the number of times we normally spin, on every subsequent wait attempt
-#define __SYNCHRONIC_SPIN_COUNT_B 8
-
-}
-}
-
-#endif //__SYNCHRONIC_CONFIG_H
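Taken together, the macros in this (removed) header encode a three-stage waiting policy: spin a little, then relax/yield, then sleep or use the OS wait primitive. A hedged sketch of that policy in plain C++ (the constants and wait_until name are illustrative, not the exact Kokkos tuning):

#include <chrono>
#include <thread>

template <class Pred>
void wait_until(Pred ready) {
  int trips = 0;
  while (!ready()) {
    if (trips < 16) {                    // roughly __SYNCHRONIC_SPIN_COUNT_A: pure spin
      ++trips;
    } else if (trips < 16 + 128) {       // then let a sibling hyper-thread or peer run
      ++trips;
      std::this_thread::yield();
    } else {                             // finally sleep so a long wait does not burn a core
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
  }
}

// Example: std::atomic<bool> done{false}; ... wait_until([&]{ return done.load(); });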
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp
deleted file mode 100644
index facc8d6d8..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp
+++ /dev/null
@@ -1,162 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_N3998_HPP
-#define KOKKOS_SYNCHRONIC_N3998_HPP
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <functional>
-
-/*
-In the section below, a synchronization point represents a point at which a
-thread may block until a given synchronization condition has been reached or
-at which it may notify other threads that a synchronization condition has
-been achieved.
-*/
-namespace Kokkos { namespace Impl {
-
- /*
- A latch maintains an internal counter that is initialized when the latch
- is created. The synchronization condition is reached when the counter is
- decremented to 0. Threads may block at a synchronization point waiting
- for the condition to be reached. When the condition is reached, any such
- blocked threads will be released.
- */
- struct latch {
- latch(int val) : count(val), released(false) { }
- latch(const latch&) = delete;
- latch& operator=(const latch&) = delete;
- ~latch( ) { }
- void arrive( ) {
- __arrive( );
- }
- void arrive_and_wait( ) {
- if(!__arrive( ))
- wait( );
- }
- void wait( ) {
- while(!released.load_when_not_equal(false,std::memory_order_acquire))
- ;
- }
- bool try_wait( ) {
- return released.load(std::memory_order_acquire);
- }
- private:
- bool __arrive( ) {
- if(count.fetch_add(-1,std::memory_order_release)!=1)
- return false;
- released.store(true,std::memory_order_release);
- return true;
- }
- std::atomic<int> count;
- synchronic<bool> released;
- };
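A brief usage sketch for the latch above (illustrative only; latch_demo is a hypothetical helper and the header is assumed to be on the include path): N workers call arrive() when they are ready, and the main thread blocks in wait() until the internal counter reaches zero.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <vector>
#include <cstdio>

void latch_demo() {
  constexpr int N = 4;
  Kokkos::Impl::latch ready(N);
  std::vector<std::thread> workers;
  for (int i = 0; i < N; ++i)
    workers.emplace_back([&ready, i] {
      std::printf("worker %d ready\n", i);
      ready.arrive();                  // decrement the internal counter
    });
  ready.wait();                        // released once all N have arrived
  std::printf("all workers ready\n");
  for (auto& t : workers) t.join();
}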
-
- /*
- A barrier is created with an initial value representing the number of threads
- that can arrive at the synchronization point. When that many threads have
- arrived, the synchronization condition is reached and the threads are
- released. The barrier will then reset, and may be reused for a new cycle, in
- which the same set of threads may arrive again at the synchronization point.
- The same set of threads shall arrive at the barrier in each cycle, otherwise
- the behaviour is undefined.
- */
- struct barrier {
- barrier(int val) : expected(val), arrived(0), nexpected(val), epoch(0) { }
- barrier(const barrier&) = delete;
- barrier& operator=(const barrier&) = delete;
- ~barrier() { }
- void arrive_and_wait() {
- int const myepoch = epoch.load(std::memory_order_relaxed);
- if(!__arrive(myepoch))
- while(epoch.load_when_not_equal(myepoch,std::memory_order_acquire) == myepoch)
- ;
- }
- void arrive_and_drop() {
- nexpected.fetch_add(-1,std::memory_order_relaxed);
- __arrive(epoch.load(std::memory_order_relaxed));
- }
- private:
- bool __arrive(int const myepoch) {
- int const myresult = arrived.fetch_add(1,std::memory_order_acq_rel) + 1;
- if(__builtin_expect(myresult == expected,0)) {
- expected = nexpected.load(std::memory_order_relaxed);
- arrived.store(0,std::memory_order_relaxed);
- epoch.store(myepoch+1,std::memory_order_release);
- return true;
- }
- return false;
- }
- int expected;
- std::atomic<int> arrived, nexpected;
- synchronic<int> epoch;
- };
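Similarly, a hedged usage sketch for the barrier above (illustrative only): the same two threads rendezvous once per step, and the epoch counter lets the barrier be reused across cycles.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <cstdio>

void barrier_demo() {
  Kokkos::Impl::barrier sync(2);
  auto work = [&sync](int id) {
    for (int step = 0; step < 3; ++step) {
      std::printf("thread %d finished step %d\n", id, step);
      sync.arrive_and_wait();          // nobody starts step+1 until both have arrived
    }
  };
  std::thread a(work, 0), b(work, 1);
  a.join();
  b.join();
}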
-
- /*
- A notifying barrier behaves as a barrier, but is constructed with a callable
- completion function that is invoked after all threads have arrived at the
- synchronization point, and before the synchronization condition is reached.
- The completion may modify the set of threads that arrives at the barrier in
- each cycle.
- */
- struct notifying_barrier {
- template <typename T>
- notifying_barrier(int val, T && f) : expected(val), arrived(0), nexpected(val), epoch(0), completion(std::forward<T>(f)) { }
- notifying_barrier(const notifying_barrier&) = delete;
- notifying_barrier& operator=(const notifying_barrier&) = delete;
- ~notifying_barrier( ) { }
- void arrive_and_wait() {
- int const myepoch = epoch.load(std::memory_order_relaxed);
- if(!__arrive(myepoch))
- while(epoch.load_when_not_equal(myepoch,std::memory_order_acquire) == myepoch)
- ;
- }
- void arrive_and_drop() {
- nexpected.fetch_add(-1,std::memory_order_relaxed);
- __arrive(epoch.load(std::memory_order_relaxed));
- }
- private:
- bool __arrive(int const myepoch) {
- int const myresult = arrived.fetch_add(1,std::memory_order_acq_rel) + 1;
- if(__builtin_expect(myresult == expected,0)) {
- int const newexpected = completion();
- expected = newexpected ? newexpected : nexpected.load(std::memory_order_relaxed);
- arrived.store(0,std::memory_order_relaxed);
- epoch.store(myepoch+1,std::memory_order_release);
- return true;
- }
- return false;
- }
- int expected;
- std::atomic<int> arrived, nexpected;
- synchronic<int> epoch;
- std::function<int()> completion;
- };
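And a matching sketch for notifying_barrier (illustrative only): the completion function runs exactly once per cycle, after the last arrival and before the waiters are released; returning 0 keeps the expected count unchanged.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <cstdio>

void notifying_barrier_demo() {
  int cycle = 0;
  Kokkos::Impl::notifying_barrier sync(2, [&cycle]() -> int {
    std::printf("cycle %d complete\n", cycle++);
    return 0;                          // 0 => keep the current expected count
  });
  auto work = [&sync] {
    for (int step = 0; step < 3; ++step) sync.arrive_and_wait();
  };
  std::thread a(work), b(work);
  a.join();
  b.join();
}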
-}}
-
-#endif //__N3998_H
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
index afa01d0cd..b514df351 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
@@ -1,546 +1,614 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
#ifndef KOKKOS_IMPL_TASKQUEUE_HPP
#define KOKKOS_IMPL_TASKQUEUE_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/*\brief Implementation data for task data management, access, and execution.
*
* Curiously recurring template pattern (CRTP)
* to allow static_cast from the
* task root type and a task's FunctorType.
*
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
* TaskBase< Space , ResultType , void >
* : TaskBase< Space , void , void >
* { ... };
*/
template< typename Space , typename ResultType , typename FunctorType >
class TaskBase ;
-template< typename Space >
-class TaskExec ;
-
} /* namespace Impl */
} /* namespace Kokkos */
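The CRTP-style layering documented in the comment above is easier to see in a small self-contained sketch. The types below (Root, Task, crtp_demo) are hypothetical and far simpler than the real TaskBase hierarchy; they only show how a type-erased function pointer plus static_cast recovers the functor from the task root, which is the mechanism apply()/apply_functor rely on further down.

#include <cstdio>

struct Root {
  void (*apply)(Root*);                // type-erased "run me" entry point
};

template <class Functor>
struct Task : Root, Functor {
  explicit Task(Functor f) : Functor(f) { this->apply = &run; }
  static void run(Root* r) {
    // Root* -> Task* -> Functor&: recover the functor without virtual dispatch.
    static_cast<Functor&>(*static_cast<Task*>(r))();
  }
};

inline void crtp_demo() {
  auto hello = [] { std::printf("task body\n"); };
  Task<decltype(hello)> t(hello);
  Root* erased = &t;
  erased->apply(erased);               // dispatches to the lambda
}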
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Space >
class TaskQueueSpecialization ;
/** \brief Manage task allocation, deallocation, and scheduling.
*
* Task execution is deferred to the TaskQueueSpecialization.
* All other aspects of task management have shared implementation.
*/
template< typename ExecSpace >
class TaskQueue {
private:
friend class TaskQueueSpecialization< ExecSpace > ;
friend class Kokkos::TaskScheduler< ExecSpace > ;
using execution_space = ExecSpace ;
using specialization = TaskQueueSpecialization< execution_space > ;
using memory_space = typename specialization::memory_space ;
using device_type = Kokkos::Device< execution_space , memory_space > ;
using memory_pool = Kokkos::Experimental::MemoryPool< device_type > ;
using task_root_type = Kokkos::Impl::TaskBase<execution_space,void,void> ;
struct Destroy {
TaskQueue * m_queue ;
void destroy_shared_allocation();
};
//----------------------------------------
enum : int { NumQueue = 3 };
// Queue is organized as [ priority ][ type ]
memory_pool m_memory ;
task_root_type * volatile m_ready[ NumQueue ][ 2 ];
long m_accum_alloc ; // Accumulated number of allocations
int m_count_alloc ; // Current number of allocations
int m_max_alloc ; // Maximum number of allocations
int m_ready_count ; // Number of ready or executing
//----------------------------------------
~TaskQueue();
TaskQueue() = delete ;
TaskQueue( TaskQueue && ) = delete ;
TaskQueue( TaskQueue const & ) = delete ;
TaskQueue & operator = ( TaskQueue && ) = delete ;
TaskQueue & operator = ( TaskQueue const & ) = delete ;
TaskQueue
( const memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
);
// Schedule a task
// Precondition:
// task is not executing
// task->m_next is the dependence or zero
// Postcondition:
// task->m_next is linked list membership
- KOKKOS_FUNCTION
- void schedule( task_root_type * const );
+ KOKKOS_FUNCTION void schedule_runnable( task_root_type * const );
+ KOKKOS_FUNCTION void schedule_aggregate( task_root_type * const );
// Reschedule a task
// Precondition:
// task is in Executing state
// task->m_next == LockTag
// Postcondition:
// task is in Executing-Respawn state
// task->m_next == 0 (no dependence)
KOKKOS_FUNCTION
void reschedule( task_root_type * );
// Complete a task
// Precondition:
// task is not executing
// task->m_next == LockTag => task is complete
// task->m_next != LockTag => task is respawn
// Postcondition:
// task->m_wait == LockTag => task is complete
// task->m_wait != LockTag => task is waiting
KOKKOS_FUNCTION
void complete( task_root_type * );
KOKKOS_FUNCTION
static bool push_task( task_root_type * volatile * const
, task_root_type * const );
KOKKOS_FUNCTION
- static task_root_type * pop_task( task_root_type * volatile * const );
+ static task_root_type * pop_ready_task( task_root_type * volatile * const );
KOKKOS_FUNCTION static
void decrement( task_root_type * task );
public:
// If and only if the execution space is a single thread
// then execute ready tasks.
KOKKOS_INLINE_FUNCTION
void iff_single_thread_recursive_execute()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
specialization::iff_single_thread_recursive_execute( this );
#endif
}
void execute() { specialization::execute( this ); }
template< typename FunctorType >
void proc_set_apply( typename task_root_type::function_type * ptr )
{
specialization::template proc_set_apply< FunctorType >( ptr );
}
// Assign task pointer with reference counting of assigned tasks
template< typename LV , typename RV >
KOKKOS_FUNCTION static
void assign( TaskBase< execution_space,LV,void> ** const lhs
, TaskBase< execution_space,RV,void> * const rhs )
{
using task_lhs = TaskBase< execution_space,LV,void> ;
#if 0
{
printf( "assign( 0x%lx { 0x%lx %d %d } , 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( lhs ? *lhs : 0 )
, uintptr_t( lhs && *lhs ? (*lhs)->m_next : 0 )
, int( lhs && *lhs ? (*lhs)->m_task_type : 0 )
, int( lhs && *lhs ? (*lhs)->m_ref_count : 0 )
, uintptr_t(rhs)
, uintptr_t( rhs ? rhs->m_next : 0 )
, int( rhs ? rhs->m_task_type : 0 )
, int( rhs ? rhs->m_ref_count : 0 )
);
fflush( stdout );
}
#endif
if ( *lhs ) decrement( *lhs );
if ( rhs ) { Kokkos::atomic_increment( &(rhs->m_ref_count) ); }
// Force write of *lhs
*static_cast< task_lhs * volatile * >(lhs) = rhs ;
Kokkos::memory_fence();
}
KOKKOS_FUNCTION
size_t allocate_block_size( size_t n ); ///< Actual block size allocated
KOKKOS_FUNCTION
void * allocate( size_t n ); ///< Allocate from the memory pool
KOKKOS_FUNCTION
void deallocate( void * p , size_t n ); ///< Deallocate to the memory pool
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskBase< void , void , void > {
public:
enum : int16_t { TaskTeam = 0 , TaskSingle = 1 , Aggregate = 2 };
enum : uintptr_t { LockTag = ~uintptr_t(0) , EndTag = ~uintptr_t(1) };
};
/** \brief Base class for task management, access, and execution.
*
* Inheritance structure to allow static_cast from the task root type
* and a task's FunctorType.
*
* // Enable a Future to access result data
* TaskBase< Space , ResultType , void >
* : TaskBase< void , void , void >
* { ... };
*
* // Enable a functor to access the base class
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
*
* States of a task:
*
* Constructing State, NOT IN a linked list
* m_wait == 0
* m_next == 0
*
* Scheduling transition : Constructing -> Waiting
* before:
* m_wait == 0
* m_next == this task's initial dependence, 0 if none
* after:
* m_wait == EndTag
* m_next == EndTag
*
* Waiting State, IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == next of linked list of tasks
*
* transition : Waiting -> Executing
* before:
* m_next == EndTag
 *    after:
* m_next == LockTag
*
* Executing State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == LockTag
*
* Respawn transition : Executing -> Executing-Respawn
* before:
* m_next == LockTag
* after:
* m_next == this task's updated dependence, 0 if none
*
* Executing-Respawn State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == this task's updated dependence, 0 if none
*
* transition : Executing -> Complete
* before:
* m_wait == head of linked list
* after:
* m_wait == LockTag
*
* Complete State, NOT IN a linked list
* m_wait == LockTag: cannot add dependence
* m_next == LockTag: not a member of a wait queue
*
*/
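A hedged illustration (not Kokkos code) of the sentinel-pointer encoding used by the state machine above: LockTag and EndTag are impossible addresses, so a single pointer field can mean "real task", "end of list", or "locked / complete". DemoTask, describe_wait and describe_next are hypothetical names.

#include <cstdint>

struct DemoTask;
constexpr std::uintptr_t LockTag = ~std::uintptr_t(0);
constexpr std::uintptr_t EndTag  = ~std::uintptr_t(1);

inline const char* describe_wait(DemoTask* wait_field) {   // the task's m_wait
  if (wait_field == nullptr)                               return "constructing";
  if (wait_field == reinterpret_cast<DemoTask*>(LockTag))  return "complete";
  return "waiting or executing (head of this task's wait list)";
}

inline const char* describe_next(DemoTask* next_field) {   // the task's m_next
  if (next_field == reinterpret_cast<DemoTask*>(LockTag))  return "executing or complete";
  if (next_field == reinterpret_cast<DemoTask*>(EndTag))   return "tail of a wait queue";
  if (next_field == nullptr)                               return "no dependence";
  return "dependence, or next member of a wait queue";
}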
template< typename ExecSpace >
class TaskBase< ExecSpace , void , void >
{
public:
enum : int16_t { TaskTeam = TaskBase<void,void,void>::TaskTeam
, TaskSingle = TaskBase<void,void,void>::TaskSingle
, Aggregate = TaskBase<void,void,void>::Aggregate };
enum : uintptr_t { LockTag = TaskBase<void,void,void>::LockTag
, EndTag = TaskBase<void,void,void>::EndTag };
using execution_space = ExecSpace ;
using queue_type = TaskQueue< execution_space > ;
template< typename > friend class Kokkos::TaskScheduler ;
typedef void (* function_type) ( TaskBase * , void * );
// sizeof(TaskBase) == 48
function_type m_apply ; ///< Apply function pointer
queue_type * m_queue ; ///< Queue in which this task resides
TaskBase * m_wait ; ///< Linked list of tasks waiting on this
TaskBase * m_next ; ///< Waiting linked-list next
int32_t m_ref_count ; ///< Reference count
int32_t m_alloc_size ; ///< Allocation size
int32_t m_dep_count ; ///< Aggregate's number of dependences
int16_t m_task_type ; ///< Type of task
int16_t m_priority ; ///< Priority of runnable task
+ TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+ // Constructor for a runnable task
KOKKOS_INLINE_FUNCTION
- constexpr TaskBase() noexcept
- : m_apply(0)
- , m_queue(0)
- , m_wait(0)
- , m_next(0)
- , m_ref_count(0)
- , m_alloc_size(0)
- , m_dep_count(0)
- , m_task_type( TaskSingle )
- , m_priority( 1 /* TaskRegularPriority */ )
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , TaskBase * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ ) noexcept
+ : m_apply( arg_apply )
+ , m_queue( arg_queue )
+ , m_wait( 0 )
+ , m_next( arg_dependence )
+ , m_ref_count( arg_ref_count )
+ , m_alloc_size( arg_alloc_size )
+ , m_dep_count( 0 )
+ , m_task_type( arg_task_type )
+ , m_priority( arg_priority )
+ {}
+
+ // Constructor for an aggregate task
+ KOKKOS_INLINE_FUNCTION
+ constexpr TaskBase( queue_type * arg_queue
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_dep_count
+ ) noexcept
+ : m_apply( 0 )
+ , m_queue( arg_queue )
+ , m_wait( 0 )
+ , m_next( 0 )
+ , m_ref_count( arg_ref_count )
+ , m_alloc_size( arg_alloc_size )
+ , m_dep_count( arg_dep_count )
+ , m_task_type( Aggregate )
+ , m_priority( 0 )
{}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
TaskBase ** aggregate_dependences()
{ return reinterpret_cast<TaskBase**>( this + 1 ); }
KOKKOS_INLINE_FUNCTION
bool requested_respawn()
{
// This should only be called when a task has finished executing and is
// in the transition to either the complete or executing-respawn state.
TaskBase * const lock = reinterpret_cast< TaskBase * >( LockTag );
return lock != m_next;
}
KOKKOS_INLINE_FUNCTION
void add_dependence( TaskBase* dep )
{
+ // Precondition: lock == m_next
+
+ TaskBase * const lock = (TaskBase *) LockTag ;
+
// Assign dependence to m_next. It will be processed in the subsequent
// call to schedule. Error if the dependence is reset.
- if ( 0 != Kokkos::atomic_exchange( & m_next, dep ) ) {
+ if ( lock != Kokkos::atomic_exchange( & m_next, dep ) ) {
Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
}
if ( 0 != dep ) {
// The future may be destroyed upon returning from this call
// so increment reference count to track this assignment.
Kokkos::atomic_increment( &(dep->m_ref_count) );
}
}
using get_return_type = void ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const {}
};
template < typename ExecSpace , typename ResultType >
class TaskBase< ExecSpace , ResultType , void >
: public TaskBase< ExecSpace , void , void >
{
private:
- static_assert( sizeof(TaskBase<ExecSpace,void,void>) == 48 , "" );
+ using root_type = TaskBase<ExecSpace,void,void> ;
+ using function_type = typename root_type::function_type ;
+ using queue_type = typename root_type::queue_type ;
+ static_assert( sizeof(root_type) == 48 , "" );
+
+ TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
ResultType m_result ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+ // Constructor for runnable task
KOKKOS_INLINE_FUNCTION
- TaskBase()
- : TaskBase< ExecSpace , void , void >()
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , root_type * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ )
+ : root_type( arg_apply
+ , arg_queue
+ , arg_dependence
+ , arg_ref_count
+ , arg_alloc_size
+ , arg_task_type
+ , arg_priority
+ )
, m_result()
{}
using get_return_type = ResultType const & ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const { return m_result ; }
};
template< typename ExecSpace , typename ResultType , typename FunctorType >
class TaskBase
: public TaskBase< ExecSpace , ResultType , void >
, public FunctorType
{
private:
TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
- using root_type = TaskBase< ExecSpace , void , void > ;
- using base_type = TaskBase< ExecSpace , ResultType , void > ;
- using member_type = TaskExec< ExecSpace > ;
- using functor_type = FunctorType ;
- using result_type = ResultType ;
+ using root_type = TaskBase< ExecSpace , void , void > ;
+ using base_type = TaskBase< ExecSpace , ResultType , void > ;
+ using specialization = TaskQueueSpecialization< ExecSpace > ;
+ using function_type = typename root_type::function_type ;
+ using queue_type = typename root_type::queue_type ;
+ using member_type = typename specialization::member_type ;
+ using functor_type = FunctorType ;
+ using result_type = ResultType ;
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member );
}
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< ! std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member , task->m_result );
}
KOKKOS_FUNCTION static
void apply( root_type * root , void * exec )
{
TaskBase * const task = static_cast< TaskBase * >( root );
member_type * const member = reinterpret_cast< member_type * >( exec );
TaskBase::template apply_functor( task , member );
// Task may be serial or team.
// If team then must synchronize before querying if respawn was requested.
// If team then only one thread calls destructor.
member->team_barrier();
if ( 0 == member->team_rank() && !(task->requested_respawn()) ) {
// Did not respawn, destroy the functor to free memory.
static_cast<functor_type*>(task)->~functor_type();
- // Cannot destroy the task until its dependences have been processed.
+ // Cannot destroy and deallocate the task until its dependences
+ // have been processed.
}
}
+ // Constructor for runnable task
KOKKOS_INLINE_FUNCTION
- TaskBase( functor_type const & arg_functor )
- : base_type()
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , root_type * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ , FunctorType && arg_functor
+ )
+ : base_type( arg_apply
+ , arg_queue
+ , arg_dependence
+ , arg_ref_count
+ , arg_alloc_size
+ , arg_task_type
+ , arg_priority
+ )
, functor_type( arg_functor )
{}
KOKKOS_INLINE_FUNCTION
~TaskBase() {}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_TASKQUEUE_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
index fefbbad8b..23f5d3cd3 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
@@ -1,590 +1,661 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ENABLE_TASKDAG )
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< typename ExecSpace >
void TaskQueue< ExecSpace >::Destroy::destroy_shared_allocation()
{
m_queue->~TaskQueue();
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::TaskQueue
( const TaskQueue< ExecSpace >::memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
)
: m_memory( arg_space
, arg_memory_pool_capacity
, arg_memory_pool_superblock_capacity_log2 )
, m_ready()
, m_accum_alloc(0)
, m_count_alloc(0)
, m_max_alloc(0)
, m_ready_count(0)
{
for ( int i = 0 ; i < NumQueue ; ++i ) {
m_ready[i][0] = (task_root_type *) task_root_type::EndTag ;
m_ready[i][1] = (task_root_type *) task_root_type::EndTag ;
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::~TaskQueue()
{
// Verify that queues are empty and ready count is zero
for ( int i = 0 ; i < NumQueue ; ++i ) {
for ( int j = 0 ; j < 2 ; ++j ) {
if ( m_ready[i][j] != (task_root_type *) task_root_type::EndTag ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready tasks");
}
}
}
if ( 0 != m_ready_count ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready or executing tasks");
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::decrement
( TaskQueue< ExecSpace >::task_root_type * task )
{
const int count = Kokkos::atomic_fetch_add(&(task->m_ref_count),-1);
#if 0
if ( 1 == count ) {
printf( "decrement-destroy( 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( task )
, uintptr_t( task->m_next )
, int( task->m_task_type )
, int( task->m_ref_count )
);
}
#endif
if ( ( 1 == count ) &&
( task->m_next == (task_root_type *) task_root_type::LockTag ) ) {
// Reference count is zero and task is complete, deallocate.
task->m_queue->deallocate( task , task->m_alloc_size );
}
else if ( count <= 1 ) {
Kokkos::abort("TaskScheduler task has negative reference count or is incomplete" );
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
size_t TaskQueue< ExecSpace >::allocate_block_size( size_t n )
{
return m_memory.allocate_block_size( n );
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void * TaskQueue< ExecSpace >::allocate( size_t n )
{
void * const p = m_memory.allocate(n);
if ( p ) {
Kokkos::atomic_increment( & m_accum_alloc );
Kokkos::atomic_increment( & m_count_alloc );
if ( m_max_alloc < m_count_alloc ) m_max_alloc = m_count_alloc ;
}
return p ;
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::deallocate( void * p , size_t n )
{
m_memory.deallocate( p , n );
Kokkos::atomic_decrement( & m_count_alloc );
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
bool TaskQueue< ExecSpace >::push_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue
, TaskQueue< ExecSpace >::task_root_type * const task
)
{
// Push task into a concurrently pushed and popped queue.
+ // The queue can be either a ready task queue or a waiting task queue.
// The queue is a linked list where 'task->m_next' form the links.
// Fail the push attempt if the queue is locked;
// otherwise retry until the push succeeds.
#if 0
printf( "push_task( 0x%lx { 0x%lx } 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
, uintptr_t(queue)
, uintptr_t(*queue)
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * volatile * const next = & task->m_next ;
if ( zero != *next ) {
Kokkos::abort("TaskQueue::push_task ERROR: already a member of another queue" );
}
task_root_type * y = *queue ;
while ( lock != y ) {
*next = y ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
task_root_type * const x = y ;
y = Kokkos::atomic_compare_exchange(queue,y,task);
if ( x == y ) return true ;
}
// Failed, replace 'task->m_next' value since 'task' remains
// not a member of a queue.
*next = zero ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
return false ;
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
typename TaskQueue< ExecSpace >::task_root_type *
-TaskQueue< ExecSpace >::pop_task
+TaskQueue< ExecSpace >::pop_ready_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue )
{
- // Pop task from a concurrently pushed and popped queue.
+ // Pop task from a concurrently pushed and popped ready task queue.
// The queue is a linked list where 'task->m_next' form the links.
- task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// *queue is
// end => an empty queue
// lock => a locked queue
// valid
// Retry until the lock is acquired or the queue is empty.
task_root_type * task = *queue ;
while ( end != task ) {
// The only possible values for the queue are
// (1) lock, (2) end, or (3) a valid task.
// Thus zero will never appear in the queue.
//
- // If queue is locked then just read by guaranteeing
- // the CAS will fail.
+ // If queue is locked then just read by guaranteeing the CAS will fail.
if ( lock == task ) task = 0 ;
task_root_type * const x = task ;
- task = Kokkos::atomic_compare_exchange(queue,task,lock);
-
- if ( x == task ) break ; // CAS succeeded and queue is locked
- }
+ task = Kokkos::atomic_compare_exchange(queue,x,lock);
- if ( end != task ) {
+ if ( x == task ) {
+ // CAS succeeded and queue is locked
+ //
+ // This thread has locked the queue and removed 'task' from the queue.
+ // Extract the next entry of the queue from 'task->m_next'
+ // and mark 'task' as popped from a queue by setting
+ // 'task->m_next = lock'.
+ //
+ // Place the next entry in the head of the queue,
+ // which also unlocks the queue.
+ //
+ // This thread has exclusive access to
+ // the queue and the popped task's m_next.
- // This thread has locked the queue and removed 'task' from the queue.
- // Extract the next entry of the queue from 'task->m_next'
- // and mark 'task' as popped from a queue by setting
- // 'task->m_next = lock'.
+ *queue = task->m_next ; task->m_next = lock ;
- task_root_type * const next =
- Kokkos::atomic_exchange( & task->m_next , lock );
+ Kokkos::memory_fence();
- // Place the next entry in the head of the queue,
- // which also unlocks the queue.
-
- task_root_type * const unlock =
- Kokkos::atomic_exchange( queue , next );
+#if 0
+ printf( "pop_ready_task( 0x%lx 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
+ , uintptr_t(queue)
+ , uintptr_t(task)
+ , uintptr_t(task->m_wait)
+ , uintptr_t(task->m_next)
+ , int(task->m_task_type)
+ , int(task->m_priority)
+ , int(task->m_ref_count) );
+#endif
- if ( next == zero || next == lock || lock != unlock ) {
- Kokkos::abort("TaskQueue::pop_task ERROR");
+ return task ;
}
}
-#if 0
- if ( end != task ) {
- printf( "pop_task( 0x%lx 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
- , uintptr_t(queue)
- , uintptr_t(task)
- , uintptr_t(task->m_wait)
- , uintptr_t(task->m_next)
- , int(task->m_task_type)
- , int(task->m_priority)
- , int(task->m_ref_count) );
- }
-#endif
-
- return task ;
+ return end ;
}
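A hedged sketch, in plain std::atomic rather than Kokkos primitives, of the pop protocol implemented above: the popper CAS-es the queue head to the LockTag sentinel, which simultaneously claims the front node and blocks concurrent push/pop attempts until the new head is published. Node, kLock, kEnd and pop_front are illustrative names only.

#include <atomic>
#include <cstdint>

struct Node { Node* next; };

inline Node* pop_front(std::atomic<Node*>& head) {
  Node* const kLock = reinterpret_cast<Node*>(~std::uintptr_t(0));  // LockTag
  Node* const kEnd  = reinterpret_cast<Node*>(~std::uintptr_t(1));  // EndTag
  Node* h = head.load();
  while (h != kEnd) {
    if (h == kLock) { h = head.load(); continue; }  // another thread holds the lock
    if (head.compare_exchange_weak(h, kLock)) {     // claim 'h' and lock the queue
      Node* next = h->next;                         // exclusive access to 'h' now
      h->next = kLock;                              // mark 'h' as popped / not queued
      head.store(next);                             // publish new head = unlock
      return h;
    }                                               // CAS failure reloads 'h'
  }
  return kEnd;                                      // queue was empty
}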
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
-void TaskQueue< ExecSpace >::schedule
+void TaskQueue< ExecSpace >::schedule_runnable
( TaskQueue< ExecSpace >::task_root_type * const task )
{
- // Schedule a runnable or when_all task upon construction / spawn
+ // Schedule a runnable task upon construction / spawn
// and upon completion of other tasks that 'task' is waiting on.
-
- // Precondition on runnable task state:
- // task is either constructing or executing
+ //
+ // Precondition:
+ // - called by a single thread for the input task
+ // - calling thread has exclusive access to the task
+ // - task is not a member of a queue
+ // - if runnable then task is either constructing or respawning
//
// Constructing state:
// task->m_wait == 0
- // task->m_next == dependence
- // Executing-respawn state:
- // task->m_wait == head of linked list
- // task->m_next == dependence
+ // task->m_next == dependence or 0
+ // Respawn state:
+ // task->m_wait == head of linked list: 'end' or valid task
+ // task->m_next == dependence or 0
//
// Task state transition:
- // Constructing -> Waiting
- // Executing-respawn -> Waiting
+ // Constructing -> Waiting
+ // Respawn -> Waiting
//
// Postcondition on task state:
- // task->m_wait == head of linked list
- // task->m_next == member of linked list
+ // task->m_wait == head of linked list (queue)
+ // task->m_next == member of linked list (queue)
#if 0
- printf( "schedule( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
+ printf( "schedule_runnable( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- //----------------------------------------
- {
- // If Constructing then task->m_wait == 0
- // Change to waiting by task->m_wait = EndTag
-
- task_root_type * const init =
- Kokkos::atomic_compare_exchange( & task->m_wait , zero , end );
+ bool respawn = false ;
- // Precondition
+ //----------------------------------------
- if ( lock == init ) {
- Kokkos::abort("TaskQueue::schedule ERROR: task is complete");
- }
+ if ( zero == task->m_wait ) {
+ // Task in Constructing state
+ // - Transition to Waiting state
+ // Preconditions:
+ // - call occurs exclusively within a single thread
- // if ( init == 0 ) Constructing -> Waiting
- // else Executing-Respawn -> Waiting
+ task->m_wait = end ;
+ // Task in Waiting state
}
+ else if ( lock != task->m_wait ) {
+ // Task in Executing state with Respawn request
+ // - Update dependence
+ // - Transition to Waiting state
+ respawn = true ;
+ }
+ else {
+ // Task in Complete state
+ Kokkos::abort("TaskQueue::schedule_runnable ERROR: task is complete");
+ }
+
//----------------------------------------
+  // Scheduling a runnable task which may have a dependency 'dep'.
+ // Extract dependence, if any, from task->m_next.
+ // If 'dep' is not null then attempt to push 'task'
+ // into the wait queue of 'dep'.
+ // If the push succeeds then 'task' may be
+ // processed or executed by another thread at any time.
+ // If the push fails then 'dep' is complete and 'task'
+ // is ready to execute.
+
+ // Exclusive access so don't need an atomic exchange
+ // task_root_type * dep = Kokkos::atomic_exchange( & task->m_next , zero );
+ task_root_type * dep = task->m_next ; task->m_next = zero ;
+
+ const bool is_ready =
+ ( 0 == dep ) || ( ! push_task( & dep->m_wait , task ) );
+
+ if ( ( 0 != dep ) && respawn ) {
+ // Reference count for dep was incremented when
+ // respawn assigned dependency to task->m_next
+ // so that if dep completed prior to the
+ // above push_task dep would not be destroyed.
+ // dep reference count can now be decremented,
+ // which may deallocate the task.
+ TaskQueue::assign( & dep , (task_root_type *)0 );
+ }
- if ( task_root_type::Aggregate != task->m_task_type ) {
+ if ( is_ready ) {
- // Scheduling a runnable task which may have a depencency 'dep'.
- // Extract dependence, if any, from task->m_next.
- // If 'dep' is not null then attempt to push 'task'
- // into the wait queue of 'dep'.
- // If the push succeeds then 'task' may be
- // processed or executed by another thread at any time.
- // If the push fails then 'dep' is complete and 'task'
- // is ready to execute.
+ // No dependence or 'dep' is complete so push task into ready queue.
+ // Increment the ready count before pushing into ready queue
+ // to track number of ready + executing tasks.
+ // The ready count will be decremented when the task is complete.
- task_root_type * dep = Kokkos::atomic_exchange( & task->m_next , zero );
+ Kokkos::atomic_increment( & m_ready_count );
- const bool is_ready =
- ( 0 == dep ) || ( ! push_task( & dep->m_wait , task ) );
+ task_root_type * volatile * const ready_queue =
+ & m_ready[ task->m_priority ][ task->m_task_type ];
- // Reference count for dep was incremented when assigned
- // to task->m_next so that if it completed prior to the
- // above push_task dep would not be destroyed.
- // dep reference count can now be decremented,
- // which may deallocate the task.
- TaskQueue::assign( & dep , (task_root_type *)0 );
+ // A push_task fails if the ready queue is locked.
+ // A ready queue is only locked during a push or pop;
+ // i.e., it is never permanently locked.
+ // Retry push to ready queue until it succeeds.
+ // When the push succeeds then 'task' may be
+ // processed or executed by another thread at any time.
- if ( is_ready ) {
+ while ( ! push_task( ready_queue , task ) );
+ }
- // No dependence or 'dep' is complete so push task into ready queue.
- // Increment the ready count before pushing into ready queue
- // to track number of ready + executing tasks.
- // The ready count will be decremented when the task is complete.
+ //----------------------------------------
+ // Postcondition:
+ // - A runnable 'task' was pushed into a wait or ready queue.
+ // - Concurrent execution may have already popped 'task'
+ // from a queue and processed it as appropriate.
+}
- Kokkos::atomic_increment( & m_ready_count );
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::schedule_aggregate
+ ( TaskQueue< ExecSpace >::task_root_type * const task )
+{
+ // Schedule an aggregate task upon construction
+ // and upon completion of other tasks that 'task' is waiting on.
+ //
+ // Precondition:
+ // - called by a single thread for the input task
+ // - calling thread has exclusive access to the task
+ // - task is not a member of a queue
+ //
+ // Constructing state:
+ // task->m_wait == 0
+ // task->m_next == dependence or 0
+ //
+ // Task state transition:
+ // Constructing -> Waiting
+ //
+ // Postcondition on task state:
+ // task->m_wait == head of linked list (queue)
+ // task->m_next == member of linked list (queue)
+
+#if 0
+ printf( "schedule_aggregate( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
+ , uintptr_t(task)
+ , uintptr_t(task->m_wait)
+ , uintptr_t(task->m_next)
+ , task->m_task_type
+ , task->m_priority
+ , task->m_ref_count );
+#endif
- task_root_type * volatile * const queue =
- & m_ready[ task->m_priority ][ task->m_task_type ];
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+ task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- // A push_task fails if the ready queue is locked.
- // A ready queue is only locked during a push or pop;
- // i.e., it is never permanently locked.
- // Retry push to ready queue until it succeeds.
- // When the push succeeds then 'task' may be
- // processed or executed by another thread at any time.
+ //----------------------------------------
- while ( ! push_task( queue , task ) );
- }
+ if ( zero == task->m_wait ) {
+ // Task in Constructing state
+ // - Transition to Waiting state
+ // Preconditions:
+ // - call occurs exclusively within a single thread
+
+ task->m_wait = end ;
+ // Task in Waiting state
+ }
+ else if ( lock == task->m_wait ) {
+ // Task in Complete state
+ Kokkos::abort("TaskQueue::schedule_aggregate ERROR: task is complete");
}
+
//----------------------------------------
- else {
- // Scheduling a 'when_all' task with multiple dependences.
- // This scheduling may be called when the 'when_all' is
- // (1) created or
- // (2) being removed from a completed task's wait list.
+ // Scheduling a 'when_all' task with multiple dependences.
+ // This scheduling may be called when the 'when_all' is
+ // (1) created or
+ // (2) being removed from a completed task's wait list.
- task_root_type ** const aggr = task->aggregate_dependences();
+ task_root_type ** const aggr = task->aggregate_dependences();
- // Assume the 'when_all' is complete until a dependence is
- // found that is not complete.
+ // Assume the 'when_all' is complete until a dependence is
+ // found that is not complete.
- bool is_complete = true ;
+ bool is_complete = true ;
- for ( int i = task->m_dep_count ; 0 < i && is_complete ; ) {
+ for ( int i = task->m_dep_count ; 0 < i && is_complete ; ) {
- --i ;
+ --i ;
- // Loop dependences looking for an incomplete task.
- // Add this task to the incomplete task's wait queue.
+ // Loop dependences looking for an incomplete task.
+ // Add this task to the incomplete task's wait queue.
- // Remove a task 'x' from the dependence list.
- // The reference count of 'x' was incremented when
- // it was assigned into the dependence list.
+ // Remove a task 'x' from the dependence list.
+ // The reference count of 'x' was incremented when
+ // it was assigned into the dependence list.
- task_root_type * x = Kokkos::atomic_exchange( aggr + i , zero );
+ // Exclusive access so don't need an atomic exchange
+ // task_root_type * x = Kokkos::atomic_exchange( aggr + i , zero );
+ task_root_type * x = aggr[i] ; aggr[i] = zero ;
- if ( x ) {
+ if ( x ) {
- // If x->m_wait is not locked then push succeeds
- // and the aggregate is not complete.
- // If the push succeeds then this when_all 'task' may be
- // processed by another thread at any time.
- // For example, 'x' may be completeed by another
- // thread and then re-schedule this when_all 'task'.
+ // If x->m_wait is not locked then push succeeds
+ // and the aggregate is not complete.
+ // If the push succeeds then this when_all 'task' may be
+ // processed by another thread at any time.
+    // For example, 'x' may be completed by another
+ // thread and then re-schedule this when_all 'task'.
- is_complete = ! push_task( & x->m_wait , task );
+ is_complete = ! push_task( & x->m_wait , task );
- // Decrement reference count which had been incremented
- // when 'x' was added to the dependence list.
+ // Decrement reference count which had been incremented
+ // when 'x' was added to the dependence list.
- TaskQueue::assign( & x , zero );
- }
+ TaskQueue::assign( & x , zero );
}
+ }
- if ( is_complete ) {
- // The when_all 'task' was not added to a wait queue because
- // all dependences were complete so this aggregate is complete.
- // Complete the when_all 'task' to schedule other tasks
- // that are waiting for the when_all 'task' to complete.
+ if ( is_complete ) {
+ // The when_all 'task' was not added to a wait queue because
+ // all dependences were complete so this aggregate is complete.
+ // Complete the when_all 'task' to schedule other tasks
+ // that are waiting for the when_all 'task' to complete.
- task->m_next = lock ;
+ task->m_next = lock ;
- complete( task );
+ complete( task );
- // '*task' may have been deleted upon completion
- }
+ // '*task' may have been deleted upon completion
}
+
//----------------------------------------
// Postcondition:
- // A runnable 'task' was pushed into a wait or ready queue.
- // An aggregate 'task' was either pushed to a wait queue
- // or completed.
- // Concurrent execution may have already popped 'task'
- // from a queue and processed it as appropriate.
+ // - An aggregate 'task' was either pushed to a wait queue or completed.
+ // - Concurrent execution may have already popped 'task'
+ // from a queue and processed it as appropriate.
}
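
Illustration only (not from the patch): the dependence scan above, reduced to a single-threaded sketch. 'Node', 'try_push_wait_queue' and 'schedule_when_all' are made-up stand-ins for task_root_type, push_task and the logic above; the point is the "assume complete until an incomplete dependence is found" loop.

  // Hypothetical, simplified model of the when_all dependence scan.
  struct Node {
    Node * wait_head = nullptr;   // stand-in for m_wait
    Node * next      = nullptr;   // stand-in for m_next
    bool   complete  = false;
  };

  // Returns false when 'dep' already completed (its wait queue is conceptually
  // locked); returns true when 'waiter' was parked on 'dep's wait queue.
  inline bool try_push_wait_queue( Node * dep, Node * waiter )
  {
    if ( dep->complete ) return false;
    waiter->next   = dep->wait_head;   // LIFO push; single-threaded here
    dep->wait_head = waiter;
    return true;
  }

  // Returns true when every dependence has finished, i.e. the aggregate can be
  // completed immediately instead of waiting.
  inline bool schedule_when_all( Node * const * deps, int dep_count, Node * aggregate )
  {
    bool is_complete = true;
    for ( int i = dep_count; 0 < i && is_complete; ) {
      --i;
      is_complete = ! try_push_wait_queue( deps[i], aggregate );
    }
    return is_complete;
  }
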
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::reschedule( task_root_type * task )
{
// Precondition:
// task is in Executing state
// task->m_next == LockTag
//
// Postcondition:
// task is in Executing-Respawn state
// task->m_next == 0 (no dependence)
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
}
}
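
The LockTag idiom used by reschedule() can be pictured with plain std::atomic; 'LOCK_TAG' and 'reschedule_once' are made-up names for this sketch only.

  #include <atomic>
  #include <cstdint>
  #include <cstdlib>

  constexpr std::uintptr_t LOCK_TAG = ~std::uintptr_t(0);

  inline void reschedule_once( std::atomic< std::uintptr_t > & next )
  {
    // Precondition: 'next' holds LOCK_TAG while the task is executing.
    // Exchanging it for 0 ("no dependence") must succeed exactly once.
    if ( LOCK_TAG != next.exchange( 0 ) ) {
      std::abort();   // already respawned
    }
  }
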
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::complete
( TaskQueue< ExecSpace >::task_root_type * task )
{
// Complete a runnable task that has finished executing
  // or a when_all task when all of its dependences are complete.
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
#if 0
printf( "complete( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
fflush( stdout );
#endif
const bool runnable = task_root_type::Aggregate != task->m_task_type ;
//----------------------------------------
if ( runnable && lock != task->m_next ) {
    // A runnable task has finished executing and requested respawn.
// Schedule the task for subsequent execution.
- schedule( task );
+ schedule_runnable( task );
}
//----------------------------------------
else {
    // Either an aggregate task, or a runnable task that executed
// and did not respawn. Transition this task to complete.
// If 'task' is an aggregate then any of the runnable tasks that
// it depends upon may be attempting to complete this 'task'.
// Must only transition a task once to complete status.
    // This is controlled by atomically locking the wait queue.
// Stop other tasks from adding themselves to this task's wait queue
// by locking the head of this task's wait queue.
task_root_type * x = Kokkos::atomic_exchange( & task->m_wait , lock );
if ( x != (task_root_type *) lock ) {
// This thread has transitioned this 'task' to complete.
// 'task' is no longer in a queue and is not executing
// so decrement the reference count from 'task's creation.
// If no other references to this 'task' then it will be deleted.
TaskQueue::assign( & task , zero );
// This thread has exclusive access to the wait list so
- // the concurrency-safe pop_task function is not needed.
+ // the concurrency-safe pop_ready_task function is not needed.
// Schedule the tasks that have been waiting on the input 'task',
// which may have been deleted.
while ( x != end ) {
+ // Have exclusive access to 'x' until it is scheduled
+ // Set x->m_next = zero <= no dependence, not a respawn
- // Set x->m_next = zero <= no dependence
-
- task_root_type * const next =
- (task_root_type *) Kokkos::atomic_exchange( & x->m_next , zero );
+ task_root_type * const next = x->m_next ; x->m_next = 0 ;
- schedule( x );
+ if ( task_root_type::Aggregate != x->m_task_type ) {
+ schedule_runnable( x );
+ }
+ else {
+ schedule_aggregate( x );
+ }
x = next ;
}
}
}
if ( runnable ) {
// A runnable task was popped from a ready queue and executed.
// If respawned into a ready queue then the ready count was incremented
// so decrement whether respawned or not.
Kokkos::atomic_decrement( & m_ready_count );
}
}
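
The hand-off in the completion branch above can likewise be pictured with plain std::atomic: swap the wait-list head with a LOCKED sentinel, and whoever receives the old head owns the whole list and may drain it without further atomics. 'Waiter', 'LOCKED' and 'complete_and_drain' are stand-in names for this sketch only.

  #include <atomic>
  #include <cstdint>

  struct Waiter { Waiter * next = nullptr; };

  static Waiter * const LOCKED = reinterpret_cast< Waiter * >( ~std::uintptr_t(0) );

  inline void complete_and_drain( std::atomic< Waiter * > & wait_head,
                                  void (*schedule)( Waiter * ) )
  {
    Waiter * x = wait_head.exchange( LOCKED );
    if ( x == LOCKED ) return;        // another thread already completed it
    while ( x != nullptr ) {          // exclusive access: no atomics required
      Waiter * const n = x->next;
      x->next = nullptr;              // "no dependence, not a respawn"
      schedule( x );
      x = n;
    }
  }
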
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
index ff503cb27..d72cde03f 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
@@ -1,414 +1,415 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_IMPL_UTILITIES_HPP
#define KOKKOS_CORE_IMPL_UTILITIES_HPP
#include <Kokkos_Macros.hpp>
+#include <stdint.h>
#include <type_traits>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos { namespace Impl {
// same as std::forward
// needed to allow perfect forwarding on the device
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
T&& forward( typename std::remove_reference<T>::type& arg ) noexcept
{ return static_cast<T&&>(arg); }
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
T&& forward( typename std::remove_reference<T>::type&& arg ) noexcept
{ return static_cast<T&&>(arg); }
// same as std::move
// needed to allow moving on the device
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
typename std::remove_reference<T>::type&& move( T&& arg ) noexcept
{ return static_cast<typename std::remove_reference<T>::type&&>(arg); }
// empty function to allow expanding a variadic argument pack
template<typename... Args>
KOKKOS_INLINE_FUNCTION
void expand_variadic(Args &&...) {}
//----------------------------------------
// C++14 integer sequence
template< typename T , T ... Ints >
struct integer_sequence {
using value_type = T ;
static constexpr std::size_t size() noexcept { return sizeof...(Ints); }
};
template< typename T , std::size_t N >
struct make_integer_sequence_helper ;
template< typename T , T N >
using make_integer_sequence =
typename make_integer_sequence_helper<T,N>::type ;
template< typename T >
struct make_integer_sequence_helper< T , 0 >
{ using type = integer_sequence<T> ; };
template< typename T >
struct make_integer_sequence_helper< T , 1 >
{ using type = integer_sequence<T,0> ; };
template< typename T >
struct make_integer_sequence_helper< T , 2 >
{ using type = integer_sequence<T,0,1> ; };
template< typename T >
struct make_integer_sequence_helper< T , 3 >
{ using type = integer_sequence<T,0,1,2> ; };
template< typename T >
struct make_integer_sequence_helper< T , 4 >
{ using type = integer_sequence<T,0,1,2,3> ; };
template< typename T >
struct make_integer_sequence_helper< T , 5 >
{ using type = integer_sequence<T,0,1,2,3,4> ; };
template< typename T >
struct make_integer_sequence_helper< T , 6 >
{ using type = integer_sequence<T,0,1,2,3,4,5> ; };
template< typename T >
struct make_integer_sequence_helper< T , 7 >
{ using type = integer_sequence<T,0,1,2,3,4,5,6> ; };
template< typename T >
struct make_integer_sequence_helper< T , 8 >
{ using type = integer_sequence<T,0,1,2,3,4,5,6,7> ; };
template< typename X , typename Y >
struct make_integer_sequence_concat ;
template< typename T , T ... x , T ... y >
struct make_integer_sequence_concat< integer_sequence<T,x...>
, integer_sequence<T,y...> >
{ using type = integer_sequence< T , x ... , (sizeof...(x)+y)... > ; };
template< typename T , std::size_t N >
struct make_integer_sequence_helper {
using type = typename make_integer_sequence_concat
< typename make_integer_sequence_helper< T , N/2 >::type
, typename make_integer_sequence_helper< T , N - N/2 >::type
>::type ;
};
//----------------------------------------
template <std::size_t... Indices>
using index_sequence = integer_sequence<std::size_t, Indices...>;
template< std::size_t N >
using make_index_sequence = make_integer_sequence< std::size_t, N>;
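
A couple of compile-time spot checks for the aliases above, written for a separate translation unit that includes this header; N = 10 exercises the divide-and-concatenate path, the small cases hit the direct specializations.

  #include <impl/Kokkos_Utilities.hpp>
  #include <type_traits>

  static_assert( std::is_same< Kokkos::Impl::make_index_sequence< 3 >,
                               Kokkos::Impl::index_sequence< 0, 1, 2 > >::value, "" );

  // N = 10 goes through make_integer_sequence_concat of two length-5 halves.
  static_assert( std::is_same< Kokkos::Impl::make_index_sequence< 10 >,
                               Kokkos::Impl::index_sequence< 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > >::value, "" );
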
//----------------------------------------
template <unsigned I, typename IntegerSequence>
struct integer_sequence_at;
template <unsigned I, typename T, T h0, T... tail>
struct integer_sequence_at<I, integer_sequence<T, h0, tail...> >
: public integer_sequence_at<I-1u, integer_sequence<T,tail...> >
{
static_assert( 8 <= I , "Reasoning Error" );
static_assert( I < integer_sequence<T, h0, tail...>::size(), "Error: Index out of bounds");
};
template < typename T, T h0, T... tail>
struct integer_sequence_at<0u, integer_sequence<T,h0, tail...> >
{
using type = T;
static constexpr T value = h0;
};
template < typename T, T h0, T h1, T... tail>
struct integer_sequence_at<1u, integer_sequence<T, h0, h1, tail...> >
{
using type = T;
static constexpr T value = h1;
};
template < typename T, T h0, T h1, T h2, T... tail>
struct integer_sequence_at<2u, integer_sequence<T, h0, h1, h2, tail...> >
{
using type = T;
static constexpr T value = h2;
};
template < typename T, T h0, T h1, T h2, T h3, T... tail>
struct integer_sequence_at<3u, integer_sequence<T, h0, h1, h2, h3, tail...> >
{
using type = T;
static constexpr T value = h3;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T... tail>
struct integer_sequence_at<4u, integer_sequence<T, h0, h1, h2, h3, h4, tail...> >
{
using type = T;
static constexpr T value = h4;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T... tail>
struct integer_sequence_at<5u, integer_sequence<T, h0, h1, h2, h3, h4, h5, tail...> >
{
using type = T;
static constexpr T value = h5;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T... tail>
struct integer_sequence_at<6u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, tail...> >
{
using type = T;
static constexpr T value = h6;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
struct integer_sequence_at<7u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> >
{
using type = T;
static constexpr T value = h7;
};
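
For example (again in a translation unit that includes this header), integer_sequence_at resolves to the direct specializations for indices 0-7 and recurses, one element at a time, for larger indices.

  #include <impl/Kokkos_Utilities.hpp>

  // Direct specialization: element 1 of <5, 7, 9> is 7.
  static_assert( Kokkos::Impl::integer_sequence_at<
                   1, Kokkos::Impl::integer_sequence< int, 5, 7, 9 > >::value == 7, "" );

  // Index 8 takes the recursive case (guarded by static_assert( 8 <= I ) above)
  // once, then lands on the index-7 specialization.
  static_assert( Kokkos::Impl::integer_sequence_at<
                   8, Kokkos::Impl::integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > >::value == 8, "" );
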
//----------------------------------------
template <typename T>
constexpr
T at( const unsigned, integer_sequence<T> ) noexcept
{ return ~static_cast<T>(0); }
template <typename T, T h0, T... tail>
constexpr
T at( const unsigned i, integer_sequence<T, h0> ) noexcept
{ return i==0u ? h0 : ~static_cast<T>(0); }
template <typename T, T h0, T h1>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 :
i==6u ? h6 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 :
i==6u ? h6 :
i==7u ? h7 : at(i-8u, integer_sequence<T, tail...>{} );
}
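
The constexpr lookup above can be checked the same way; an out-of-range index yields the all-ones sentinel ~T(0) (assuming the header is included as before).

  #include <impl/Kokkos_Utilities.hpp>

  static_assert( Kokkos::Impl::at( 2, Kokkos::Impl::integer_sequence< int, 5, 7, 9 >() ) == 9, "" );
  static_assert( Kokkos::Impl::at( 3, Kokkos::Impl::integer_sequence< int, 5, 7, 9 >() )
                 == ~static_cast<int>(0), "" );
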
//----------------------------------------
template < typename IntegerSequence
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct reverse_integer_sequence_helper;
template <typename T, T h0, T... tail, T... results>
struct reverse_integer_sequence_helper< integer_sequence<T, h0, tail...>, integer_sequence<T, results...> >
: public reverse_integer_sequence_helper< integer_sequence<T, tail...>, integer_sequence<T, h0, results...> >
{};
template <typename T, T... results>
struct reverse_integer_sequence_helper< integer_sequence<T>, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
};
template <typename IntegerSequence>
using reverse_integer_sequence = typename reverse_integer_sequence_helper<IntegerSequence>::type;
//----------------------------------------
template < typename IntegerSequence
, typename Result
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct exclusive_scan_integer_sequence_helper;
template <typename T, T h0, T... tail, typename Result, T... results>
struct exclusive_scan_integer_sequence_helper
< integer_sequence<T, h0, tail...>
, Result
, integer_sequence<T, results...> >
: public exclusive_scan_integer_sequence_helper
< integer_sequence<T, tail...>
, std::integral_constant<T,Result::value+h0>
, integer_sequence<T, 0, (results+h0)...> >
{};
template <typename T, typename Result, T... results>
struct exclusive_scan_integer_sequence_helper
< integer_sequence<T>, Result, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
static constexpr T value = Result::value ;
};
template <typename IntegerSequence>
struct exclusive_scan_integer_sequence
{
using value_type = typename IntegerSequence::value_type;
using helper =
exclusive_scan_integer_sequence_helper
< reverse_integer_sequence<IntegerSequence>
, std::integral_constant< value_type , 0 >
> ;
using type = typename helper::type ;
static constexpr value_type value = helper::value ;
};
//----------------------------------------
template < typename IntegerSequence
, typename Result
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct inclusive_scan_integer_sequence_helper;
template <typename T, T h0, T... tail, typename Result, T... results>
struct inclusive_scan_integer_sequence_helper
< integer_sequence<T, h0, tail...>
, Result
, integer_sequence<T, results...> >
: public inclusive_scan_integer_sequence_helper
< integer_sequence<T, tail...>
, std::integral_constant<T,Result::value+h0>
, integer_sequence<T, h0, (results+h0)...> >
{};
template <typename T, typename Result, T... results>
struct inclusive_scan_integer_sequence_helper
< integer_sequence<T>, Result, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
static constexpr T value = Result::value ;
};
template <typename IntegerSequence>
struct inclusive_scan_integer_sequence
{
using value_type = typename IntegerSequence::value_type;
using helper =
inclusive_scan_integer_sequence_helper
< reverse_integer_sequence<IntegerSequence>
, std::integral_constant< value_type , 0 >
> ;
using type = typename helper::type ;
static constexpr value_type value = helper::value ;
};
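
Worked example of the three sequence transforms above for the sequence <1, 2, 3>, written as static_asserts in a translation unit that includes this header.

  #include <impl/Kokkos_Utilities.hpp>
  #include <type_traits>

  namespace {
  using Kokkos::Impl::integer_sequence;
  using seq = integer_sequence< int, 1, 2, 3 >;

  // Reversal: <1, 2, 3> -> <3, 2, 1>.
  static_assert( std::is_same< Kokkos::Impl::reverse_integer_sequence< seq >,
                               integer_sequence< int, 3, 2, 1 > >::value, "" );

  // Exclusive scan: offsets <0, 1, 3>, total 6.
  static_assert( std::is_same< Kokkos::Impl::exclusive_scan_integer_sequence< seq >::type,
                               integer_sequence< int, 0, 1, 3 > >::value, "" );
  static_assert( Kokkos::Impl::exclusive_scan_integer_sequence< seq >::value == 6, "" );

  // Inclusive scan: running totals <1, 3, 6>, total 6.
  static_assert( std::is_same< Kokkos::Impl::inclusive_scan_integer_sequence< seq >::type,
                               integer_sequence< int, 1, 3, 6 > >::value, "" );
  static_assert( Kokkos::Impl::inclusive_scan_integer_sequence< seq >::value == 6, "" );
  } // namespace
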
}} // namespace Kokkos::Impl
#endif //KOKKOS_CORE_IMPL_UTILITIES_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp b/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
index ad1b6dce3..93ff6c48a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
@@ -1,89 +1,181 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
+
#include <impl/Kokkos_spinwait.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <impl/Kokkos_BitOps.hpp>
+
/*--------------------------------------------------------------------------*/
-#if ( KOKKOS_ENABLE_ASM )
- #if defined( __arm__ ) || defined( __aarch64__ )
- /* No-operation instruction to idle the thread. */
- #define YIELD asm volatile("nop")
+#if !defined( _WIN32 )
+ #if defined( KOKKOS_ENABLE_ASM )
+ #if defined( __arm__ ) || defined( __aarch64__ )
+ /* No-operation instruction to idle the thread. */
+ #define KOKKOS_INTERNAL_PAUSE
+ #else
+ /* Pause instruction to prevent excess processor bus usage */
+ #define KOKKOS_INTERNAL_PAUSE asm volatile("pause\n":::"memory")
+ #endif
+ #define KOKKOS_INTERNAL_NOP2 asm volatile("nop\n" "nop\n")
+ #define KOKKOS_INTERNAL_NOP4 KOKKOS_INTERNAL_NOP2; KOKKOS_INTERNAL_NOP2
+ #define KOKKOS_INTERNAL_NOP8 KOKKOS_INTERNAL_NOP4; KOKKOS_INTERNAL_NOP4;
+ #define KOKKOS_INTERNAL_NOP16 KOKKOS_INTERNAL_NOP8; KOKKOS_INTERNAL_NOP8;
+ #define KOKKOS_INTERNAL_NOP32 KOKKOS_INTERNAL_NOP16; KOKKOS_INTERNAL_NOP16;
+ namespace {
+ inline void kokkos_internal_yield( const unsigned i ) noexcept {
+ switch (Kokkos::Impl::bit_scan_reverse((i >> 2)+1u)) {
+ case 0u: KOKKOS_INTERNAL_NOP2; break;
+ case 1u: KOKKOS_INTERNAL_NOP4; break;
+ case 2u: KOKKOS_INTERNAL_NOP8; break;
+ case 3u: KOKKOS_INTERNAL_NOP16; break;
+ default: KOKKOS_INTERNAL_NOP32;
+ }
+ KOKKOS_INTERNAL_PAUSE;
+ }
+ }
#else
- /* Pause instruction to prevent excess processor bus usage */
- #define YIELD asm volatile("pause\n":::"memory")
+ #include <sched.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ sched_yield();
+ }
+ }
+ #endif
+#else // defined( _WIN32 )
+ #if defined ( KOKKOS_ENABLE_WINTHREAD )
+ #include <process.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ Sleep(0);
+ }
+ }
+ #elif defined( _MSC_VER )
+ #define NOMINMAX
+ #include <winsock2.h>
+ #include <windows.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ YieldProcessor();
+ }
+ }
+ #else
+ #define KOKKOS_INTERNAL_PAUSE __asm__ __volatile__("pause\n":::"memory")
+ #define KOKKOS_INTERNAL_NOP2 __asm__ __volatile__("nop\n" "nop")
+ #define KOKKOS_INTERNAL_NOP4 KOKKOS_INTERNAL_NOP2; KOKKOS_INTERNAL_NOP2
+ #define KOKKOS_INTERNAL_NOP8 KOKKOS_INTERNAL_NOP4; KOKKOS_INTERNAL_NOP4;
+ #define KOKKOS_INTERNAL_NOP16 KOKKOS_INTERNAL_NOP8; KOKKOS_INTERNAL_NOP8;
+ #define KOKKOS_INTERNAL_NOP32 KOKKOS_INTERNAL_NOP16; KOKKOS_INTERNAL_NOP16;
+ namespace {
+ inline void kokkos_internal_yield( const unsigned i ) noexcept {
+ switch (Kokkos::Impl::bit_scan_reverse((i >> 2)+1u)) {
+ case 0: KOKKOS_INTERNAL_NOP2; break;
+ case 1: KOKKOS_INTERNAL_NOP4; break;
+ case 2: KOKKOS_INTERNAL_NOP8; break;
+ case 3: KOKKOS_INTERNAL_NOP16; break;
+ default: KOKKOS_INTERNAL_NOP32;
+ }
+ KOKKOS_INTERNAL_PAUSE;
+ }
+ }
#endif
-#elif defined ( KOKKOS_ENABLE_WINTHREAD )
- #include <process.h>
- #define YIELD Sleep(0)
-#elif defined ( _WIN32) && defined (_MSC_VER)
- /* Windows w/ Visual Studio */
- #define NOMINMAX
- #include <winsock2.h>
- #include <windows.h>
-#define YIELD YieldProcessor();
-#elif defined ( _WIN32 )
- /* Windows w/ Intel*/
- #define YIELD __asm__ __volatile__("pause\n":::"memory")
-#else
- #include <sched.h>
- #define YIELD sched_yield()
#endif
+
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void spinwait( volatile int & flag , const int value )
+
+void spinwait_while_equal( volatile int32_t & flag , const int32_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value == flag ) {
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_until_equal( volatile int32_t & flag , const int32_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value != flag ) {
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_while_equal( volatile int64_t & flag , const int64_t value )
{
+ Kokkos::store_fence();
+ unsigned i = 0;
while ( value == flag ) {
- YIELD ;
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_until_equal( volatile int64_t & flag , const int64_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value != flag ) {
+ kokkos_internal_yield(i);
+ ++i;
}
+ Kokkos::load_fence();
}
+
#endif
} /* namespace Impl */
} /* namespace Kokkos */
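
A minimal host-side usage sketch for the new entry points, assuming a host build (KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST defined) and accepting a plain volatile flag for the demonstration, as the callers inside Kokkos do.

  #include <impl/Kokkos_spinwait.hpp>
  #include <cstdint>
  #include <thread>

  int main()
  {
    volatile int32_t flag = 0;

    // Another thread eventually releases the waiter by writing 1.
    std::thread releaser( [&flag]() { flag = 1; } );

    // Spins (with the internal pause/nop/sched_yield backoff) until flag == 1,
    // then issues a load fence before returning.
    Kokkos::Impl::spinwait_until_equal( flag, 1 );

    releaser.join();
    return 0;
  }
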
diff --git a/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp b/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
index cc87771fa..6e34b8a94 100644
--- a/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
@@ -1,64 +1,80 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SPINWAIT_HPP
#define KOKKOS_SPINWAIT_HPP
#include <Kokkos_Macros.hpp>
+#include <cstdint>
+
namespace Kokkos {
namespace Impl {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void spinwait( volatile int & flag , const int value );
+
+void spinwait_while_equal( volatile int32_t & flag , const int32_t value );
+void spinwait_until_equal( volatile int32_t & flag , const int32_t value );
+
+void spinwait_while_equal( volatile int64_t & flag , const int64_t value );
+void spinwait_until_equal( volatile int64_t & flag , const int64_t value );
#else
+
+KOKKOS_INLINE_FUNCTION
+void spinwait_while_equal( volatile int32_t & , const int32_t ) {}
+KOKKOS_INLINE_FUNCTION
+void spinwait_until_equal( volatile int32_t & , const int32_t ) {}
+
+KOKKOS_INLINE_FUNCTION
+void spinwait_while_equal( volatile int64_t & , const int64_t ) {}
KOKKOS_INLINE_FUNCTION
-void spinwait( volatile int & , const int ) {}
+void spinwait_until_equal( volatile int64_t & , const int64_t ) {}
+
#endif
} /* namespace Impl */
} /* namespace Kokkos */
#endif /* #ifndef KOKKOS_SPINWAIT_HPP */
diff --git a/lib/kokkos/core/unit_test/CMakeLists.txt b/lib/kokkos/core/unit_test/CMakeLists.txt
index 795657fe8..caf6c5012 100644
--- a/lib/kokkos/core/unit_test/CMakeLists.txt
+++ b/lib/kokkos/core/unit_test/CMakeLists.txt
@@ -1,197 +1,217 @@
#
# Add test-only library for gtest to be reused by all the subpackages
#
SET(GTEST_SOURCE_DIR ${${PARENT_PACKAGE_NAME}_SOURCE_DIR}/tpls/gtest)
INCLUDE_DIRECTORIES(${GTEST_SOURCE_DIR})
TRIBITS_ADD_LIBRARY(
kokkos_gtest
HEADERS ${GTEST_SOURCE_DIR}/gtest/gtest.h
SOURCES ${GTEST_SOURCE_DIR}/gtest/gtest-all.cc
TESTONLY
)
#
# Define the tests
#
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
IF(Kokkos_ENABLE_Serial)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Serial
SOURCES
UnitTestMain.cpp
serial/TestSerial_Atomics.cpp
serial/TestSerial_Other.cpp
serial/TestSerial_Reductions.cpp
serial/TestSerial_SubView_a.cpp
serial/TestSerial_SubView_b.cpp
serial/TestSerial_SubView_c01.cpp
serial/TestSerial_SubView_c02.cpp
serial/TestSerial_SubView_c03.cpp
serial/TestSerial_SubView_c04.cpp
serial/TestSerial_SubView_c05.cpp
serial/TestSerial_SubView_c06.cpp
serial/TestSerial_SubView_c07.cpp
serial/TestSerial_SubView_c08.cpp
serial/TestSerial_SubView_c09.cpp
serial/TestSerial_SubView_c10.cpp
serial/TestSerial_SubView_c11.cpp
serial/TestSerial_SubView_c12.cpp
serial/TestSerial_Team.cpp
serial/TestSerial_ViewAPI_a.cpp
serial/TestSerial_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Pthread)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Threads
SOURCES
UnitTestMain.cpp
threads/TestThreads_Atomics.cpp
threads/TestThreads_Other.cpp
threads/TestThreads_Reductions.cpp
threads/TestThreads_SubView_a.cpp
threads/TestThreads_SubView_b.cpp
threads/TestThreads_SubView_c01.cpp
threads/TestThreads_SubView_c02.cpp
threads/TestThreads_SubView_c03.cpp
threads/TestThreads_SubView_c04.cpp
threads/TestThreads_SubView_c05.cpp
threads/TestThreads_SubView_c06.cpp
threads/TestThreads_SubView_c07.cpp
threads/TestThreads_SubView_c08.cpp
threads/TestThreads_SubView_c09.cpp
threads/TestThreads_SubView_c10.cpp
threads/TestThreads_SubView_c11.cpp
threads/TestThreads_SubView_c12.cpp
threads/TestThreads_Team.cpp
threads/TestThreads_ViewAPI_a.cpp
threads/TestThreads_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_OpenMP
SOURCES
UnitTestMain.cpp
openmp/TestOpenMP_Atomics.cpp
openmp/TestOpenMP_Other.cpp
openmp/TestOpenMP_Reductions.cpp
openmp/TestOpenMP_SubView_a.cpp
openmp/TestOpenMP_SubView_b.cpp
openmp/TestOpenMP_SubView_c01.cpp
openmp/TestOpenMP_SubView_c02.cpp
openmp/TestOpenMP_SubView_c03.cpp
openmp/TestOpenMP_SubView_c04.cpp
openmp/TestOpenMP_SubView_c05.cpp
openmp/TestOpenMP_SubView_c06.cpp
openmp/TestOpenMP_SubView_c07.cpp
openmp/TestOpenMP_SubView_c08.cpp
openmp/TestOpenMP_SubView_c09.cpp
openmp/TestOpenMP_SubView_c10.cpp
openmp/TestOpenMP_SubView_c11.cpp
openmp/TestOpenMP_SubView_c12.cpp
openmp/TestOpenMP_Team.cpp
openmp/TestOpenMP_ViewAPI_a.cpp
openmp/TestOpenMP_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
-IF(Kokkos_ENABLE_QTHREAD)
+IF(Kokkos_ENABLE_Qthreads)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
- UnitTest_Qthread
- SOURCES UnitTestMain.cpp TestQthread.cpp
+ UnitTest_Qthreads
+ SOURCES
+ UnitTestMain.cpp
+ qthreads/TestQthreads_Atomics.cpp
+ qthreads/TestQthreads_Other.cpp
+ qthreads/TestQthreads_Reductions.cpp
+ qthreads/TestQthreads_SubView_a.cpp
+ qthreads/TestQthreads_SubView_b.cpp
+ qthreads/TestQthreads_SubView_c01.cpp
+ qthreads/TestQthreads_SubView_c02.cpp
+ qthreads/TestQthreads_SubView_c03.cpp
+ qthreads/TestQthreads_SubView_c04.cpp
+ qthreads/TestQthreads_SubView_c05.cpp
+ qthreads/TestQthreads_SubView_c06.cpp
+ qthreads/TestQthreads_SubView_c07.cpp
+ qthreads/TestQthreads_SubView_c08.cpp
+ qthreads/TestQthreads_SubView_c09.cpp
+ qthreads/TestQthreads_SubView_c10.cpp
+ qthreads/TestQthreads_SubView_c11.cpp
+ qthreads/TestQthreads_SubView_c12.cpp
+ qthreads/TestQthreads_Team.cpp
+ qthreads/TestQthreads_ViewAPI_a.cpp
+ qthreads/TestQthreads_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Cuda)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Cuda
SOURCES
UnitTestMain.cpp
cuda/TestCuda_Atomics.cpp
cuda/TestCuda_Other.cpp
cuda/TestCuda_Reductions_a.cpp
cuda/TestCuda_Reductions_b.cpp
cuda/TestCuda_Spaces.cpp
cuda/TestCuda_SubView_a.cpp
cuda/TestCuda_SubView_b.cpp
cuda/TestCuda_SubView_c01.cpp
cuda/TestCuda_SubView_c02.cpp
cuda/TestCuda_SubView_c03.cpp
cuda/TestCuda_SubView_c04.cpp
cuda/TestCuda_SubView_c05.cpp
cuda/TestCuda_SubView_c06.cpp
cuda/TestCuda_SubView_c07.cpp
cuda/TestCuda_SubView_c08.cpp
cuda/TestCuda_SubView_c09.cpp
cuda/TestCuda_SubView_c10.cpp
cuda/TestCuda_SubView_c11.cpp
cuda/TestCuda_SubView_c12.cpp
cuda/TestCuda_Team.cpp
cuda/TestCuda_ViewAPI_a.cpp
cuda/TestCuda_ViewAPI_b.cpp
cuda/TestCuda_ViewAPI_c.cpp
cuda/TestCuda_ViewAPI_d.cpp
cuda/TestCuda_ViewAPI_e.cpp
cuda/TestCuda_ViewAPI_f.cpp
cuda/TestCuda_ViewAPI_g.cpp
cuda/TestCuda_ViewAPI_h.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Default
SOURCES UnitTestMain.cpp TestDefaultDeviceType.cpp TestDefaultDeviceType_a.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
foreach(INITTESTS_NUM RANGE 1 16)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_DefaultInit_${INITTESTS_NUM}
SOURCES UnitTestMain.cpp TestDefaultDeviceTypeInit_${INITTESTS_NUM}.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
endforeach(INITTESTS_NUM)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_HWLOC
SOURCES UnitTestMain.cpp TestHWLOC.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
-
diff --git a/lib/kokkos/core/unit_test/Makefile b/lib/kokkos/core/unit_test/Makefile
index cc59825fb..d93830a28 100644
--- a/lib/kokkos/core/unit_test/Makefile
+++ b/lib/kokkos/core/unit_test/Makefile
@@ -1,196 +1,196 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/unit_test
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/serial
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/threads
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/openmp
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/qthreads
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/cuda
TEST_HEADERS = $(wildcard $(KOKKOS_PATH)/core/unit_test/*.hpp)
TEST_HEADERS += $(wildcard $(KOKKOS_PATH)/core/unit_test/*/*.hpp)
default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/unit_test
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda_Other.o TestCuda_Reductions_a.o TestCuda_Reductions_b.o TestCuda_Atomics.o TestCuda_Team.o TestCuda_Spaces.o
OBJ_CUDA += TestCuda_SubView_a.o TestCuda_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestCuda_SubView_c_all.o
+ OBJ_OPENMP += TestCuda_SubView_c_all.o
else
OBJ_CUDA += TestCuda_SubView_c01.o TestCuda_SubView_c02.o TestCuda_SubView_c03.o
- OBJ_CUDA += TestCuda_SubView_c04.o TestCuda_SubView_c05.o TestCuda_SubView_c06.o
- OBJ_CUDA += TestCuda_SubView_c07.o TestCuda_SubView_c08.o TestCuda_SubView_c09.o
+ OBJ_CUDA += TestCuda_SubView_c04.o TestCuda_SubView_c05.o TestCuda_SubView_c06.o
+ OBJ_CUDA += TestCuda_SubView_c07.o TestCuda_SubView_c08.o TestCuda_SubView_c09.o
OBJ_CUDA += TestCuda_SubView_c10.o TestCuda_SubView_c11.o TestCuda_SubView_c12.o
endif
- OBJ_CUDA += TestCuda_ViewAPI_a.o TestCuda_ViewAPI_b.o TestCuda_ViewAPI_c.o TestCuda_ViewAPI_d.o
- OBJ_CUDA += TestCuda_ViewAPI_e.o TestCuda_ViewAPI_f.o TestCuda_ViewAPI_g.o TestCuda_ViewAPI_h.o
+ OBJ_CUDA += TestCuda_ViewAPI_a.o TestCuda_ViewAPI_b.o TestCuda_ViewAPI_c.o TestCuda_ViewAPI_d.o
+ OBJ_CUDA += TestCuda_ViewAPI_e.o TestCuda_ViewAPI_f.o TestCuda_ViewAPI_g.o TestCuda_ViewAPI_h.o
OBJ_CUDA += TestCuda_ViewAPI_s.o
OBJ_CUDA += UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- OBJ_THREADS = TestThreads_Other.o TestThreads_Reductions.o TestThreads_Atomics.o TestThreads_Team.o
- OBJ_THREADS += TestThreads_SubView_a.o TestThreads_SubView_b.o
+ OBJ_THREADS = TestThreads_Other.o TestThreads_Reductions.o TestThreads_Atomics.o TestThreads_Team.o
+ OBJ_THREADS += TestThreads_SubView_a.o TestThreads_SubView_b.o
OBJ_THREADS += TestThreads_SubView_c01.o TestThreads_SubView_c02.o TestThreads_SubView_c03.o
- OBJ_THREADS += TestThreads_SubView_c04.o TestThreads_SubView_c05.o TestThreads_SubView_c06.o
- OBJ_THREADS += TestThreads_SubView_c07.o TestThreads_SubView_c08.o TestThreads_SubView_c09.o
+ OBJ_THREADS += TestThreads_SubView_c04.o TestThreads_SubView_c05.o TestThreads_SubView_c06.o
+ OBJ_THREADS += TestThreads_SubView_c07.o TestThreads_SubView_c08.o TestThreads_SubView_c09.o
OBJ_THREADS += TestThreads_SubView_c10.o TestThreads_SubView_c11.o TestThreads_SubView_c12.o
- OBJ_THREADS += TestThreads_ViewAPI_a.o TestThreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ OBJ_THREADS += TestThreads_ViewAPI_a.o TestThreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP = TestOpenMP_Other.o TestOpenMP_Reductions.o TestOpenMP_Atomics.o TestOpenMP_Team.o
OBJ_OPENMP += TestOpenMP_SubView_a.o TestOpenMP_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestOpenMP_SubView_c_all.o
+ OBJ_OPENMP += TestOpenMP_SubView_c_all.o
else
OBJ_OPENMP += TestOpenMP_SubView_c01.o TestOpenMP_SubView_c02.o TestOpenMP_SubView_c03.o
- OBJ_OPENMP += TestOpenMP_SubView_c04.o TestOpenMP_SubView_c05.o TestOpenMP_SubView_c06.o
- OBJ_OPENMP += TestOpenMP_SubView_c07.o TestOpenMP_SubView_c08.o TestOpenMP_SubView_c09.o
+ OBJ_OPENMP += TestOpenMP_SubView_c04.o TestOpenMP_SubView_c05.o TestOpenMP_SubView_c06.o
+ OBJ_OPENMP += TestOpenMP_SubView_c07.o TestOpenMP_SubView_c08.o TestOpenMP_SubView_c09.o
OBJ_OPENMP += TestOpenMP_SubView_c10.o TestOpenMP_SubView_c11.o TestOpenMP_SubView_c12.o
endif
OBJ_OPENMP += TestOpenMP_ViewAPI_a.o TestOpenMP_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_OpenMP
TEST_TARGETS += test-openmp
endif
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ OBJ_QTHREADS = TestQthreads_Other.o TestQthreads_Reductions.o TestQthreads_Atomics.o TestQthreads_Team.o
+ OBJ_QTHREADS += TestQthreads_SubView_a.o TestQthreads_SubView_b.o
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ OBJ_QTHREADS += TestQthreads_SubView_c_all.o
+else
+ OBJ_QTHREADS += TestQthreads_SubView_c01.o TestQthreads_SubView_c02.o TestQthreads_SubView_c03.o
+ OBJ_QTHREADS += TestQthreads_SubView_c04.o TestQthreads_SubView_c05.o TestQthreads_SubView_c06.o
+ OBJ_QTHREADS += TestQthreads_SubView_c07.o TestQthreads_SubView_c08.o TestQthreads_SubView_c09.o
+ OBJ_QTHREADS += TestQthreads_SubView_c10.o TestQthreads_SubView_c11.o TestQthreads_SubView_c12.o
+endif
+ OBJ_QTHREADS += TestQthreads_ViewAPI_a.o TestQthreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ TARGETS += KokkosCore_UnitTest_Qthreads
+ TEST_TARGETS += test-qthreads
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
- OBJ_SERIAL = TestSerial_Other.o TestSerial_Reductions.o TestSerial_Atomics.o TestSerial_Team.o
- OBJ_SERIAL += TestSerial_SubView_a.o TestSerial_SubView_b.o
+ OBJ_SERIAL = TestSerial_Other.o TestSerial_Reductions.o TestSerial_Atomics.o TestSerial_Team.o
+ OBJ_SERIAL += TestSerial_SubView_a.o TestSerial_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestSerial_SubView_c_all.o
+ OBJ_OPENMP += TestSerial_SubView_c_all.o
else
OBJ_SERIAL += TestSerial_SubView_c01.o TestSerial_SubView_c02.o TestSerial_SubView_c03.o
- OBJ_SERIAL += TestSerial_SubView_c04.o TestSerial_SubView_c05.o TestSerial_SubView_c06.o
- OBJ_SERIAL += TestSerial_SubView_c07.o TestSerial_SubView_c08.o TestSerial_SubView_c09.o
+ OBJ_SERIAL += TestSerial_SubView_c04.o TestSerial_SubView_c05.o TestSerial_SubView_c06.o
+ OBJ_SERIAL += TestSerial_SubView_c07.o TestSerial_SubView_c08.o TestSerial_SubView_c09.o
OBJ_SERIAL += TestSerial_SubView_c10.o TestSerial_SubView_c11.o TestSerial_SubView_c12.o
endif
- OBJ_SERIAL += TestSerial_ViewAPI_a.o TestSerial_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ OBJ_SERIAL += TestSerial_ViewAPI_a.o TestSerial_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Serial
TEST_TARGETS += test-serial
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- OBJ_QTHREAD = TestQthread.o UnitTestMain.o gtest-all.o
- TARGETS += KokkosCore_UnitTest_Qthread
- TEST_TARGETS += test-qthread
-endif
-
OBJ_HWLOC = TestHWLOC.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_HWLOC
TEST_TARGETS += test-hwloc
OBJ_DEFAULT = TestDefaultDeviceType.o TestDefaultDeviceType_a.o TestDefaultDeviceType_b.o TestDefaultDeviceType_c.o TestDefaultDeviceType_d.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Default
TEST_TARGETS += test-default
NUM_INITTESTS = 16
INITTESTS_NUMBERS := $(shell seq 1 ${NUM_INITTESTS})
INITTESTS_TARGETS := $(addprefix KokkosCore_UnitTest_DefaultDeviceTypeInit_,${INITTESTS_NUMBERS})
TARGETS += ${INITTESTS_TARGETS}
INITTESTS_TEST_TARGETS := $(addprefix test-default-init-,${INITTESTS_NUMBERS})
TEST_TARGETS += ${INITTESTS_TEST_TARGETS}
-OBJ_SYNCHRONIC = TestSynchronic.o UnitTestMain.o gtest-all.o
-TARGETS += KokkosCore_UnitTest_Synchronic
-TEST_TARGETS += test-synchronic
-
KokkosCore_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Cuda
KokkosCore_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Threads
KokkosCore_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_OpenMP
KokkosCore_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Serial
-KokkosCore_UnitTest_Qthread: $(OBJ_QTHREAD) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_QTHREAD) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Qthread
+KokkosCore_UnitTest_Qthreads: $(OBJ_QTHREADS) $(KOKKOS_LINK_DEPENDS)
+ $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_QTHREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Qthreads
KokkosCore_UnitTest_HWLOC: $(OBJ_HWLOC) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_HWLOC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_HWLOC
KokkosCore_UnitTest_AllocationTracker: $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_AllocationTracker
KokkosCore_UnitTest_Default: $(OBJ_DEFAULT) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_DEFAULT) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Default
${INITTESTS_TARGETS}: KokkosCore_UnitTest_DefaultDeviceTypeInit_%: TestDefaultDeviceTypeInit_%.o UnitTestMain.o gtest-all.o $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) TestDefaultDeviceTypeInit_$*.o UnitTestMain.o gtest-all.o $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
-KokkosCore_UnitTest_Synchronic: $(OBJ_SYNCHRONIC) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SYNCHRONIC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Synchronic
-
test-cuda: KokkosCore_UnitTest_Cuda
./KokkosCore_UnitTest_Cuda
test-threads: KokkosCore_UnitTest_Threads
./KokkosCore_UnitTest_Threads
test-openmp: KokkosCore_UnitTest_OpenMP
./KokkosCore_UnitTest_OpenMP
test-serial: KokkosCore_UnitTest_Serial
./KokkosCore_UnitTest_Serial
-test-qthread: KokkosCore_UnitTest_Qthread
- ./KokkosCore_UnitTest_Qthread
+test-qthreads: KokkosCore_UnitTest_Qthreads
+ ./KokkosCore_UnitTest_Qthreads
test-hwloc: KokkosCore_UnitTest_HWLOC
./KokkosCore_UnitTest_HWLOC
test-allocationtracker: KokkosCore_UnitTest_AllocationTracker
./KokkosCore_UnitTest_AllocationTracker
test-default: KokkosCore_UnitTest_Default
./KokkosCore_UnitTest_Default
${INITTESTS_TEST_TARGETS}: test-default-init-%: KokkosCore_UnitTest_DefaultDeviceTypeInit_%
./KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
-test-synchronic: KokkosCore_UnitTest_Synchronic
- ./KokkosCore_UnitTest_Synchronic
-
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(TEST_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
-
diff --git a/lib/kokkos/core/unit_test/TestAggregate.hpp b/lib/kokkos/core/unit_test/TestAggregate.hpp
index d22837f3e..f09cc5018 100644
--- a/lib/kokkos/core/unit_test/TestAggregate.hpp
+++ b/lib/kokkos/core/unit_test/TestAggregate.hpp
@@ -1,109 +1,124 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef TEST_AGGREGATE_HPP
#define TEST_AGGREGATE_HPP
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
#include <impl/Kokkos_ViewArray.hpp>
namespace Test {
template< class DeviceType >
void TestViewAggregate()
{
- typedef Kokkos::Array<double,32> value_type ;
-
- typedef Kokkos::Experimental::Impl::
- ViewDataAnalysis< value_type * , Kokkos::LayoutLeft , value_type >
- analysis_1d ;
+ typedef Kokkos::Array< double, 32 > value_type;
+ typedef Kokkos::Experimental::Impl::ViewDataAnalysis< value_type *, Kokkos::LayoutLeft, value_type > analysis_1d;
- static_assert( std::is_same< typename analysis_1d::specialize , Kokkos::Array<> >::value , "" );
+ static_assert( std::is_same< typename analysis_1d::specialize, Kokkos::Array<> >::value, "" );
+ typedef Kokkos::ViewTraits< value_type **, DeviceType > a32_traits;
+ typedef Kokkos::ViewTraits< typename a32_traits::scalar_array_type, DeviceType > flat_traits;
- typedef Kokkos::ViewTraits< value_type ** , DeviceType > a32_traits ;
- typedef Kokkos::ViewTraits< typename a32_traits::scalar_array_type , DeviceType > flat_traits ;
+ static_assert( std::is_same< typename a32_traits::specialize, Kokkos::Array<> >::value, "" );
+ static_assert( std::is_same< typename a32_traits::value_type, value_type >::value, "" );
+ static_assert( a32_traits::rank == 2, "" );
+ static_assert( a32_traits::rank_dynamic == 2, "" );
- static_assert( std::is_same< typename a32_traits::specialize , Kokkos::Array<> >::value , "" );
- static_assert( std::is_same< typename a32_traits::value_type , value_type >::value , "" );
- static_assert( a32_traits::rank == 2 , "" );
- static_assert( a32_traits::rank_dynamic == 2 , "" );
+ static_assert( std::is_same< typename flat_traits::specialize, void >::value, "" );
+ static_assert( flat_traits::rank == 3, "" );
+ static_assert( flat_traits::rank_dynamic == 2, "" );
+ static_assert( flat_traits::dimension::N2 == 32, "" );
- static_assert( std::is_same< typename flat_traits::specialize , void >::value , "" );
- static_assert( flat_traits::rank == 3 , "" );
- static_assert( flat_traits::rank_dynamic == 2 , "" );
- static_assert( flat_traits::dimension::N2 == 32 , "" );
+ typedef Kokkos::View< Kokkos::Array< double, 32 > **, DeviceType > a32_type;
+ typedef typename a32_type::array_type a32_flat_type;
+ static_assert( std::is_same< typename a32_type::value_type, value_type >::value, "" );
+ static_assert( std::is_same< typename a32_type::pointer_type, double * >::value, "" );
+ static_assert( a32_type::Rank == 2, "" );
+ static_assert( a32_flat_type::Rank == 3, "" );
- typedef Kokkos::View< Kokkos::Array<double,32> ** , DeviceType > a32_type ;
-
- typedef typename a32_type::array_type a32_flat_type ;
-
- static_assert( std::is_same< typename a32_type::value_type , value_type >::value , "" );
- static_assert( std::is_same< typename a32_type::pointer_type , double * >::value , "" );
- static_assert( a32_type::Rank == 2 , "" );
- static_assert( a32_flat_type::Rank == 3 , "" );
-
- a32_type x("test",4,5);
+ a32_type x( "test", 4, 5 );
a32_flat_type y( x );
- ASSERT_EQ( x.extent(0) , 4 );
- ASSERT_EQ( x.extent(1) , 5 );
- ASSERT_EQ( y.extent(0) , 4 );
- ASSERT_EQ( y.extent(1) , 5 );
- ASSERT_EQ( y.extent(2) , 32 );
-}
-
+ ASSERT_EQ( x.extent( 0 ), 4 );
+ ASSERT_EQ( x.extent( 1 ), 5 );
+ ASSERT_EQ( y.extent( 0 ), 4 );
+ ASSERT_EQ( y.extent( 1 ), 5 );
+ ASSERT_EQ( y.extent( 2 ), 32 );
+
+ // Initialize arrays from brace-init-list as for std::array.
+ //
+ // Comment: Clang will issue the following warning if we don't use double
+ // braces here (one for initializing the Kokkos::Array and one for
+  // initializing the sub-aggregate C-array data member),
+ //
+ // warning: suggest braces around initialization of subobject
+ //
+ // but single brace syntax would be valid as well.
+ Kokkos::Array< float, 2 > aggregate_initialization_syntax_1 = { { 1.41, 3.14 } };
+ ASSERT_FLOAT_EQ( aggregate_initialization_syntax_1[0], 1.41 );
+ ASSERT_FLOAT_EQ( aggregate_initialization_syntax_1[1], 3.14 );
+
+ Kokkos::Array< int, 3 > aggregate_initialization_syntax_2{ { 0, 1, 2 } }; // since C++11
+ for ( int i = 0; i < 3; ++i ) {
+ ASSERT_EQ( aggregate_initialization_syntax_2[i], i );
+ }
+
+ // Note that this is a valid initialization.
+ Kokkos::Array< double, 3 > initialized_with_one_argument_missing = { { 255, 255 } };
+ for (int i = 0; i < 2; ++i) {
+ ASSERT_DOUBLE_EQ( initialized_with_one_argument_missing[i], 255 );
+ }
+ // But the following line would not compile
+// Kokkos::Array< double, 3 > initialized_with_too_many{ { 1, 2, 3, 4 } };
}
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
+} // namespace Test
#endif /* #ifndef TEST_AGGREGATE_HPP */
diff --git a/lib/kokkos/core/unit_test/TestAtomic.hpp b/lib/kokkos/core/unit_test/TestAtomic.hpp
index e94872357..ff77b8dca 100644
--- a/lib/kokkos/core/unit_test/TestAtomic.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomic.hpp
@@ -1,402 +1,433 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomic {
-// Struct for testing arbitrary size atomics
+// Struct for testing arbitrary size atomics.
-template<int N>
+template< int N >
struct SuperScalar {
double val[N];
KOKKOS_INLINE_FUNCTION
SuperScalar() {
- for(int i=0; i<N; i++)
+ for ( int i = 0; i < N; i++ ) {
val[i] = 0.0;
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar(const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar(const volatile SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar( const volatile SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator = (const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator=( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator = (const volatile SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator=( const volatile SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- void operator = (const SuperScalar& src) volatile {
- for(int i=0; i<N; i++)
+ void operator=( const SuperScalar & src ) volatile {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar operator + (const SuperScalar& src) {
+ SuperScalar operator+( const SuperScalar & src ) {
SuperScalar tmp = *this;
- for(int i=0; i<N; i++)
+ for ( int i = 0; i < N; i++ ) {
tmp.val[i] += src.val[i];
+ }
return tmp;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator += (const double& src) {
- for(int i=0; i<N; i++)
- val[i] += 1.0*(i+1)*src;
+ SuperScalar& operator+=( const double & src ) {
+ for ( int i = 0; i < N; i++ ) {
+ val[i] += 1.0 * ( i + 1 ) * src;
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator += (const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator+=( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] += src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- bool operator == (const SuperScalar& src) {
+ bool operator==( const SuperScalar & src ) {
bool compare = true;
- for(int i=0; i<N; i++)
- compare = compare && ( val[i] == src.val[i]);
+ for( int i = 0; i < N; i++ ) {
+ compare = compare && ( val[i] == src.val[i] );
+ }
return compare;
}
KOKKOS_INLINE_FUNCTION
- bool operator != (const SuperScalar& src) {
+ bool operator!=( const SuperScalar & src ) {
bool compare = true;
- for(int i=0; i<N; i++)
- compare = compare && ( val[i] == src.val[i]);
+ for ( int i = 0; i < N; i++ ) {
+ compare = compare && ( val[i] == src.val[i] );
+ }
return !compare;
}
-
-
KOKKOS_INLINE_FUNCTION
- SuperScalar(const double& src) {
- for(int i=0; i<N; i++)
- val[i] = 1.0 * (i+1) * src;
+ SuperScalar( const double & src ) {
+ for ( int i = 0; i < N; i++ ) {
+ val[i] = 1.0 * ( i + 1 ) * src;
+ }
}
-
};
-template<int N>
-std::ostream& operator<<(std::ostream& os, const SuperScalar<N>& dt)
+template< int N >
+std::ostream & operator<<( std::ostream & os, const SuperScalar< N > & dt )
{
- os << "{ ";
- for(int i=0;i<N-1;i++)
- os << dt.val[i] << ", ";
- os << dt.val[N-1] << "}";
- return os;
+ os << "{ ";
+ for ( int i = 0; i < N - 1; i++ ) {
+ os << dt.val[i] << ", ";
+ }
+ os << dt.val[N-1] << "}";
+
+ return os;
}
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct ZeroFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = 0;
}
};
//---------------------------------------------------
//--------------atomic_fetch_add---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct AddFunctor{
+template< class T, class DEVICE_TYPE >
+struct AddFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_add(&data(),(T)1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_add( &data(), (T) 1 );
}
};
-template<class T, class execution_space >
-T AddLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T AddLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct AddFunctor<T,execution_space> f_add;
+ struct AddFunctor< T, execution_space > f_add;
+
f_add.data = data;
- Kokkos::parallel_for(loop,f_add);
+ Kokkos::parallel_for( loop, f_add );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T AddLoopSerial(int loop) {
+template< class T >
+T AddLoopSerial( int loop ) {
T* data = new T[1];
data[0] = 0;
- for(int i=0;i<loop;i++)
- *data+=(T)1;
+ for ( int i = 0; i < loop; i++ ) {
+ *data += (T) 1;
+ }
T val = *data;
delete [] data;
+
return val;
}
//------------------------------------------------------
//--------------atomic_compare_exchange-----------------
//------------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct CASFunctor{
+template< class T, class DEVICE_TYPE >
+struct CASFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- T old = data();
- T newval, assumed;
- do {
- assumed = old;
- newval = assumed + (T)1;
- old = Kokkos::atomic_compare_exchange(&data(), assumed, newval);
- }
- while( old != assumed );
+ void operator()( int ) const {
+ T old = data();
+ T newval, assumed;
+
+ do {
+ assumed = old;
+ newval = assumed + (T) 1;
+ old = Kokkos::atomic_compare_exchange( &data(), assumed, newval );
+ } while( old != assumed );
}
};
-template<class T, class execution_space >
-T CASLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T CASLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct CASFunctor<T,execution_space> f_cas;
+ struct CASFunctor< T, execution_space > f_cas;
+
f_cas.data = data;
- Kokkos::parallel_for(loop,f_cas);
+ Kokkos::parallel_for( loop, f_cas );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
return val;
}
-template<class T>
-T CASLoopSerial(int loop) {
+template< class T >
+T CASLoopSerial( int loop ) {
T* data = new T[1];
data[0] = 0;
- for(int i=0;i<loop;i++) {
- T assumed;
- T newval;
- T old;
- do {
- assumed = *data;
- newval = assumed + (T)1;
- old = *data;
- *data = newval;
- }
- while(!(assumed==old));
+ for ( int i = 0; i < loop; i++ ) {
+ T assumed;
+ T newval;
+ T old;
+
+ do {
+ assumed = *data;
+ newval = assumed + (T) 1;
+ old = *data;
+ *data = newval;
+ } while( !( assumed == old ) );
}
T val = *data;
delete [] data;
+
return val;
}
//----------------------------------------------
//--------------atomic_exchange-----------------
//----------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct ExchFunctor{
+template< class T, class DEVICE_TYPE >
+struct ExchFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data, data2;
KOKKOS_INLINE_FUNCTION
- void operator()(int i) const {
- T old = Kokkos::atomic_exchange(&data(),(T)i);
- Kokkos::atomic_fetch_add(&data2(),old);
+ void operator()( int i ) const {
+ T old = Kokkos::atomic_exchange( &data(), (T) i );
+ Kokkos::atomic_fetch_add( &data2(), old );
}
};
-template<class T, class execution_space >
-T ExchLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T ExchLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- typename ZeroFunctor<T,execution_space>::type data2("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data2("HData");
+ typename ZeroFunctor< T, execution_space >::type data2( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data2( "HData" );
+
f_zero.data = data2;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct ExchFunctor<T,execution_space> f_exch;
+ struct ExchFunctor< T, execution_space > f_exch;
+
f_exch.data = data;
f_exch.data2 = data2;
- Kokkos::parallel_for(loop,f_exch);
+ Kokkos::parallel_for( loop, f_exch );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
- Kokkos::deep_copy(h_data2,data2);
+ Kokkos::deep_copy( h_data, data );
+ Kokkos::deep_copy( h_data2, data2 );
T val = h_data() + h_data2();
return val;
}
-template<class T>
-T ExchLoopSerial(typename std::conditional<!std::is_same<T,Kokkos::complex<double> >::value,int,void>::type loop) {
+template< class T >
+T ExchLoopSerial( typename std::conditional< !std::is_same< T, Kokkos::complex<double> >::value, int, void >::type loop ) {
T* data = new T[1];
T* data2 = new T[1];
data[0] = 0;
data2[0] = 0;
- for(int i=0;i<loop;i++) {
- T old = *data;
- *data=(T) i;
- *data2+=old;
+
+ for ( int i = 0; i < loop; i++ ) {
+ T old = *data;
+ *data = (T) i;
+ *data2 += old;
}
T val = *data2 + *data;
delete [] data;
delete [] data2;
+
return val;
}
-template<class T>
-T ExchLoopSerial(typename std::conditional<std::is_same<T,Kokkos::complex<double> >::value,int,void>::type loop) {
+template< class T >
+T ExchLoopSerial( typename std::conditional< std::is_same< T, Kokkos::complex<double> >::value, int, void >::type loop ) {
T* data = new T[1];
T* data2 = new T[1];
data[0] = 0;
data2[0] = 0;
- for(int i=0;i<loop;i++) {
- T old = *data;
- data->real() = (static_cast<double>(i));
- data->imag() = 0;
- *data2+=old;
+
+ for ( int i = 0; i < loop; i++ ) {
+ T old = *data;
+ data->real() = ( static_cast<double>( i ) );
+ data->imag() = 0;
+ *data2 += old;
}
T val = *data2 + *data;
delete [] data;
delete [] data2;
+
return val;
}
-template<class T, class DeviceType >
-T LoopVariant(int loop, int test) {
- switch (test) {
- case 1: return AddLoop<T,DeviceType>(loop);
- case 2: return CASLoop<T,DeviceType>(loop);
- case 3: return ExchLoop<T,DeviceType>(loop);
+template< class T, class DeviceType >
+T LoopVariant( int loop, int test ) {
+ switch ( test ) {
+ case 1: return AddLoop< T, DeviceType >( loop );
+ case 2: return CASLoop< T, DeviceType >( loop );
+ case 3: return ExchLoop< T, DeviceType >( loop );
}
+
return 0;
}
-template<class T>
-T LoopVariantSerial(int loop, int test) {
- switch (test) {
- case 1: return AddLoopSerial<T>(loop);
- case 2: return CASLoopSerial<T>(loop);
- case 3: return ExchLoopSerial<T>(loop);
+template< class T >
+T LoopVariantSerial( int loop, int test ) {
+ switch ( test ) {
+ case 1: return AddLoopSerial< T >( loop );
+ case 2: return CASLoopSerial< T >( loop );
+ case 3: return ExchLoopSerial< T >( loop );
}
+
return 0;
}
-template<class T,class DeviceType>
-bool Loop(int loop, int test)
+template< class T, class DeviceType >
+bool Loop( int loop, int test )
{
- T res = LoopVariant<T,DeviceType>(loop,test);
- T resSerial = LoopVariantSerial<T>(loop,test);
+ T res = LoopVariant< T, DeviceType >( loop, test );
+ T resSerial = LoopVariantSerial< T >( loop, test );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = "
<< test << " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
-
- return passed ;
-}
-
+ return passed;
}
+} // namespace TestAtomic
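
The loops above all follow the same shape: initialize a device-resident scalar, update it concurrently from a Kokkos::parallel_for, and compare the result against a serial reference. As a minimal, self-contained sketch of that pattern (not part of the patch; the view name "count", the loop bound, and the use of long are illustrative assumptions only):

#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc, char* argv[] ) {
  Kokkos::initialize( argc, argv );
  {
    const int loop = 100000;
    Kokkos::View< long > count( "count" );        // rank-0, device-resident scalar (zero-initialized)
    Kokkos::parallel_for( loop, KOKKOS_LAMBDA( int ) {
      Kokkos::atomic_fetch_add( &count(), 1L );   // contended atomic add from many iterations
    });
    Kokkos::fence();
    auto h_count = Kokkos::create_mirror_view( count );
    Kokkos::deep_copy( h_count, count );          // bring the result back to the host
    assert( h_count() == loop );                  // serial expectation: one increment per iteration
  }
  Kokkos::finalize();
  return 0;
}
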
diff --git a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
index 7f1519045..e3ceca404 100644
--- a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
@@ -1,985 +1,1059 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomicOperations {
//-----------------------------------------------
//--------------zero_functor---------------------
//-----------------------------------------------
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct ZeroFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = 0;
}
};
//-----------------------------------------------
//--------------init_functor---------------------
//-----------------------------------------------
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct InitFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
- T init_value ;
+ T init_value;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = init_value;
}
- InitFunctor(T _init_value) : init_value(_init_value) {}
+ InitFunctor( T _init_value ) : init_value( _init_value ) {}
};
-
//---------------------------------------------------
//--------------atomic_fetch_max---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MaxFunctor{
+template< class T, class DEVICE_TYPE >
+struct MaxFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- //Kokkos::atomic_fetch_max(&data(),(T)1);
- Kokkos::atomic_fetch_max(&data(),(T)i1);
+ void operator()( int ) const {
+ //Kokkos::atomic_fetch_max( &data(), (T) 1 );
+ Kokkos::atomic_fetch_max( &data(), (T) i1 );
}
- MaxFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+ MaxFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MaxAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MaxAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MaxFunctor<T,execution_space> f(i0,i1);
+ struct MaxFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MaxAtomicCheck(T i0 , T i1) {
+template< class T >
+T MaxAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = (i0 > i1 ? i0 : i1) ;
+ *data = ( i0 > i1 ? i0 : i1 );
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MaxAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MaxAtomicTest( T i0, T i1 )
{
- T res = MaxAtomic<T,DeviceType>(i0,i1);
- T resSerial = MaxAtomicCheck<T>(i0,i1);
+ T res = MaxAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MaxAtomicCheck<T>( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MaxAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_min---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MinFunctor{
+template< class T, class DEVICE_TYPE >
+struct MinFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_min(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_min( &data(), (T) i1 );
}
- MinFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ MinFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MinAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MinAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MinFunctor<T,execution_space> f(i0,i1);
+ struct MinFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MinAtomicCheck(T i0 , T i1) {
+template< class T >
+T MinAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = (i0 < i1 ? i0 : i1) ;
+ *data = ( i0 < i1 ? i0 : i1 );
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MinAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MinAtomicTest( T i0, T i1 )
{
- T res = MinAtomic<T,DeviceType>(i0,i1);
- T resSerial = MinAtomicCheck<T>(i0,i1);
+ T res = MinAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MinAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MinAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_increment---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct IncFunctor{
+template< class T, class DEVICE_TYPE >
+struct IncFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_increment(&data());
+ void operator()( int ) const {
+ Kokkos::atomic_increment( &data() );
}
- IncFunctor( T _i0 ) : i0(_i0) {}
+
+ IncFunctor( T _i0 ) : i0( _i0 ) {}
};
-template<class T, class execution_space >
-T IncAtomic(T i0) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T IncAtomic( T i0 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct IncFunctor<T,execution_space> f(i0);
+ struct IncFunctor< T, execution_space > f( i0 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T IncAtomicCheck(T i0) {
+template< class T >
+T IncAtomicCheck( T i0 ) {
T* data = new T[1];
data[0] = 0;
*data = i0 + 1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool IncAtomicTest(T i0)
+template< class T, class DeviceType >
+bool IncAtomicTest( T i0 )
{
- T res = IncAtomic<T,DeviceType>(i0);
- T resSerial = IncAtomicCheck<T>(i0);
+ T res = IncAtomic< T, DeviceType >( i0 );
+ T resSerial = IncAtomicCheck< T >( i0 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = IncAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_decrement---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct DecFunctor{
+template< class T, class DEVICE_TYPE >
+struct DecFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_decrement(&data());
+ void operator()( int ) const {
+ Kokkos::atomic_decrement( &data() );
}
- DecFunctor( T _i0 ) : i0(_i0) {}
+
+ DecFunctor( T _i0 ) : i0( _i0 ) {}
};
-template<class T, class execution_space >
-T DecAtomic(T i0) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T DecAtomic( T i0 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct DecFunctor<T,execution_space> f(i0);
+ struct DecFunctor< T, execution_space > f( i0 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T DecAtomicCheck(T i0) {
+template< class T >
+T DecAtomicCheck( T i0 ) {
T* data = new T[1];
data[0] = 0;
*data = i0 - 1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool DecAtomicTest(T i0)
+template< class T, class DeviceType >
+bool DecAtomicTest( T i0 )
{
- T res = DecAtomic<T,DeviceType>(i0);
- T resSerial = DecAtomicCheck<T>(i0);
+ T res = DecAtomic< T, DeviceType >( i0 );
+ T resSerial = DecAtomicCheck< T >( i0 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DecAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_mul---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MulFunctor{
+template< class T, class DEVICE_TYPE >
+struct MulFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_mul(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_mul( &data(), (T) i1 );
}
- MulFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ MulFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MulAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MulAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MulFunctor<T,execution_space> f(i0,i1);
+ struct MulFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MulAtomicCheck(T i0 , T i1) {
+template< class T >
+T MulAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0*i1 ;
+ *data = i0 * i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MulAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MulAtomicTest( T i0, T i1 )
{
- T res = MulAtomic<T,DeviceType>(i0,i1);
- T resSerial = MulAtomicCheck<T>(i0,i1);
+ T res = MulAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MulAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MulAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_div---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct DivFunctor{
+template< class T, class DEVICE_TYPE >
+struct DivFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_div(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_div( &data(), (T) i1 );
}
- DivFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ DivFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T DivAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T DivAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct DivFunctor<T,execution_space> f(i0,i1);
+ struct DivFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T DivAtomicCheck(T i0 , T i1) {
+template< class T >
+T DivAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0/i1 ;
+ *data = i0 / i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool DivAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool DivAtomicTest( T i0, T i1 )
{
- T res = DivAtomic<T,DeviceType>(i0,i1);
- T resSerial = DivAtomicCheck<T>(i0,i1);
+ T res = DivAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = DivAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DivAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_mod---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct ModFunctor{
+template< class T, class DEVICE_TYPE >
+struct ModFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_mod(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_mod( &data(), (T) i1 );
}
- ModFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ ModFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T ModAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T ModAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct ModFunctor<T,execution_space> f(i0,i1);
+ struct ModFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T ModAtomicCheck(T i0 , T i1) {
+template< class T >
+T ModAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0%i1 ;
+ *data = i0 % i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool ModAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool ModAtomicTest( T i0, T i1 )
{
- T res = ModAtomic<T,DeviceType>(i0,i1);
- T resSerial = ModAtomicCheck<T>(i0,i1);
+ T res = ModAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = ModAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = ModAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_and---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct AndFunctor{
+template< class T, class DEVICE_TYPE >
+struct AndFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_and(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_and( &data(), (T) i1 );
}
- AndFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ AndFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T AndAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T AndAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct AndFunctor<T,execution_space> f(i0,i1);
+ struct AndFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T AndAtomicCheck(T i0 , T i1) {
+template< class T >
+T AndAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0&i1 ;
+ *data = i0 & i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool AndAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool AndAtomicTest( T i0, T i1 )
{
- T res = AndAtomic<T,DeviceType>(i0,i1);
- T resSerial = AndAtomicCheck<T>(i0,i1);
+ T res = AndAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = AndAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = AndAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_or----------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct OrFunctor{
+template< class T, class DEVICE_TYPE >
+struct OrFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_or(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_or( &data(), (T) i1 );
}
- OrFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ OrFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T OrAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T OrAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct OrFunctor<T,execution_space> f(i0,i1);
+ struct OrFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T OrAtomicCheck(T i0 , T i1) {
+template< class T >
+T OrAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0|i1 ;
+ *data = i0 | i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool OrAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool OrAtomicTest( T i0, T i1 )
{
- T res = OrAtomic<T,DeviceType>(i0,i1);
- T resSerial = OrAtomicCheck<T>(i0,i1);
+ T res = OrAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = OrAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = OrAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_xor---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct XorFunctor{
+template< class T, class DEVICE_TYPE >
+struct XorFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_xor(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_xor( &data(), (T) i1 );
}
- XorFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ XorFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T XorAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T XorAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct XorFunctor<T,execution_space> f(i0,i1);
+ struct XorFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T XorAtomicCheck(T i0 , T i1) {
+template< class T >
+T XorAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0^i1 ;
+ *data = i0 ^ i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool XorAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool XorAtomicTest( T i0, T i1 )
{
- T res = XorAtomic<T,DeviceType>(i0,i1);
- T resSerial = XorAtomicCheck<T>(i0,i1);
+ T res = XorAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = XorAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = XorAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_lshift---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct LShiftFunctor{
+template< class T, class DEVICE_TYPE >
+struct LShiftFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_lshift(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_lshift( &data(), (T) i1 );
}
- LShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ LShiftFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T LShiftAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T LShiftAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct LShiftFunctor<T,execution_space> f(i0,i1);
+ struct LShiftFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T LShiftAtomicCheck(T i0 , T i1) {
+template< class T >
+T LShiftAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0<<i1 ;
+ *data = i0 << i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool LShiftAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool LShiftAtomicTest( T i0, T i1 )
{
- T res = LShiftAtomic<T,DeviceType>(i0,i1);
- T resSerial = LShiftAtomicCheck<T>(i0,i1);
+ T res = LShiftAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = LShiftAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = LShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_rshift---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct RShiftFunctor{
+template< class T, class DEVICE_TYPE >
+struct RShiftFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_rshift(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_rshift( &data(), (T) i1 );
}
- RShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ RShiftFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T RShiftAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T RShiftAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct RShiftFunctor<T,execution_space> f(i0,i1);
+ struct RShiftFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T RShiftAtomicCheck(T i0 , T i1) {
+template< class T >
+T RShiftAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0>>i1 ;
+ *data = i0 >> i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool RShiftAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool RShiftAtomicTest( T i0, T i1 )
{
- T res = RShiftAtomic<T,DeviceType>(i0,i1);
- T resSerial = RShiftAtomicCheck<T>(i0,i1);
+ T res = RShiftAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = RShiftAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = RShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
-template<class T,class DeviceType>
-bool AtomicOperationsTestIntegralType( int i0 , int i1 , int test )
+template< class T, class DeviceType >
+bool AtomicOperationsTestIntegralType( int i0, int i1, int test )
{
- switch (test) {
- case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 5: return ModAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 6: return AndAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 7: return OrAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 8: return XorAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 9: return LShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 10: return RShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 11: return IncAtomicTest<T,DeviceType>( (T)i0 );
- case 12: return DecAtomicTest<T,DeviceType>( (T)i0 );
+ switch ( test ) {
+ case 1: return MaxAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 2: return MinAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 3: return MulAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 4: return DivAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 5: return ModAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 6: return AndAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 7: return OrAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 8: return XorAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 9: return LShiftAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 10: return RShiftAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 11: return IncAtomicTest< T, DeviceType >( (T) i0 );
+ case 12: return DecAtomicTest< T, DeviceType >( (T) i0 );
}
+
return 0;
}
-template<class T,class DeviceType>
-bool AtomicOperationsTestNonIntegralType( int i0 , int i1 , int test )
+template< class T, class DeviceType >
+bool AtomicOperationsTestNonIntegralType( int i0, int i1, int test )
{
- switch (test) {
- case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
+ switch ( test ) {
+ case 1: return MaxAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 2: return MinAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 3: return MulAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 4: return DivAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
}
+
return 0;
}
-} // namespace
-
+} // namespace TestAtomicOperations
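
Each fetch-style operation tested above (max, min, mul, div, mod, and, or, xor, shifts) can also be written by hand with Kokkos::atomic_compare_exchange(), following the same retry loop that CASFunctor uses earlier in this patch. A short sketch of that idiom (not part of the patch; the helper name, the use of double, and the loop bound are illustrative assumptions):

#include <Kokkos_Core.hpp>
#include <cstdio>

// Apply "max" atomically to *dest using only atomic_compare_exchange():
// read the current value, compute the update, and retry if another
// thread changed *dest in the meantime.
KOKKOS_INLINE_FUNCTION
double atomic_max_via_cas( double* dest, double val ) {
  double old = *dest;
  double assumed;
  do {
    assumed = old;
    const double desired = ( assumed < val ) ? val : assumed;
    old = Kokkos::atomic_compare_exchange( dest, assumed, desired );
  } while ( old != assumed );   // another update slipped in; retry
  return old;                   // value observed just before our update landed
}

int main( int argc, char* argv[] ) {
  Kokkos::initialize( argc, argv );
  {
    Kokkos::View< double > m( "m" );                  // rank-0 device scalar, zero-initialized
    Kokkos::parallel_for( 1000, KOKKOS_LAMBDA( const int i ) {
      atomic_max_via_cas( &m(), static_cast< double >( i ) );
    });
    Kokkos::fence();
    auto h_m = Kokkos::create_mirror_view( m );
    Kokkos::deep_copy( h_m, m );
    std::printf( "max = %g (expected 999)\n", h_m() ); // serial expectation
  }
  Kokkos::finalize();
  return 0;
}
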
diff --git a/lib/kokkos/core/unit_test/TestAtomicViews.hpp b/lib/kokkos/core/unit_test/TestAtomicViews.hpp
index 739492d32..71080e5c8 100644
--- a/lib/kokkos/core/unit_test/TestAtomicViews.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomicViews.hpp
@@ -1,1532 +1,1439 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomicViews {
//-------------------------------------------------
//-----------atomic view api tests-----------------
//-------------------------------------------------
-template< class T , class ... P >
-size_t allocation_count( const Kokkos::View<T,P...> & view )
+template< class T, class ... P >
+size_t allocation_count( const Kokkos::View< T, P... > & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
- const int memory_span = Kokkos::View<int*>::required_allocation_size(100);
+ const int memory_span = Kokkos::View< int* >::required_allocation_size( 100 );
- return (card <= alloc && memory_span == 400) ? alloc : 0 ;
+ return ( card <= alloc && memory_span == 400 ) ? alloc : 0;
}
-template< class DataType ,
- class DeviceType ,
+template< class DataType,
+ class DeviceType,
unsigned Rank = Kokkos::ViewTraits< DataType >::rank >
-struct TestViewOperator_LeftAndRight ;
+struct TestViewOperator_LeftAndRight;
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 1 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
+ { update = 0; }
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > left_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic > > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > right_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic > > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic >> stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
- // below checks that values match, but unable to check the references
- // - should this be able to be checked?
- if ( left(i0) != left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( right(i0) != right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( left(i0) != left_stride(i0) ) { update |= 4 ; }
- if ( right(i0) != right_stride(i0) ) { update |= 8 ; }
- /*
- if ( & left(i0) != & left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0) != & right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
- if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
- */
+ // Below checks that values match; the references themselves cannot be checked here.
+ // Should it be possible to check them?
+ if ( left( i0 ) != left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( right( i0 ) != right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( left( i0 ) != left_stride( i0 ) ) { update |= 4; }
+ if ( right( i0 ) != right_stride( i0 ) ) { update |= 8; }
+/*
+ if ( &left( i0 ) != &left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( &right( i0 ) != &right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( &left( i0 ) != &left_stride( i0 ) ) { update |= 4; }
+ if ( &right( i0 ) != &right_stride( i0 ) ) { update |= 8; }
+*/
}
}
};
-
template< typename T, class DeviceType >
class TestAtomicViewAPI
{
public:
- typedef DeviceType device ;
+ typedef DeviceType device;
- enum { N0 = 1000 ,
- N1 = 3 ,
- N2 = 5 ,
+ enum { N0 = 1000,
+ N1 = 3,
+ N2 = 5,
N3 = 7 };
- typedef Kokkos::View< T , device > dView0 ;
- typedef Kokkos::View< T* , device > dView1 ;
- typedef Kokkos::View< T*[N1] , device > dView2 ;
- typedef Kokkos::View< T*[N1][N2] , device > dView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device > dView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device > const_dView4 ;
- typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged ;
- typedef typename dView0::host_mirror_space host ;
+ typedef Kokkos::View< T, device > dView0;
+ typedef Kokkos::View< T*, device > dView1;
+ typedef Kokkos::View< T*[N1], device > dView2;
+ typedef Kokkos::View< T*[N1][N2], device > dView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device > dView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device > const_dView4;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged;
+ typedef typename dView0::host_mirror_space host;
- typedef Kokkos::View< T , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView0 ;
- typedef Kokkos::View< T* , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView1 ;
- typedef Kokkos::View< T*[N1] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView2 ;
- typedef Kokkos::View< T*[N1][N2] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > const_aView4 ;
+ typedef Kokkos::View< T, device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView0;
+ typedef Kokkos::View< T*, device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView1;
+ typedef Kokkos::View< T*[N1], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView2;
+ typedef Kokkos::View< T*[N1][N2], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device, Kokkos::MemoryTraits< Kokkos::Atomic > > const_aView4;
- typedef Kokkos::View< T****, device, Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::Atomic > > aView4_unmanaged ;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::Atomic > > aView4_unmanaged;
- typedef typename aView0::host_mirror_space host_atomic ;
+ typedef typename aView0::host_mirror_space host_atomic;
TestAtomicViewAPI()
{
- TestViewOperator_LeftAndRight< int[2] , device >::testit();
+ TestViewOperator_LeftAndRight< int[2], device >::testit();
run_test_rank0();
run_test_rank4();
run_test_const();
}
-
static void run_test_rank0()
{
- dView0 dx , dy ;
- aView0 ax , ay , az ;
+ dView0 dx, dy;
+ aView0 ax, ay, az;
dx = dView0( "dx" );
dy = dView0( "dy" );
- ASSERT_EQ( dx.use_count() , size_t(1) );
- ASSERT_EQ( dy.use_count() , size_t(1) );
-
- ax = dx ;
- ay = dy ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
- ASSERT_EQ( dy.use_count() , size_t(2) );
- ASSERT_EQ( dx.use_count() , ax.use_count() );
-
- az = ax ;
- ASSERT_EQ( dx.use_count() , size_t(3) );
- ASSERT_EQ( ax.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , ax.use_count() );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 1 ) );
+
+ ax = dx;
+ ay = dy;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dx.use_count(), ax.use_count() );
+
+ az = ax;
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), ax.use_count() );
}
static void run_test_rank4()
{
- dView4 dx , dy ;
- aView4 ax , ay , az ;
+ dView4 dx, dy;
+ aView4 ax, ay, az;
- dx = dView4( "dx" , N0 );
- dy = dView4( "dy" , N0 );
- ASSERT_EQ( dx.use_count() , size_t(1) );
- ASSERT_EQ( dy.use_count() , size_t(1) );
+ dx = dView4( "dx", N0 );
+ dy = dView4( "dy", N0 );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 1 ) );
- ax = dx ;
- ay = dy ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
- ASSERT_EQ( dy.use_count() , size_t(2) );
- ASSERT_EQ( dx.use_count() , ax.use_count() );
+ ax = dx;
+ ay = dy;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dx.use_count(), ax.use_count() );
dView4_unmanaged unmanaged_dx = dx;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
- az = ax ;
- ASSERT_EQ( dx.use_count() , size_t(3) );
- ASSERT_EQ( ax.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , ax.use_count() );
+ az = ax;
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), ax.use_count() );
aView4_unmanaged unmanaged_ax = ax;
- ASSERT_EQ( ax.use_count() , size_t(3) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
- aView4_unmanaged unmanaged_ax_from_ptr_dx = aView4_unmanaged(dx.data(),
- dx.dimension_0(),
- dx.dimension_1(),
- dx.dimension_2(),
- dx.dimension_3());
- ASSERT_EQ( ax.use_count() , size_t(3) );
+ aView4_unmanaged unmanaged_ax_from_ptr_dx =
+ aView4_unmanaged( dx.data(), dx.dimension_0(), dx.dimension_1(), dx.dimension_2(), dx.dimension_3() );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
- const_aView4 const_ax = ax ;
- ASSERT_EQ( ax.use_count() , size_t(4) );
- ASSERT_EQ( const_ax.use_count() , ax.use_count() );
+ const_aView4 const_ax = ax;
+ ASSERT_EQ( ax.use_count(), size_t( 4 ) );
+ ASSERT_EQ( const_ax.use_count(), ax.use_count() );
ASSERT_FALSE( ax.data() == 0 );
ASSERT_FALSE( const_ax.data() == 0 ); // referenceable ptr
ASSERT_FALSE( unmanaged_ax.data() == 0 );
ASSERT_FALSE( unmanaged_ax_from_ptr_dx.data() == 0 );
ASSERT_FALSE( ay.data() == 0 );
-// ASSERT_NE( ax , ay );
+// ASSERT_NE( ax, ay );
// Above test results in following runtime error from gtest:
// Expected: (ax) != (ay), actual: 32-byte object <30-01 D0-A0 D8-7F 00-00 00-31 44-0C 01-00 00-00 E8-03 00-00 00-00 00-00 69-00 00-00 00-00 00-00> vs 32-byte object <80-01 D0-A0 D8-7F 00-00 00-A1 4A-0C 01-00 00-00 E8-03 00-00 00-00 00-00 69-00 00-00 00-00 00-00>
- ASSERT_EQ( ax.dimension_0() , unsigned(N0) );
- ASSERT_EQ( ax.dimension_1() , unsigned(N1) );
- ASSERT_EQ( ax.dimension_2() , unsigned(N2) );
- ASSERT_EQ( ax.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( ax.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( ax.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( ax.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( ax.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( ay.dimension_0() , unsigned(N0) );
- ASSERT_EQ( ay.dimension_1() , unsigned(N1) );
- ASSERT_EQ( ay.dimension_2() , unsigned(N2) );
- ASSERT_EQ( ay.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( ay.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( ay.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( ay.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( ay.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( unmanaged_ax_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
+ ASSERT_EQ( unmanaged_ax_from_ptr_dx.capacity(), unsigned( N0 ) * unsigned( N1 ) * unsigned( N2 ) * unsigned( N3 ) );
}
- typedef T DataType[2] ;
+ typedef T DataType[2];
static void
check_auto_conversion_to_const(
- const Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > & arg_const ,
- const Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > & arg )
+ const Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > & arg_const,
+ const Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
- typedef Kokkos::View< DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > typeX ;
- typedef Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > const_typeX ;
+ typedef Kokkos::View< DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > typeX;
+ typedef Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > const_typeX;
typeX x( "X" );
- const_typeX xc = x ;
+ const_typeX xc = x;
//ASSERT_TRUE( xc == x ); // const xc is referenceable, non-const x is not
//ASSERT_TRUE( x == xc );
- check_auto_conversion_to_const( x , xc );
+ check_auto_conversion_to_const( x, xc );
}
-
};
-
//---------------------------------------------------
//-----------initialization functors-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct InitFunctor_Seq {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
+ view_type input;
+ const long length;
- InitFunctor_Seq( view_type & input_ , const long length_ )
- : input(input_)
- , length(length_)
+ InitFunctor_Seq( view_type & input_, const long length_ )
+ : input( input_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- input(i) = (T) i ;
+ input( i ) = (T) i;
}
}
-
};
-
template<class T, class execution_space >
struct InitFunctor_ModTimes {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
- const long remainder ;
+ view_type input;
+ const long length;
+ const long remainder;
- InitFunctor_ModTimes( view_type & input_ , const long length_ , const long remainder_ )
- : input(input_)
- , length(length_)
- , remainder(remainder_)
+ InitFunctor_ModTimes( view_type & input_, const long length_, const long remainder_ )
+ : input( input_ )
+ , length( length_ )
+ , remainder( remainder_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- if ( i % (remainder+1) == remainder ) {
- input(i) = (T)2 ;
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input( i ) = (T) 2;
}
else {
- input(i) = (T)1 ;
+ input( i ) = (T) 1;
}
}
}
};
-
template<class T, class execution_space >
struct InitFunctor_ModShift {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
- const long remainder ;
+ view_type input;
+ const long length;
+ const long remainder;
- InitFunctor_ModShift( view_type & input_ , const long length_ , const long remainder_ )
- : input(input_)
- , length(length_)
- , remainder(remainder_)
+ InitFunctor_ModShift( view_type & input_, const long length_, const long remainder_ )
+ : input( input_ )
+ , length( length_ )
+ , remainder( remainder_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- if ( i % (remainder+1) == remainder ) {
- input(i) = 1 ;
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input( i ) = 1;
}
}
}
};
-
//---------------------------------------------------
//-----------atomic view plus-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct PlusEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
// Wrap the result view in an atomic view, use this for operator
- PlusEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ PlusEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) += input(i);
+ even_odd_result( 0 ) += input( i );
}
else {
- even_odd_result(1) += input(i);
+ even_odd_result( 1 ) += input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T PlusEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T PlusEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0, length) , init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- PlusEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ PlusEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0) + h_result_view(1) ) ;
+ return (T) ( h_result_view( 0 ) + h_result_view( 1 ) );
}
-template<class T>
+template< class T >
T PlusEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
T result[2];
+
if ( N % 2 == 0 ) {
- const long half_sum_end = (N/2) - 1;
+ const long half_sum_end = ( N / 2 ) - 1;
const long full_sum_end = N - 1;
- result[0] = half_sum_end*(half_sum_end + 1)/2 ; //even sum
- result[1] = ( full_sum_end*(full_sum_end + 1)/2 ) - result[0] ; // odd sum
+ result[0] = half_sum_end * ( half_sum_end + 1 ) / 2; // Even sum.
+ result[1] = ( full_sum_end * ( full_sum_end + 1 ) / 2 ) - result[0]; // Odd sum.
}
else {
- const long half_sum_end = (T)(N/2) ;
+ const long half_sum_end = (T) ( N / 2 );
const long full_sum_end = N - 2;
- result[0] = half_sum_end*(half_sum_end - 1)/2 ; //even sum
- result[1] = ( full_sum_end*(full_sum_end - 1)/2 ) - result[0] ; // odd sum
+ result[0] = half_sum_end * ( half_sum_end - 1 ) / 2; // Even sum.
+ result[1] = ( full_sum_end * ( full_sum_end - 1 ) / 2 ) - result[0]; // Odd sum.
}
- return (T)(result[0] + result[1]);
+ return (T) ( result[0] + result[1] );
}
-template<class T,class DeviceType>
-bool PlusEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool PlusEqualAtomicViewTest( long input_length )
{
- T res = PlusEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = PlusEqualAtomicViewCheck<T>(input_length);
+ T res = PlusEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = PlusEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = PlusEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view minus-equal-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct MinusEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- MinusEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ MinusEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) -= input(i);
+ even_odd_result( 0 ) -= input( i );
}
else {
- even_odd_result(1) -= input(i);
+ even_odd_result( 1 ) -= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T MinusEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T MinusEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- MinusEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ MinusEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0) + h_result_view(1) ) ;
+ return (T) ( h_result_view( 0 ) + h_result_view( 1 ) );
}
-template<class T>
+template< class T >
T MinusEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
T result[2];
+
if ( N % 2 == 0 ) {
- const long half_sum_end = (N/2) - 1;
+ const long half_sum_end = ( N / 2 ) - 1;
const long full_sum_end = N - 1;
- result[0] = -1*( half_sum_end*(half_sum_end + 1)/2 ) ; //even sum
- result[1] = -1*( ( full_sum_end*(full_sum_end + 1)/2 ) + result[0] ) ; // odd sum
+ result[0] = -1 * ( half_sum_end * ( half_sum_end + 1 ) / 2 ); // Even sum.
+ result[1] = -1 * ( ( full_sum_end * ( full_sum_end + 1 ) / 2 ) + result[0] ); // Odd sum.
}
else {
- const long half_sum_end = (long)(N/2) ;
+ const long half_sum_end = (long) ( N / 2 );
const long full_sum_end = N - 2;
- result[0] = -1*( half_sum_end*(half_sum_end - 1)/2 ) ; //even sum
- result[1] = -1*( ( full_sum_end*(full_sum_end - 1)/2 ) + result[0] ) ; // odd sum
+ result[0] = -1 * ( half_sum_end * ( half_sum_end - 1 ) / 2 ); // Even sum.
+ result[1] = -1 * ( ( full_sum_end * ( full_sum_end - 1 ) / 2 ) + result[0] ); // Odd sum.
}
- return (result[0] + result[1]);
+ return ( result[0] + result[1] );
}
-template<class T,class DeviceType>
-bool MinusEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool MinusEqualAtomicViewTest( long input_length )
{
- T res = MinusEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = MinusEqualAtomicViewCheck<T>(input_length);
+ T res = MinusEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = MinusEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MinusEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view times-equal-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct TimesEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type result;
const long length;
// Wrap the result view in an atomic view, use this for operator
- TimesEqualAtomicViewFunctor( const view_type & input_ , view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ TimesEqualAtomicViewFunctor( const view_type & input_, view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result(0) *= (double)input(i);
+ result( 0 ) *= (double) input( i );
}
}
-
};
-
-template<class T, class execution_space >
-T TimesEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T TimesEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",1) ;
- deep_copy(result_view, 1.0);
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 1 );
+ deep_copy( result_view, 1.0 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- TimesEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ TimesEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T TimesEqualAtomicViewCheck( const long input_length, const long remainder ) {
-
- //Analytical result
+ // Analytical result.
const long N = input_length;
T result = 1.0;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result *= 2.0;
}
else {
result *= 1.0;
}
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool TimesEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool TimesEqualAtomicViewTest( const long input_length )
{
const long remainder = 23;
- T res = TimesEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = TimesEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = TimesEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = TimesEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = TimesEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view div-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct DivEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
+ typedef Kokkos::View< T, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
view_type input;
atomic_view_type result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- DivEqualAtomicViewFunctor( const view_type & input_ , scalar_view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ DivEqualAtomicViewFunctor( const view_type & input_, scalar_view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result() /= (double)(input(i));
+ result() /= (double) ( input( i ) );
}
}
-
};
-
-template<class T, class execution_space >
-T DivEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
- typedef typename scalar_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T DivEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
+ typedef typename scalar_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- scalar_view_type result_view("result_view") ;
- Kokkos::deep_copy(result_view, 12121212121);
+ view_type input( "input_view", length );
+ scalar_view_type result_view( "result_view" );
+ Kokkos::deep_copy( result_view, 12121212121 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- DivEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ DivEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view()) ;
+ return (T) ( h_result_view() );
}
-template<class T>
-T DivEqualAtomicViewCheck( const long input_length , const long remainder ) {
-
+template< class T >
+T DivEqualAtomicViewCheck( const long input_length, const long remainder ) {
const long N = input_length;
T result = 12121212121.0;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result /= 1.0;
}
else {
result /= 2.0;
}
-
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool DivEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool DivEqualAtomicViewTest( const long input_length )
{
const long remainder = 23;
- T res = DivEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = DivEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = DivEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = DivEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DivEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view mod-equal------------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct ModEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
+ typedef Kokkos::View< T, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
view_type input;
atomic_view_type result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- ModEqualAtomicViewFunctor( const view_type & input_ , scalar_view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ ModEqualAtomicViewFunctor( const view_type & input_, scalar_view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result() %= (double)(input(i));
+ result() %= (double) ( input( i ) );
}
}
-
};
-
-template<class T, class execution_space >
-T ModEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
- typedef typename scalar_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T ModEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
+ typedef typename scalar_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- scalar_view_type result_view("result_view") ;
- Kokkos::deep_copy(result_view, 12121212121);
+ view_type input( "input_view", length );
+ scalar_view_type result_view( "result_view" );
+ Kokkos::deep_copy( result_view, 12121212121 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- ModEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ ModEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view()) ;
+ return (T) ( h_result_view() );
}
-template<class T>
-T ModEqualAtomicViewCheck( const long input_length , const long remainder ) {
-
+template< class T >
+T ModEqualAtomicViewCheck( const long input_length, const long remainder ) {
const long N = input_length;
T result = 12121212121;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result %= 1;
}
else {
result %= 2;
}
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool ModEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool ModEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "ModEqualAtomicView Error: Type must be integral type for this unit test");
+ static_assert( std::is_integral< T >::value, "ModEqualAtomicView Error: Type must be integral type for this unit test" );
const long remainder = 23;
- T res = ModEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = ModEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = ModEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = ModEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = ModEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view rs-equal------------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct RSEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T**** , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
+ typedef Kokkos::View< T****, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
const view_type input;
atomic_view_type result;
const long length;
const long value;
- // Wrap the result view in an atomic view, use this for operator
- RSEqualAtomicViewFunctor( const view_type & input_ , result_view_type & result_ , const long & length_ , const long & value_ )
- : input(input_)
- , result(result_)
- , length(length_)
- , value(value_)
+ // Wrap the result view in an atomic view, use this for operator.
+ RSEqualAtomicViewFunctor( const view_type & input_, result_view_type & result_, const long & length_, const long & value_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
+ , value( value_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 4 == 0 ) {
- result(1,0,0,0) >>= input(i);
+ result( 1, 0, 0, 0 ) >>= input( i );
}
else if ( i % 4 == 1 ) {
- result(0,1,0,0) >>= input(i);
+ result( 0, 1, 0, 0 ) >>= input( i );
}
else if ( i % 4 == 2 ) {
- result(0,0,1,0) >>= input(i);
+ result( 0, 0, 1, 0 ) >>= input( i );
}
else if ( i % 4 == 3 ) {
- result(0,0,0,1) >>= input(i);
+ result( 0, 0, 0, 1 ) >>= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T RSEqualAtomicView(const long input_length, const long value, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
- typedef typename result_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T RSEqualAtomicView( const long input_length, const long value, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
+ typedef typename result_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- result_view_type result_view("result_view",2,2,2,2) ;
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- h_result_view(1,0,0,0) = value;
- h_result_view(0,1,0,0) = value;
- h_result_view(0,0,1,0) = value;
- h_result_view(0,0,0,1) = value;
- Kokkos::deep_copy( result_view , h_result_view );
+ view_type input( "input_view", length );
+ result_view_type result_view( "result_view", 2, 2, 2, 2 );
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ h_result_view( 1, 0, 0, 0 ) = value;
+ h_result_view( 0, 1, 0, 0 ) = value;
+ h_result_view( 0, 0, 1, 0 ) = value;
+ h_result_view( 0, 0, 0, 1 ) = value;
+ Kokkos::deep_copy( result_view, h_result_view );
+ InitFunctor_ModShift< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- InitFunctor_ModShift<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
-
- RSEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length, value);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ RSEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length, value );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- Kokkos::deep_copy(h_result_view, result_view);
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(1,0,0,0)) ;
+ return (T) ( h_result_view( 1, 0, 0, 0 ) );
}
-template<class T>
+template< class T >
T RSEqualAtomicViewCheck( const long input_length, const long value, const long remainder ) {
-
- T result[4] ;
- result[0] = value ;
- result[1] = value ;
- result[2] = value ;
- result[3] = value ;
+ T result[4];
+ result[0] = value;
+ result[1] = value;
+ result[2] = value;
+ result[3] = value;
T * input = new T[input_length];
for ( long i = 0; i < input_length; ++i ) {
- if ( i % (remainder+1) == remainder ) {
- input[i] = 1;
- }
- else {
- input[i] = 0;
- }
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input[i] = 1;
+ }
+ else {
+ input[i] = 0;
+ }
}
for ( long i = 0; i < input_length; ++i ) {
- if ( i % 4 == 0 ) {
- result[0] >>= input[i];
- }
- else if ( i % 4 == 1 ) {
- result[1] >>= input[i];
- }
- else if ( i % 4 == 2 ) {
- result[2] >>= input[i];
- }
- else if ( i % 4 == 3 ) {
- result[3] >>= input[i];
- }
+ if ( i % 4 == 0 ) {
+ result[0] >>= input[i];
+ }
+ else if ( i % 4 == 1 ) {
+ result[1] >>= input[i];
+ }
+ else if ( i % 4 == 2 ) {
+ result[2] >>= input[i];
+ }
+ else if ( i % 4 == 3 ) {
+ result[3] >>= input[i];
+ }
}
+
delete [] input;
- return (T)result[0];
+ return (T) result[0];
}
-template<class T, class DeviceType>
-bool RSEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool RSEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "RSEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "RSEqualAtomicViewTest: Must be integral type for test" );
const long remainder = 61042; //prime - 1
- const long value = 1073741825; // 2^30+1
- T res = RSEqualAtomicView<T,DeviceType>(input_length, value, remainder);
- T resSerial = RSEqualAtomicViewCheck<T>(input_length, value, remainder);
+ const long value = 1073741825; // 2^30+1
+ T res = RSEqualAtomicView< T, DeviceType >( input_length, value, remainder );
+ T resSerial = RSEqualAtomicViewCheck< T >( input_length, value, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = RSEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view ls-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct LSEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T**** , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
+ typedef Kokkos::View< T****, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
view_type input;
atomic_view_type result;
const long length;
const long value;
- // Wrap the result view in an atomic view, use this for operator
- LSEqualAtomicViewFunctor( const view_type & input_ , result_view_type & result_ , const long & length_ , const long & value_ )
- : input(input_)
- , result(result_)
- , length(length_)
- , value(value_)
+ // Wrap the result view in an atomic view, use this for operator.
+ LSEqualAtomicViewFunctor( const view_type & input_, result_view_type & result_, const long & length_, const long & value_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
+ , value( value_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 4 == 0 ) {
- result(1,0,0,0) <<= input(i);
+ result( 1, 0, 0, 0 ) <<= input( i );
}
else if ( i % 4 == 1 ) {
- result(0,1,0,0) <<= input(i);
+ result( 0, 1, 0, 0 ) <<= input( i );
}
else if ( i % 4 == 2 ) {
- result(0,0,1,0) <<= input(i);
+ result( 0, 0, 1, 0 ) <<= input( i );
}
else if ( i % 4 == 3 ) {
- result(0,0,0,1) <<= input(i);
+ result( 0, 0, 0, 1 ) <<= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T LSEqualAtomicView(const long input_length, const long value, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
- typedef typename result_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T LSEqualAtomicView( const long input_length, const long value, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
+ typedef typename result_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- result_view_type result_view("result_view",2,2,2,2) ;
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- h_result_view(1,0,0,0) = value;
- h_result_view(0,1,0,0) = value;
- h_result_view(0,0,1,0) = value;
- h_result_view(0,0,0,1) = value;
- Kokkos::deep_copy( result_view , h_result_view );
+ view_type input( "input_view", length );
+ result_view_type result_view( "result_view", 2, 2, 2, 2 );
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ h_result_view( 1, 0, 0, 0 ) = value;
+ h_result_view( 0, 1, 0, 0 ) = value;
+ h_result_view( 0, 0, 1, 0 ) = value;
+ h_result_view( 0, 0, 0, 1 ) = value;
+ Kokkos::deep_copy( result_view, h_result_view );
- InitFunctor_ModShift<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModShift< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- LSEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length, value);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ LSEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length, value );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- Kokkos::deep_copy(h_result_view, result_view);
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(1,0,0,0)) ;
+ return (T) ( h_result_view( 1, 0, 0, 0 ) );
}
-template<class T>
+template< class T >
T LSEqualAtomicViewCheck( const long input_length, const long value, const long remainder ) {
-
- T result[4] ;
- result[0] = value ;
- result[1] = value ;
- result[2] = value ;
- result[3] = value ;
+ T result[4];
+ result[0] = value;
+ result[1] = value;
+ result[2] = value;
+ result[3] = value;
T * input = new T[input_length];
for ( long i = 0; i < input_length; ++i ) {
- if ( i % (remainder+1) == remainder ) {
- input[i] = 1;
- }
- else {
- input[i] = 0;
- }
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input[i] = 1;
+ }
+ else {
+ input[i] = 0;
+ }
}
for ( long i = 0; i < input_length; ++i ) {
- if ( i % 4 == 0 ) {
- result[0] <<= input[i];
- }
- else if ( i % 4 == 1 ) {
- result[1] <<= input[i];
- }
- else if ( i % 4 == 2 ) {
- result[2] <<= input[i];
- }
- else if ( i % 4 == 3 ) {
- result[3] <<= input[i];
- }
+ if ( i % 4 == 0 ) {
+ result[0] <<= input[i];
+ }
+ else if ( i % 4 == 1 ) {
+ result[1] <<= input[i];
+ }
+ else if ( i % 4 == 2 ) {
+ result[2] <<= input[i];
+ }
+ else if ( i % 4 == 3 ) {
+ result[3] <<= input[i];
+ }
}
delete [] input;
- return (T)result[0];
+ return (T) result[0];
}
-template<class T, class DeviceType>
-bool LSEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool LSEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "LSEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "LSEqualAtomicViewTest: Must be integral type for test" );
const long remainder = 61042; //prime - 1
- const long value = 1; // 2^30+1
- T res = LSEqualAtomicView<T,DeviceType>(input_length, value, remainder);
- T resSerial = LSEqualAtomicViewCheck<T>(input_length, value, remainder);
+ const long value = 1; // Start from 1 for the left-shift test.
+ T res = LSEqualAtomicView< T, DeviceType >( input_length, value, remainder );
+ T resSerial = LSEqualAtomicViewCheck< T >( input_length, value, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
( test = RSEqual">
<< ">( test = LSEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view and-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct AndEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- AndEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ AndEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) &= input(i);
+ even_odd_result( 0 ) &= input( i );
}
else {
- even_odd_result(1) &= input(i);
+ even_odd_result( 1 ) &= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T AndEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T AndEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
- Kokkos::deep_copy(result_view, 1);
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
+ Kokkos::deep_copy( result_view, 1 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- AndEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ AndEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T AndEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
- T result[2] = {1};
+ T result[2] = { 1 };
for ( long i = 0; i < N; ++i ) {
if ( N % 2 == 0 ) {
- result[0] &= (T)i;
+ result[0] &= (T) i;
}
else {
- result[1] &= (T)i;
+ result[1] &= (T) i;
}
}
- return (result[0]);
+ return ( result[0] );
}
-template<class T,class DeviceType>
-bool AndEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool AndEqualAtomicViewTest( long input_length )
{
+ static_assert( std::is_integral< T >::value, "AndEqualAtomicViewTest: Must be integral type for test" );
- static_assert( std::is_integral<T>::value, "AndEqualAtomicViewTest: Must be integral type for test");
-
- T res = AndEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = AndEqualAtomicViewCheck<T>(input_length);
+ T res = AndEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = AndEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = AndEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view or-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct OrEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- OrEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ OrEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) |= input(i);
+ even_odd_result( 0 ) |= input( i );
}
else {
- even_odd_result(1) |= input(i);
+ even_odd_result( 1 ) |= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T OrEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T OrEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- OrEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ OrEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T OrEqualAtomicViewCheck( const long input_length ) {
const long N = input_length;
- T result[2] = {0};
+ T result[2] = { 0 };
for ( long i = 0; i < N; ++i ) {
if ( i % 2 == 0 ) {
- result[0] |= (T)i;
+ result[0] |= (T) i;
}
else {
- result[1] |= (T)i;
+ result[1] |= (T) i;
}
}
- return (T)(result[0]);
+ return (T) ( result[0] );
}
-template<class T,class DeviceType>
-bool OrEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool OrEqualAtomicViewTest( long input_length )
{
-
- static_assert( std::is_integral<T>::value, "OrEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "OrEqualAtomicViewTest: Must be integral type for test" );
- T res = OrEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = OrEqualAtomicViewCheck<T>(input_length);
+ T res = OrEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = OrEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = OrEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view xor-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct XOrEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- XOrEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ XOrEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) ^= input(i);
+ even_odd_result( 0 ) ^= input( i );
}
else {
- even_odd_result(1) ^= input(i);
+ even_odd_result( 1 ) ^= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T XOrEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T XOrEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- XOrEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ XOrEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T XOrEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
- T result[2] = {0};
+ T result[2] = { 0 };
for ( long i = 0; i < N; ++i ) {
if ( i % 2 == 0 ) {
- result[0] ^= (T)i;
+ result[0] ^= (T) i;
}
else {
- result[1] ^= (T)i;
+ result[1] ^= (T) i;
}
}
- return (T)(result[0]);
+ return (T) ( result[0] );
}
-template<class T,class DeviceType>
-bool XOrEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool XOrEqualAtomicViewTest( long input_length )
{
+ static_assert( std::is_integral< T >::value, "XOrEqualAtomicViewTest: Must be integral type for test" );
- static_assert( std::is_integral<T>::value, "XOrEqualAtomicViewTest: Must be integral type for test");
-
- T res = XOrEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = XOrEqualAtomicViewCheck<T>(input_length);
+ T res = XOrEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = XOrEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = XOrEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
// inc/dec?
-
//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
-template<class T,class DeviceType>
-bool AtomicViewsTestIntegralType( const int length , int test )
+template< class T, class DeviceType >
+bool AtomicViewsTestIntegralType( const int length, int test )
{
- static_assert( std::is_integral<T>::value, "TestAtomicViews Error: Non-integral type passed into IntegralType tests");
-
- switch (test) {
- case 1: return PlusEqualAtomicViewTest<T,DeviceType>( length );
- case 2: return MinusEqualAtomicViewTest<T,DeviceType>( length );
- case 3: return RSEqualAtomicViewTest<T,DeviceType>( length );
- case 4: return LSEqualAtomicViewTest<T,DeviceType>( length );
- case 5: return ModEqualAtomicViewTest<T,DeviceType>( length );
- case 6: return AndEqualAtomicViewTest<T,DeviceType>( length );
- case 7: return OrEqualAtomicViewTest<T,DeviceType>( length );
- case 8: return XOrEqualAtomicViewTest<T,DeviceType>( length );
+ static_assert( std::is_integral< T >::value, "TestAtomicViews Error: Non-integral type passed into IntegralType tests" );
+
+ switch ( test ) {
+ case 1: return PlusEqualAtomicViewTest< T, DeviceType >( length );
+ case 2: return MinusEqualAtomicViewTest< T, DeviceType >( length );
+ case 3: return RSEqualAtomicViewTest< T, DeviceType >( length );
+ case 4: return LSEqualAtomicViewTest< T, DeviceType >( length );
+ case 5: return ModEqualAtomicViewTest< T, DeviceType >( length );
+ case 6: return AndEqualAtomicViewTest< T, DeviceType >( length );
+ case 7: return OrEqualAtomicViewTest< T, DeviceType >( length );
+ case 8: return XOrEqualAtomicViewTest< T, DeviceType >( length );
}
+
return 0;
}
-
-template<class T,class DeviceType>
-bool AtomicViewsTestNonIntegralType( const int length , int test )
+template< class T, class DeviceType >
+bool AtomicViewsTestNonIntegralType( const int length, int test )
{
- switch (test) {
- case 1: return PlusEqualAtomicViewTest<T,DeviceType>( length );
- case 2: return MinusEqualAtomicViewTest<T,DeviceType>( length );
- case 3: return TimesEqualAtomicViewTest<T,DeviceType>( length );
- case 4: return DivEqualAtomicViewTest<T,DeviceType>( length );
+ switch ( test ) {
+ case 1: return PlusEqualAtomicViewTest< T, DeviceType >( length );
+ case 2: return MinusEqualAtomicViewTest< T, DeviceType >( length );
+ case 3: return TimesEqualAtomicViewTest< T, DeviceType >( length );
+ case 4: return DivEqualAtomicViewTest< T, DeviceType >( length );
}
+
return 0;
}
-} // namespace
-
+} // namespace TestAtomicViews
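
The hunks above only reformat the atomic-view tests; the mechanism they exercise is Kokkos' Atomic memory trait, which turns every element update on a wrapped View into an atomic read-modify-write. A standalone sketch of that pattern, assuming a host backend with lambda dispatch enabled (view names and sizes below are illustrative, not taken from the test code):

    #include <Kokkos_Core.hpp>

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        typedef Kokkos::View< long*, Space > view_type;
        // Same allocation, but every access through this alias is atomic.
        typedef Kokkos::View< long*, Space, Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type;

        const long N = 1000;
        view_type result( "result", 2 );
        atomic_view_type atomic_result = result;

        // Many indices race on the same two slots; the atomic trait makes "^=" safe.
        Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, N ), KOKKOS_LAMBDA( const long i ) {
          atomic_result( i % 2 ) ^= i;
        });
        Kokkos::fence();
      }
      Kokkos::finalize();
    }
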
diff --git a/lib/kokkos/core/unit_test/TestCXX11.hpp b/lib/kokkos/core/unit_test/TestCXX11.hpp
index d6dde5e96..e2ad623d9 100644
--- a/lib/kokkos/core/unit_test/TestCXX11.hpp
+++ b/lib/kokkos/core/unit_test/TestCXX11.hpp
@@ -1,334 +1,345 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+
#include <Kokkos_Core.hpp>
namespace TestCXX11 {
-template<class DeviceType>
-struct FunctorAddTest{
- typedef Kokkos::View<double**,DeviceType> view_type;
- view_type a_, b_;
+template< class DeviceType >
+struct FunctorAddTest {
+ typedef Kokkos::View< double**, DeviceType > view_type;
typedef DeviceType execution_space;
- FunctorAddTest(view_type & a, view_type &b):a_(a),b_(b) {}
+ typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member;
+
+ view_type a_, b_;
+
+ FunctorAddTest( view_type & a, view_type & b ) : a_( a ), b_( b ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i) const {
- b_(i,0) = a_(i,1) + a_(i,2);
- b_(i,1) = a_(i,0) - a_(i,3);
- b_(i,2) = a_(i,4) + a_(i,0);
- b_(i,3) = a_(i,2) - a_(i,1);
- b_(i,4) = a_(i,3) + a_(i,4);
+ void operator() ( const int& i ) const {
+ b_( i, 0 ) = a_( i, 1 ) + a_( i, 2 );
+ b_( i, 1 ) = a_( i, 0 ) - a_( i, 3 );
+ b_( i, 2 ) = a_( i, 4 ) + a_( i, 0 );
+ b_( i, 3 ) = a_( i, 2 ) - a_( i, 1 );
+ b_( i, 4 ) = a_( i, 3 ) + a_( i, 4 );
}
- typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member ;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_member & dev) const {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- b_(i,0) = a_(i,1) + a_(i,2);
- b_(i,1) = a_(i,0) - a_(i,3);
- b_(i,2) = a_(i,4) + a_(i,0);
- b_(i,3) = a_(i,2) - a_(i,1);
- b_(i,4) = a_(i,3) + a_(i,4);
+ void operator() ( const team_member & dev ) const {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ b_( i, 0 ) = a_( i, 1 ) + a_( i, 2 );
+ b_( i, 1 ) = a_( i, 0 ) - a_( i, 3 );
+ b_( i, 2 ) = a_( i, 4 ) + a_( i, 0 );
+ b_( i, 3 ) = a_( i, 2 ) - a_( i, 1 );
+ b_( i, 4 ) = a_( i, 3 ) + a_( i, 4 );
}
}
};
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double AddTestFunctor() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
-
- Kokkos::View<double**,DeviceType> a("A",100,5);
- Kokkos::View<double**,DeviceType> b("B",100,5);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_a = Kokkos::create_mirror_view(a);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_b = Kokkos::create_mirror_view(b);
+ Kokkos::View< double**, DeviceType > a( "A", 100, 5 );
+ Kokkos::View< double**, DeviceType > b( "B", 100, 5 );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_a = Kokkos::create_mirror_view( a );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_b = Kokkos::create_mirror_view( b );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
- if(PWRTest==false)
- Kokkos::parallel_for(100,FunctorAddTest<DeviceType>(a,b));
- else
- Kokkos::parallel_for(policy_type(25,Kokkos::AUTO),FunctorAddTest<DeviceType>(a,b));
- Kokkos::deep_copy(h_b,b);
+ if ( PWRTest == false ) {
+ Kokkos::parallel_for( 100, FunctorAddTest< DeviceType >( a, b ) );
+ }
+ else {
+ Kokkos::parallel_for( policy_type( 25, Kokkos::AUTO ), FunctorAddTest< DeviceType >( a, b ) );
+ }
+ Kokkos::deep_copy( h_b, b );
double result = 0;
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- result += h_b(i,j);
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ result += h_b( i, j );
}
+ }
return result;
}
-
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-template<class DeviceType, bool PWRTest>
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+template< class DeviceType, bool PWRTest >
double AddTestLambda() {
-
- Kokkos::View<double**,DeviceType> a("A",100,5);
- Kokkos::View<double**,DeviceType> b("B",100,5);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_a = Kokkos::create_mirror_view(a);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_b = Kokkos::create_mirror_view(b);
-
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ Kokkos::View< double**, DeviceType > a( "A", 100, 5 );
+ Kokkos::View< double**, DeviceType > b( "B", 100, 5 );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_a = Kokkos::create_mirror_view( a );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_b = Kokkos::create_mirror_view( b );
+
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
-
- if(PWRTest==false) {
- Kokkos::parallel_for(100,KOKKOS_LAMBDA(const int& i) {
- b(i,0) = a(i,1) + a(i,2);
- b(i,1) = a(i,0) - a(i,3);
- b(i,2) = a(i,4) + a(i,0);
- b(i,3) = a(i,2) - a(i,1);
- b(i,4) = a(i,3) + a(i,4);
+ Kokkos::deep_copy( a, h_a );
+
+ if ( PWRTest == false ) {
+ Kokkos::parallel_for( 100, KOKKOS_LAMBDA( const int & i ) {
+ b( i, 0 ) = a( i, 1 ) + a( i, 2 );
+ b( i, 1 ) = a( i, 0 ) - a( i, 3 );
+ b( i, 2 ) = a( i, 4 ) + a( i, 0 );
+ b( i, 3 ) = a( i, 2 ) - a( i, 1 );
+ b( i, 4 ) = a( i, 3 ) + a( i, 4 );
});
- } else {
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef typename policy_type::member_type team_member ;
-
- policy_type policy(25,Kokkos::AUTO);
-
- Kokkos::parallel_for(policy,KOKKOS_LAMBDA(const team_member & dev) {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- b(i,0) = a(i,1) + a(i,2);
- b(i,1) = a(i,0) - a(i,3);
- b(i,2) = a(i,4) + a(i,0);
- b(i,3) = a(i,2) - a(i,1);
- b(i,4) = a(i,3) + a(i,4);
+ }
+ else {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef typename policy_type::member_type team_member;
+
+ policy_type policy( 25, Kokkos::AUTO );
+
+ Kokkos::parallel_for( policy, KOKKOS_LAMBDA( const team_member & dev ) {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ b( i, 0 ) = a( i, 1 ) + a( i, 2 );
+ b( i, 1 ) = a( i, 0 ) - a( i, 3 );
+ b( i, 2 ) = a( i, 4 ) + a( i, 0 );
+ b( i, 3 ) = a( i, 2 ) - a( i, 1 );
+ b( i, 4 ) = a( i, 3 ) + a( i, 4 );
}
});
}
- Kokkos::deep_copy(h_b,b);
+ Kokkos::deep_copy( h_b, b );
double result = 0;
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- result += h_b(i,j);
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ result += h_b( i, j );
}
+ }
return result;
}
-
#else
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double AddTestLambda() {
- return AddTestFunctor<DeviceType,PWRTest>();
+ return AddTestFunctor< DeviceType, PWRTest >();
}
#endif
-
-template<class DeviceType>
-struct FunctorReduceTest{
- typedef Kokkos::View<double**,DeviceType> view_type;
- view_type a_;
+template< class DeviceType >
+struct FunctorReduceTest {
+ typedef Kokkos::View< double**, DeviceType > view_type;
typedef DeviceType execution_space;
typedef double value_type;
- FunctorReduceTest(view_type & a):a_(a) {}
+ typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member;
+
+ view_type a_;
+
+ FunctorReduceTest( view_type & a ) : a_( a ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, value_type& sum) const {
- sum += a_(i,1) + a_(i,2);
- sum += a_(i,0) - a_(i,3);
- sum += a_(i,4) + a_(i,0);
- sum += a_(i,2) - a_(i,1);
- sum += a_(i,3) + a_(i,4);
+ void operator() ( const int & i, value_type & sum ) const {
+ sum += a_( i, 1 ) + a_( i, 2 );
+ sum += a_( i, 0 ) - a_( i, 3 );
+ sum += a_( i, 4 ) + a_( i, 0 );
+ sum += a_( i, 2 ) - a_( i, 1 );
+ sum += a_( i, 3 ) + a_( i, 4 );
}
- typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member ;
-
KOKKOS_INLINE_FUNCTION
- void operator() (const team_member & dev, value_type& sum) const {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- sum += a_(i,1) + a_(i,2);
- sum += a_(i,0) - a_(i,3);
- sum += a_(i,4) + a_(i,0);
- sum += a_(i,2) - a_(i,1);
- sum += a_(i,3) + a_(i,4);
+ void operator() ( const team_member & dev, value_type & sum ) const {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ sum += a_( i, 1 ) + a_( i, 2 );
+ sum += a_( i, 0 ) - a_( i, 3 );
+ sum += a_( i, 4 ) + a_( i, 0 );
+ sum += a_( i, 2 ) - a_( i, 1 );
+ sum += a_( i, 3 ) + a_( i, 4 );
}
}
+
KOKKOS_INLINE_FUNCTION
- void init(value_type& update) const {update = 0.0;}
+ void init( value_type & update ) const { update = 0.0; }
+
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& update, volatile value_type const& input) const {update += input;}
+ void join( volatile value_type & update, volatile value_type const & input ) const { update += input; }
};
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double ReduceTestFunctor() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef Kokkos::View< double**, DeviceType > view_type;
+ typedef Kokkos::View< double, typename view_type::host_mirror_space, Kokkos::MemoryUnmanaged > unmanaged_result;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef Kokkos::View<double**,DeviceType> view_type ;
- typedef Kokkos::View<double,typename view_type::host_mirror_space,Kokkos::MemoryUnmanaged> unmanaged_result ;
-
- view_type a("A",100,5);
- typename view_type::HostMirror h_a = Kokkos::create_mirror_view(a);
+ view_type a( "A", 100, 5 );
+ typename view_type::HostMirror h_a = Kokkos::create_mirror_view( a );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
double result = 0.0;
- if(PWRTest==false)
- Kokkos::parallel_reduce(100,FunctorReduceTest<DeviceType>(a), unmanaged_result( & result ));
- else
- Kokkos::parallel_reduce(policy_type(25,Kokkos::AUTO),FunctorReduceTest<DeviceType>(a), unmanaged_result( & result ));
+ if ( PWRTest == false ) {
+ Kokkos::parallel_reduce( 100, FunctorReduceTest< DeviceType >( a ), unmanaged_result( & result ) );
+ }
+ else {
+ Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ), FunctorReduceTest< DeviceType >( a ), unmanaged_result( & result ) );
+ }
return result;
}
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-template<class DeviceType, bool PWRTest>
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+template< class DeviceType, bool PWRTest >
double ReduceTestLambda() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef Kokkos::View< double**, DeviceType > view_type;
+ typedef Kokkos::View< double, typename view_type::host_mirror_space, Kokkos::MemoryUnmanaged > unmanaged_result;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef Kokkos::View<double**,DeviceType> view_type ;
- typedef Kokkos::View<double,typename view_type::host_mirror_space,Kokkos::MemoryUnmanaged> unmanaged_result ;
-
- view_type a("A",100,5);
- typename view_type::HostMirror h_a = Kokkos::create_mirror_view(a);
+ view_type a( "A", 100, 5 );
+ typename view_type::HostMirror h_a = Kokkos::create_mirror_view( a );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
double result = 0.0;
- if(PWRTest==false) {
- Kokkos::parallel_reduce(100,KOKKOS_LAMBDA(const int& i, double& sum) {
- sum += a(i,1) + a(i,2);
- sum += a(i,0) - a(i,3);
- sum += a(i,4) + a(i,0);
- sum += a(i,2) - a(i,1);
- sum += a(i,3) + a(i,4);
+ if ( PWRTest == false ) {
+ Kokkos::parallel_reduce( 100, KOKKOS_LAMBDA( const int & i, double & sum ) {
+ sum += a( i, 1 ) + a( i, 2 );
+ sum += a( i, 0 ) - a( i, 3 );
+ sum += a( i, 4 ) + a( i, 0 );
+ sum += a( i, 2 ) - a( i, 1 );
+ sum += a( i, 3 ) + a( i, 4 );
}, unmanaged_result( & result ) );
- } else {
- typedef typename policy_type::member_type team_member ;
- Kokkos::parallel_reduce(policy_type(25,Kokkos::AUTO),KOKKOS_LAMBDA(const team_member & dev, double& sum) {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- sum += a(i,1) + a(i,2);
- sum += a(i,0) - a(i,3);
- sum += a(i,4) + a(i,0);
- sum += a(i,2) - a(i,1);
- sum += a(i,3) + a(i,4);
+ }
+ else {
+ typedef typename policy_type::member_type team_member;
+ Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ), KOKKOS_LAMBDA( const team_member & dev, double & sum ) {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ sum += a( i, 1 ) + a( i, 2 );
+ sum += a( i, 0 ) - a( i, 3 );
+ sum += a( i, 4 ) + a( i, 0 );
+ sum += a( i, 2 ) - a( i, 1 );
+ sum += a( i, 3 ) + a( i, 4 );
}
}, unmanaged_result( & result ) );
}
return result;
}
-
#else
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double ReduceTestLambda() {
- return ReduceTestFunctor<DeviceType,PWRTest>();
+ return ReduceTestFunctor< DeviceType, PWRTest >();
}
#endif
-template<class DeviceType>
-double TestVariantLambda(int test) {
- switch (test) {
- case 1: return AddTestLambda<DeviceType,false>();
- case 2: return AddTestLambda<DeviceType,true>();
- case 3: return ReduceTestLambda<DeviceType,false>();
- case 4: return ReduceTestLambda<DeviceType,true>();
+template< class DeviceType >
+double TestVariantLambda( int test ) {
+ switch ( test ) {
+ case 1: return AddTestLambda< DeviceType, false >();
+ case 2: return AddTestLambda< DeviceType, true >();
+ case 3: return ReduceTestLambda< DeviceType, false >();
+ case 4: return ReduceTestLambda< DeviceType, true >();
}
+
return 0;
}
-
-template<class DeviceType>
-double TestVariantFunctor(int test) {
- switch (test) {
- case 1: return AddTestFunctor<DeviceType,false>();
- case 2: return AddTestFunctor<DeviceType,true>();
- case 3: return ReduceTestFunctor<DeviceType,false>();
- case 4: return ReduceTestFunctor<DeviceType,true>();
+template< class DeviceType >
+double TestVariantFunctor( int test ) {
+ switch ( test ) {
+ case 1: return AddTestFunctor< DeviceType, false >();
+ case 2: return AddTestFunctor< DeviceType, true >();
+ case 3: return ReduceTestFunctor< DeviceType, false >();
+ case 4: return ReduceTestFunctor< DeviceType, true >();
}
+
return 0;
}
-template<class DeviceType>
-bool Test(int test) {
-
+template< class DeviceType >
+bool Test( int test ) {
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- double res_functor = TestVariantFunctor<DeviceType>(test);
- double res_lambda = TestVariantLambda<DeviceType>(test);
+ double res_functor = TestVariantFunctor< DeviceType >( test );
+ double res_lambda = TestVariantLambda< DeviceType >( test );
- char testnames[5][256] = {" "
- ,"AddTest","AddTest TeamPolicy"
- ,"ReduceTest","ReduceTest TeamPolicy"
+ char testnames[5][256] = { " "
+ , "AddTest", "AddTest TeamPolicy"
+ , "ReduceTest", "ReduceTest TeamPolicy"
};
bool passed = true;
if ( res_functor != res_lambda ) {
passed = false;
std::cout << "CXX11 ( test = '"
<< testnames[test] << "' FAILED : "
<< res_functor << " != " << res_lambda
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
#else
return true;
#endif
}
-}
+} // namespace TestCXX11
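
TestCXX11 checks that a functor-based dispatch and a KOKKOS_LAMBDA-based dispatch of the same kernel agree, both over a flat index range and over a TeamPolicy whose teams each cover a block of four rows. A hedged, self-contained sketch of the two reduction shapes being compared (the constants 100 / 25 / 4 mirror the tests; a host backend with lambda dispatch is assumed):

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        typedef Kokkos::TeamPolicy< Space > policy_type;
        typedef policy_type::member_type team_member;

        double flat_sum = 0.0;
        double team_sum = 0.0;

        // Flat range: one index per invocation.
        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space >( 0, 100 ),
          KOKKOS_LAMBDA( const int i, double & sum ) { sum += 1.0 * i; }, flat_sum );

        // Team policy: 25 teams, each team strides through its own block of 4 indices,
        // like the league_rank()/team_rank() loop in the functor above.
        Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ),
          KOKKOS_LAMBDA( const team_member & dev, double & sum ) {
            const int begin = dev.league_rank() * 4;
            const int end   = begin + 4;
            for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
              sum += 1.0 * i;
            }
          }, team_sum );

        printf( "flat = %g  team = %g\n", flat_sum, team_sum );  // both should print 4950
      }
      Kokkos::finalize();
    }
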
diff --git a/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp b/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
index 359e17a44..b53b42b8e 100644
--- a/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
+++ b/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
@@ -1,94 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+
#include <Kokkos_Core.hpp>
#ifndef TESTCXX11DEDUCTION_HPP
#define TESTCXX11DEDUCTION_HPP
namespace TestCXX11 {
struct TestReductionDeductionTagA {};
struct TestReductionDeductionTagB {};
template < class ExecSpace >
struct TestReductionDeductionFunctor {
-
// KOKKOS_INLINE_FUNCTION
- // void operator()( long i , long & value ) const
- // { value += i + 1 ; }
+ // void operator()( long i, long & value ) const
+ // { value += i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( TestReductionDeductionTagA , long i , long & value ) const
+ void operator()( TestReductionDeductionTagA, long i, long & value ) const
{ value += ( 2 * i + 1 ) + ( 2 * i + 2 ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const TestReductionDeductionTagB & , const long i , long & value ) const
- { value += ( 3 * i + 1 ) + ( 3 * i + 2 ) + ( 3 * i + 3 ) ; }
-
+ void operator()( const TestReductionDeductionTagB &, const long i, long & value ) const
+ { value += ( 3 * i + 1 ) + ( 3 * i + 2 ) + ( 3 * i + 3 ); }
};
template< class ExecSpace >
void test_reduction_deduction()
{
- typedef TestReductionDeductionFunctor< ExecSpace > Functor ;
+ typedef TestReductionDeductionFunctor< ExecSpace > Functor;
- const long N = 50 ;
- // const long answer = N % 2 ? ( N * ((N+1)/2 )) : ( (N/2) * (N+1) );
- const long answerA = N % 2 ? ( (2*N) * (((2*N)+1)/2 )) : ( ((2*N)/2) * ((2*N)+1) );
- const long answerB = N % 2 ? ( (3*N) * (((3*N)+1)/2 )) : ( ((3*N)/2) * ((3*N)+1) );
- long result = 0 ;
+ const long N = 50;
+ // const long answer = N % 2 ? ( N * ( ( N + 1 ) / 2 ) ) : ( ( N / 2 ) * ( N + 1 ) );
+ const long answerA = N % 2 ? ( ( 2 * N ) * ( ( ( 2 * N ) + 1 ) / 2 ) ) : ( ( ( 2 * N ) / 2 ) * ( ( 2 * N ) + 1 ) );
+ const long answerB = N % 2 ? ( ( 3 * N ) * ( ( ( 3 * N ) + 1 ) / 2 ) ) : ( ( ( 3 * N ) / 2 ) * ( ( 3 * N ) + 1 ) );
+ long result = 0;
- // Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace>(0,N) , Functor() , result );
- // ASSERT_EQ( answer , result );
-
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,TestReductionDeductionTagA>(0,N) , Functor() , result );
- ASSERT_EQ( answerA , result );
-
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,TestReductionDeductionTagB>(0,N) , Functor() , result );
- ASSERT_EQ( answerB , result );
-}
+ // Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), Functor(), result );
+ // ASSERT_EQ( answer, result );
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TestReductionDeductionTagA >( 0, N ), Functor(), result );
+ ASSERT_EQ( answerA, result );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TestReductionDeductionTagB >( 0, N ), Functor(), result );
+ ASSERT_EQ( answerB, result );
}
-#endif
+} // namespace TestCXX11
+#endif
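
The deduction test above relies on execution-tag dispatch: the work tag in RangePolicy< ExecSpace, Tag > selects which operator() overload of the functor runs, so one functor can carry several kernels and Kokkos still deduces the reduction value type from the chosen signature. A minimal sketch of the same mechanism, with illustrative tag and functor names:

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    struct TagTwo {};    // work tags are just empty structs
    struct TagThree {};

    struct TaggedSum {
      // Chosen when the policy's work tag is TagTwo.
      KOKKOS_INLINE_FUNCTION
      void operator()( TagTwo, const long i, long & value ) const { value += 2 * i; }

      // Chosen when the policy's work tag is TagThree.
      KOKKOS_INLINE_FUNCTION
      void operator()( TagThree, const long i, long & value ) const { value += 3 * i; }
    };

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        long twice = 0, thrice = 0;

        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, TagTwo   >( 0, 10 ), TaggedSum(), twice );
        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, TagThree >( 0, 10 ), TaggedSum(), thrice );

        printf( "%ld %ld\n", twice, thrice );  // 90 and 135 (2*45 and 3*45)
      }
      Kokkos::finalize();
    }
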
diff --git a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
index 5add656a4..455543834 100644
--- a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
+++ b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
@@ -1,95 +1,97 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#define KOKKOS_PRAGMA_UNROLL(a)
namespace TestCompilerMacros {
-template<class DEVICE_TYPE>
+template< class DEVICE_TYPE >
struct AddFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<int**,execution_space> type;
- type a,b;
+ typedef typename Kokkos::View< int**, execution_space > type;
+ type a, b;
int length;
- AddFunctor(type a_, type b_):a(a_),b(b_),length(a.dimension_1()) {}
+ AddFunctor( type a_, type b_ ) : a( a_ ), b( b_ ), length( a.dimension_1() ) {}
KOKKOS_INLINE_FUNCTION
- void operator()(int i) const {
+ void operator()( int i ) const {
#ifdef KOKKOS_ENABLE_PRAGMA_UNROLL
#pragma unroll
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_VECTOR
#pragma vector always
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
#pragma loop count(128)
#endif
#ifndef KOKKOS_DEBUG
#ifdef KOKKOS_ENABLE_PRAGMA_SIMD
#pragma simd
#endif
#endif
- for(int j=0;j<length;j++)
- a(i,j) += b(i,j);
+ for ( int j = 0; j < length; j++ ) {
+ a( i, j ) += b( i, j );
+ }
}
};
-template<class DeviceType>
+template< class DeviceType >
bool Test() {
- typedef typename Kokkos::View<int**,DeviceType> type;
- type a("A",1024,128);
- type b("B",1024,128);
+ typedef typename Kokkos::View< int**, DeviceType > type;
+ type a( "A", 1024, 128 );
+ type b( "B", 1024, 128 );
- AddFunctor<DeviceType> f(a,b);
- Kokkos::parallel_for(1024,f);
+ AddFunctor< DeviceType > f( a, b );
+ Kokkos::parallel_for( 1024, f );
DeviceType::fence();
+
return true;
}
-}
+} // namespace TestCompilerMacros
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
index 7e08f67e6..f85a35c09 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
@@ -1,101 +1,99 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestAtomic.hpp>
-
#include <TestViewAPI.hpp>
-
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
TEST_F( defaultdevicetype, host_space_access )
{
- typedef Kokkos::HostSpace::execution_space host_exec_space ;
- typedef Kokkos::Device< host_exec_space , Kokkos::HostSpace > device_space ;
- typedef Kokkos::Impl::HostMirror< Kokkos::DefaultExecutionSpace >::Space mirror_space ;
+ typedef Kokkos::HostSpace::execution_space host_exec_space;
+ typedef Kokkos::Device< host_exec_space, Kokkos::HostSpace > device_space;
+ typedef Kokkos::Impl::HostMirror< Kokkos::DefaultExecutionSpace >::Space mirror_space;
static_assert(
- Kokkos::Impl::SpaceAccessibility< host_exec_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< host_exec_space, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< device_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< device_space, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< mirror_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< mirror_space, Kokkos::HostSpace >::accessible, "" );
}
-TEST_F( defaultdevicetype, view_api) {
- TestViewAPI< double , Kokkos::DefaultExecutionSpace >();
+TEST_F( defaultdevicetype, view_api )
+{
+ TestViewAPI< double, Kokkos::DefaultExecutionSpace >();
}
-} // namespace test
+} // namespace Test
#endif
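
The host_space_access test above is purely a compile-time check: Kokkos::Impl::SpaceAccessibility answers whether an execution or device space may dereference memory in a given memory space. A stripped-down sketch of the same probe outside the gtest fixture:

    #include <Kokkos_Core.hpp>

    typedef Kokkos::HostSpace::execution_space host_exec_space;

    // Fails during compilation (rather than at run time) if the host execution
    // space could not access HostSpace allocations.
    static_assert(
      Kokkos::Impl::SpaceAccessibility< host_exec_space, Kokkos::HostSpace >::accessible,
      "host execution space must be able to access HostSpace" );

    int main() { return 0; }   // nothing to run; the check happens at compile time
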
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
index 7778efde3..401da58a5 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
@@ -1,419 +1,468 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
+
#ifdef KOKKOS_ENABLE_OPENMP
#include <omp.h>
#endif
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
namespace Test {
namespace Impl {
- char** init_kokkos_args(bool do_threads,bool do_numa,bool do_device,bool do_other, int& nargs, Kokkos::InitArguments& init_args) {
- nargs = (do_threads?1:0) +
- (do_numa?1:0) +
- (do_device?1:0) +
- (do_other?4:0);
- char** args_kokkos = new char*[nargs];
- for(int i = 0; i < nargs; i++)
- args_kokkos[i] = new char[20];
+char** init_kokkos_args( bool do_threads, bool do_numa, bool do_device, bool do_other, int & nargs, Kokkos::InitArguments & init_args ) {
+ nargs = ( do_threads ? 1 : 0 ) +
+ ( do_numa ? 1 : 0 ) +
+ ( do_device ? 1 : 0 ) +
+ ( do_other ? 4 : 0 );
- int threads_idx = do_other?1:0;
- int numa_idx = (do_other?3:0) + (do_threads?1:0);
- int device_idx = (do_other?3:0) + (do_threads?1:0) + (do_numa?1:0);
+ char** args_kokkos = new char*[nargs];
+ for ( int i = 0; i < nargs; i++ ) {
+ args_kokkos[i] = new char[20];
+ }
+ int threads_idx = do_other ? 1 : 0;
+ int numa_idx = ( do_other ? 3 : 0 ) + ( do_threads ? 1 : 0 );
+ int device_idx = ( do_other ? 3 : 0 ) + ( do_threads ? 1 : 0 ) + ( do_numa ? 1 : 0 );
- if(do_threads) {
- int nthreads = 3;
+ if ( do_threads ) {
+ int nthreads = 3;
#ifdef KOKKOS_ENABLE_OPENMP
- if(omp_get_max_threads() < 3)
- nthreads = omp_get_max_threads();
+ if ( omp_get_max_threads() < 3 )
+ nthreads = omp_get_max_threads();
#endif
- if(Kokkos::hwloc::available()) {
- if(Kokkos::hwloc::get_available_threads_per_core()<3)
- nthreads = Kokkos::hwloc::get_available_threads_per_core()
- * Kokkos::hwloc::get_available_numa_count();
- }
-
-#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- nthreads = 1;
- }
-#endif
- init_args.num_threads = nthreads;
- sprintf(args_kokkos[threads_idx],"--threads=%i",nthreads);
+ if ( Kokkos::hwloc::available() ) {
+ if ( Kokkos::hwloc::get_available_threads_per_core() < 3 )
+ nthreads = Kokkos::hwloc::get_available_threads_per_core()
+ * Kokkos::hwloc::get_available_numa_count();
}
- if(do_numa) {
- int numa = 1;
- if(Kokkos::hwloc::available())
- numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- numa = 1;
- }
-#endif
-
- init_args.num_numa = numa;
- sprintf(args_kokkos[numa_idx],"--numa=%i",numa);
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ nthreads = 1;
}
+#endif
- if(do_device) {
+ init_args.num_threads = nthreads;
+ sprintf( args_kokkos[threads_idx], "--threads=%i", nthreads );
+ }
- init_args.device_id = 0;
- sprintf(args_kokkos[device_idx],"--device=%i",0);
+ if ( do_numa ) {
+ int numa = 1;
+ if ( Kokkos::hwloc::available() ) {
+ numa = Kokkos::hwloc::get_available_numa_count();
}
- if(do_other) {
- sprintf(args_kokkos[0],"--dummyarg=1");
- sprintf(args_kokkos[threads_idx+(do_threads?1:0)],"--dummy2arg");
- sprintf(args_kokkos[threads_idx+(do_threads?1:0)+1],"dummy3arg");
- sprintf(args_kokkos[device_idx+(do_device?1:0)],"dummy4arg=1");
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ numa = 1;
}
+#endif
+ init_args.num_numa = numa;
+ sprintf( args_kokkos[numa_idx], "--numa=%i", numa );
+ }
- return args_kokkos;
+ if ( do_device ) {
+ init_args.device_id = 0;
+ sprintf( args_kokkos[device_idx], "--device=%i", 0 );
}
- Kokkos::InitArguments init_initstruct(bool do_threads, bool do_numa, bool do_device) {
- Kokkos::InitArguments args;
+ if ( do_other ) {
+ sprintf( args_kokkos[0], "--dummyarg=1" );
+ sprintf( args_kokkos[ threads_idx + ( do_threads ? 1 : 0 ) ], "--dummy2arg" );
+ sprintf( args_kokkos[ threads_idx + ( do_threads ? 1 : 0 ) + 1 ], "dummy3arg" );
+ sprintf( args_kokkos[ device_idx + ( do_device ? 1 : 0 ) ], "dummy4arg=1" );
+ }
+
+ return args_kokkos;
+}
+
+Kokkos::InitArguments init_initstruct( bool do_threads, bool do_numa, bool do_device ) {
+ Kokkos::InitArguments args;
- if(do_threads) {
- int nthreads = 3;
+ if ( do_threads ) {
+ int nthreads = 3;
#ifdef KOKKOS_ENABLE_OPENMP
- if(omp_get_max_threads() < 3)
- nthreads = omp_get_max_threads();
+ if ( omp_get_max_threads() < 3 ) {
+ nthreads = omp_get_max_threads();
+ }
#endif
- if(Kokkos::hwloc::available()) {
- if(Kokkos::hwloc::get_available_threads_per_core()<3)
- nthreads = Kokkos::hwloc::get_available_threads_per_core()
- * Kokkos::hwloc::get_available_numa_count();
+ if ( Kokkos::hwloc::available() ) {
+ if ( Kokkos::hwloc::get_available_threads_per_core() < 3 ) {
+ nthreads = Kokkos::hwloc::get_available_threads_per_core()
+ * Kokkos::hwloc::get_available_numa_count();
}
+ }
+
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- nthreads = 1;
- }
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ nthreads = 1;
+ }
#endif
- args.num_threads = nthreads;
+ args.num_threads = nthreads;
+ }
+
+ if ( do_numa ) {
+ int numa = 1;
+ if ( Kokkos::hwloc::available() ) {
+ numa = Kokkos::hwloc::get_available_numa_count();
}
- if(do_numa) {
- int numa = 1;
- if(Kokkos::hwloc::available())
- numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- numa = 1;
- }
-#endif
- args.num_numa = numa;
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ numa = 1;
}
+#endif
- if(do_device) {
- args.device_id = 0;
- }
+ args.num_numa = numa;
+ }
- return args;
+ if ( do_device ) {
+ args.device_id = 0;
}
- void check_correct_initialization(const Kokkos::InitArguments& argstruct) {
- ASSERT_EQ( Kokkos::DefaultExecutionSpace::is_initialized(), 1);
- ASSERT_EQ( Kokkos::HostSpace::execution_space::is_initialized(), 1);
-
- //Figure out the number of threads the HostSpace ExecutionSpace should have initialized to
- int expected_nthreads = argstruct.num_threads;
- if(expected_nthreads<1) {
- if(Kokkos::hwloc::available()) {
- expected_nthreads = Kokkos::hwloc::get_available_numa_count()
- * Kokkos::hwloc::get_available_cores_per_numa()
- * Kokkos::hwloc::get_available_threads_per_core();
- } else {
- #ifdef KOKKOS_ENABLE_OPENMP
- if(std::is_same<Kokkos::HostSpace::execution_space,Kokkos::OpenMP>::value) {
- expected_nthreads = omp_get_max_threads();
- } else
- #endif
- expected_nthreads = 1;
+ return args;
+}
+
+void check_correct_initialization( const Kokkos::InitArguments & argstruct ) {
+ ASSERT_EQ( Kokkos::DefaultExecutionSpace::is_initialized(), 1 );
+ ASSERT_EQ( Kokkos::HostSpace::execution_space::is_initialized(), 1 );
+
+ // Figure out the number of threads the HostSpace ExecutionSpace should have initialized to.
+ int expected_nthreads = argstruct.num_threads;
+ if ( expected_nthreads < 1 ) {
+ if ( Kokkos::hwloc::available() ) {
+ expected_nthreads = Kokkos::hwloc::get_available_numa_count()
+ * Kokkos::hwloc::get_available_cores_per_numa()
+ * Kokkos::hwloc::get_available_threads_per_core();
+ }
+ else {
+#ifdef KOKKOS_ENABLE_OPENMP
+ if ( std::is_same< Kokkos::HostSpace::execution_space, Kokkos::OpenMP >::value ) {
+ expected_nthreads = omp_get_max_threads();
}
- #ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
+ else
+#endif
expected_nthreads = 1;
- #endif
}
- int expected_numa = argstruct.num_numa;
- if(expected_numa<1) {
- if(Kokkos::hwloc::available()) {
- expected_numa = Kokkos::hwloc::get_available_numa_count();
- } else {
- expected_numa = 1;
- }
- #ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
- expected_numa = 1;
- #endif
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ||
+ std::is_same< Kokkos::DefaultHostExecutionSpace, Kokkos::Serial >::value ) {
+ expected_nthreads = 1;
}
- ASSERT_EQ(Kokkos::HostSpace::execution_space::thread_pool_size(),expected_nthreads);
+#endif
+ }
-#ifdef KOKKOS_ENABLE_CUDA
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Cuda>::value) {
- int device;
- cudaGetDevice( &device );
- int expected_device = argstruct.device_id;
- if(argstruct.device_id<0) {
- expected_device = 0;
- }
- ASSERT_EQ(expected_device,device);
+ int expected_numa = argstruct.num_numa;
+
+ if ( expected_numa < 1 ) {
+ if ( Kokkos::hwloc::available() ) {
+ expected_numa = Kokkos::hwloc::get_available_numa_count();
+ }
+ else {
+ expected_numa = 1;
}
+
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ||
+ std::is_same< Kokkos::DefaultHostExecutionSpace, Kokkos::Serial >::value )
+ expected_numa = 1;
#endif
}
- //ToDo: Add check whether correct number of threads are actually started
- void test_no_arguments() {
- Kokkos::initialize();
- check_correct_initialization(Kokkos::InitArguments());
- Kokkos::finalize();
- }
+ ASSERT_EQ( Kokkos::HostSpace::execution_space::thread_pool_size(), expected_nthreads );
- void test_commandline_args(int nargs, char** args, const Kokkos::InitArguments& argstruct) {
- Kokkos::initialize(nargs,args);
- check_correct_initialization(argstruct);
- Kokkos::finalize();
- }
- void test_initstruct_args(const Kokkos::InitArguments& args) {
- Kokkos::initialize(args);
- check_correct_initialization(args);
- Kokkos::finalize();
+#ifdef KOKKOS_ENABLE_CUDA
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Cuda >::value ) {
+ int device;
+ cudaGetDevice( &device );
+
+ int expected_device = argstruct.device_id;
+ if ( argstruct.device_id < 0 ) {
+ expected_device = 0;
+ }
+
+ ASSERT_EQ( expected_device, device );
}
+#endif
+}
+
+// TODO: Add check whether correct number of threads are actually started.
+void test_no_arguments() {
+ Kokkos::initialize();
+ check_correct_initialization( Kokkos::InitArguments() );
+ Kokkos::finalize();
}
+void test_commandline_args( int nargs, char** args, const Kokkos::InitArguments & argstruct ) {
+ Kokkos::initialize( nargs, args );
+ check_correct_initialization( argstruct );
+ Kokkos::finalize();
+}
+
+void test_initstruct_args( const Kokkos::InitArguments & args ) {
+ Kokkos::initialize( args );
+ check_correct_initialization( args );
+ Kokkos::finalize();
+}
+
+} // namespace Impl
+
class defaultdevicetypeinit : public ::testing::Test {
protected:
- static void SetUpTestCase()
- {
- }
+ static void SetUpTestCase() {}
- static void TearDownTestCase()
- {
- }
+ static void TearDownTestCase() {}
};
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_01
-TEST_F( defaultdevicetypeinit, no_args) {
+TEST_F( defaultdevicetypeinit, no_args )
+{
Impl::test_no_arguments();
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_02
-TEST_F( defaultdevicetypeinit, commandline_args_empty) {
+TEST_F( defaultdevicetypeinit, commandline_args_empty )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_03
-TEST_F( defaultdevicetypeinit, commandline_args_other) {
+TEST_F( defaultdevicetypeinit, commandline_args_other )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,false,true,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, false, true, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_04
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,false,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, false, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_05
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_06
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_07
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,false,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, false, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_08
-TEST_F( defaultdevicetypeinit, commandline_args_numa_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_numa_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,true,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, true, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_09
-TEST_F( defaultdevicetypeinit, commandline_args_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_10
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device_other) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device_other )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,true,true,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, true, true, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_11
-TEST_F( defaultdevicetypeinit, initstruct_default) {
+TEST_F( defaultdevicetypeinit, initstruct_default )
+{
Kokkos::InitArguments args;
- Impl::test_initstruct_args(args);
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_12
-TEST_F( defaultdevicetypeinit, initstruct_nthreads) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,false,false);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, false, false );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_13
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,true,false);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, true, false );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_14
-TEST_F( defaultdevicetypeinit, initstruct_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(false,false,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( false, false, true );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_15
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,false,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, false, true );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_16
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,true,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, true, true );
+ Impl::test_initstruct_args( args );
}
#endif
-
-} // namespace test
+} // namespace Test
#endif
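
The init tests above drive Kokkos::initialize two ways: through fake command-line flags (--threads, --numa, --device) and through a pre-filled Kokkos::InitArguments struct. A hedged sketch of the struct-based path, with illustrative values standing in for whatever the hwloc/OpenMP queries in the tests would choose:

    #include <Kokkos_Core.hpp>

    int main() {
      Kokkos::InitArguments args;
      args.num_threads = 2;   // equivalent to passing "--threads=2"
      args.num_numa    = 1;   // equivalent to passing "--numa=1"
      args.device_id   = 0;   // equivalent to passing "--device=0"; only meaningful for a CUDA build

      Kokkos::initialize( args );
      // ... launch kernels here ...
      Kokkos::finalize();

      return 0;
    }
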
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
index dd148a062..4fdfa9591 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_a) {
+TEST_F( defaultdevicetype, reduce_instantiation_a )
+{
TestReduceCombinatoricalInstantiation<>::execute_a();
}
-} // namespace test
+} // namespace Test
#endif
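Each of these unit-test translation units repeats the same fixture idiom: SetUpTestCase() and TearDownTestCase() run once per test suite, so every TEST_F body executes between Kokkos::initialize() and Kokkos::finalize(). A minimal sketch of that pattern (illustrative only; the fixture and test names are placeholders):

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>

namespace Test {

class kokkos_environment : public ::testing::Test {
protected:
  // Runs once before the first TEST_F of this fixture.
  static void SetUpTestCase()    { Kokkos::initialize(); }
  // Runs once after the last TEST_F of this fixture.
  static void TearDownTestCase() { Kokkos::finalize(); }
};

TEST_F( kokkos_environment, example )
{
  // Any Kokkos call here is legal, since the runtime is already initialized;
  // e.g. fencing the default execution space, as the tests below also do.
  Kokkos::DefaultExecutionSpace::fence();
}

} // namespace Test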
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
index c8edfdd5c..841f34e03 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_b) {
+TEST_F( defaultdevicetype, reduce_instantiation_b )
+{
TestReduceCombinatoricalInstantiation<>::execute_b();
}
-} // namespace test
+} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
index 405d49a9b..602863be3 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_c) {
+TEST_F( defaultdevicetype, reduce_instantiation_c )
+{
TestReduceCombinatoricalInstantiation<>::execute_c();
}
-} // namespace test
+} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
index 426cc4f06..5d3665b90 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
@@ -1,237 +1,237 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestAtomic.hpp>
-
#include <TestViewAPI.hpp>
-
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-TEST_F( defaultdevicetype, test_utilities) {
+TEST_F( defaultdevicetype, test_utilities )
+{
test_utilities();
}
-TEST_F( defaultdevicetype, long_reduce) {
- TestReduce< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce )
+{
+ TestReduce< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, double_reduce) {
- TestReduce< double , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, double_reduce )
+{
+ TestReduce< double, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-
-TEST_F( defaultdevicetype , atomics )
+TEST_F( defaultdevicetype, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 3 ) ) );
}
-/*TEST_F( defaultdevicetype , view_remap )
+/*TEST_F( defaultdevicetype, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::DefaultExecutionSpace > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}*/
-
-//----------------------------------------------------------------------------
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::DefaultExecutionSpace > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::DefaultExecutionSpace > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::DefaultExecutionSpace > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+ for ( size_t i3 = 0; i3 < N3; ++i3 ) {
+ for ( size_t i2 = 0; i2 < N2; ++i2 ) {
+ for ( size_t i1 = 0; i1 < N1; ++i1 ) {
+ for ( size_t i0 = 0; i0 < N0; ++i0 ) {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+ }
+ }
+ }
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+ for ( size_t i3 = 0; i3 < N3; ++i3 ) {
+ for ( size_t i2 = 0; i2 < N2; ++i2 ) {
+ for ( size_t i1 = 0; i1 < N1; ++i1 ) {
+ for ( size_t i0 = 0; i0 < N0; ++i0 ) {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
+ }
+ }
+ }
+}*/
-TEST_F( defaultdevicetype , view_aggregate )
+TEST_F( defaultdevicetype, view_aggregate )
{
TestViewAggregate< Kokkos::DefaultExecutionSpace >();
}
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , scan )
+TEST_F( defaultdevicetype, scan )
{
- TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1 , 1000 );
+ TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1, 1000 );
TestScan< Kokkos::DefaultExecutionSpace >( 1000000 );
TestScan< Kokkos::DefaultExecutionSpace >( 10000000 );
Kokkos::DefaultExecutionSpace::fence();
}
-
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , compiler_macros )
+TEST_F( defaultdevicetype, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::DefaultExecutionSpace >() ) );
}
-
-//----------------------------------------------------------------------------
-TEST_F( defaultdevicetype , cxx11 )
+TEST_F( defaultdevicetype, cxx11 )
{
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(4) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 4 ) ) );
}
-TEST_F( defaultdevicetype , team_vector )
+#if !defined(KOKKOS_CUDA_CLANG_WORKAROUND) && !defined(KOKKOS_ARCH_PASCAL)
+TEST_F( defaultdevicetype, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 5 ) ) );
}
+#endif
-TEST_F( defaultdevicetype , malloc )
+TEST_F( defaultdevicetype, malloc )
{
- int* data = (int*) Kokkos::kokkos_malloc(100*sizeof(int));
- ASSERT_NO_THROW(data = (int*) Kokkos::kokkos_realloc(data,120*sizeof(int)));
- Kokkos::kokkos_free(data);
+ int* data = (int*) Kokkos::kokkos_malloc( 100 * sizeof( int ) );
+ ASSERT_NO_THROW( data = (int*) Kokkos::kokkos_realloc( data, 120 * sizeof( int ) ) );
+ Kokkos::kokkos_free( data );
- int* data2 = (int*) Kokkos::kokkos_malloc(0);
- ASSERT_TRUE(data2==NULL);
- Kokkos::kokkos_free(data2);
+ int* data2 = (int*) Kokkos::kokkos_malloc( 0 );
+ ASSERT_TRUE( data2 == NULL );
+ Kokkos::kokkos_free( data2 );
}
-} // namespace test
+} // namespace Test
#endif
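The "malloc" test just above exercises the raw allocation entry points in the default memory space. A short sketch of the same calls outside the test harness (illustrative only, not part of the patch):

#include <Kokkos_Core.hpp>

void allocation_example()
{
  // Allocate, grow, and release a buffer in the default memory space.
  int* data = static_cast< int* >( Kokkos::kokkos_malloc( 100 * sizeof( int ) ) );
  data      = static_cast< int* >( Kokkos::kokkos_realloc( data, 120 * sizeof( int ) ) );
  Kokkos::kokkos_free( data );

  // A zero-byte request yields NULL, which the test above asserts explicitly;
  // freeing that pointer is then harmless.
  int* empty = static_cast< int* >( Kokkos::kokkos_malloc( 0 ) );
  Kokkos::kokkos_free( empty );
}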
diff --git a/lib/kokkos/core/unit_test/TestHWLOC.cpp b/lib/kokkos/core/unit_test/TestHWLOC.cpp
index 1637dec5d..d03d9b816 100644
--- a/lib/kokkos/core/unit_test/TestHWLOC.cpp
+++ b/lib/kokkos/core/unit_test/TestHWLOC.cpp
@@ -1,69 +1,67 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <iostream>
+
#include <Kokkos_hwloc.hpp>
namespace Test {
class hwloc : public ::testing::Test {
protected:
- static void SetUpTestCase()
- {}
+ static void SetUpTestCase() {}
- static void TearDownTestCase()
- {}
+ static void TearDownTestCase() {}
};
-TEST_F( hwloc, query)
+TEST_F( hwloc, query )
{
std::cout << " NUMA[" << Kokkos::hwloc::get_available_numa_count() << "]"
<< " CORE[" << Kokkos::hwloc::get_available_cores_per_numa() << "]"
<< " PU[" << Kokkos::hwloc::get_available_threads_per_core() << "]"
- << std::endl ;
-}
-
+ << std::endl;
}
+} // namespace Test
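The hwloc query test prints the machine topology that Kokkos can bind to when hwloc support is compiled in. The same three calls can be used directly, for example to size thread pools; a minimal sketch (illustrative only):

#include <iostream>
#include <Kokkos_hwloc.hpp>

void print_topology()
{
  // NUMA domains, cores per NUMA domain, and hardware threads (PUs) per core.
  std::cout << "NUMA[" << Kokkos::hwloc::get_available_numa_count()       << "] "
            << "CORE[" << Kokkos::hwloc::get_available_cores_per_numa()   << "] "
            << "PU["   << Kokkos::hwloc::get_available_threads_per_core() << "]"
            << std::endl;
}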
diff --git a/lib/kokkos/core/unit_test/TestMDRange.hpp b/lib/kokkos/core/unit_test/TestMDRange.hpp
index 9894d1ce6..1dc349cc1 100644
--- a/lib/kokkos/core/unit_test/TestMDRange.hpp
+++ b/lib/kokkos/core/unit_test/TestMDRange.hpp
@@ -1,555 +1,1721 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template <typename ExecSpace >
struct TestMDRange_2D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType**, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
- using DataType = int ;
- using ViewType = typename Kokkos::View< DataType** , ExecSpace > ;
- using HostViewType = typename ViewType::HostMirror ;
+ ViewType input_view;
- ViewType input_view ;
+ TestMDRange_2D( const DataType N0, const DataType N1 ) : input_view( "input_view", N0, N1 ) {}
- TestMDRange_2D( const DataType N0, const DataType N1 ) : input_view("input_view", N0, N1) {}
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j ) const
+ {
+ input_view( i, j ) = 1;
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , const int j ) const
+ void operator()( const int i, const int j, double &lsum ) const
{
- input_view(i,j) = 1;
+ lsum += input_view( i, j ) * 2;
}
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j ) const
+ {
+ input_view( i, j ) = 3;
+ }
- static void test_for2( const int64_t N0, const int64_t N1 )
+ static void test_reduce2( const int N0, const int N1 )
{
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+ } // end test_reduce2
+
+ static void test_for2( const int N0, const int N1 )
+ {
using namespace Kokkos::Experimental;
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> >;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op(): Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op(): Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, InitTag > range_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op() + Default Tile: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "No info: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 4, 4 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "D D: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left , Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {3,3} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "L L: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left , Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {7,7} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 7, 7 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "L R: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {16,16} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 16, 16 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "R L: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {5,16} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 5, 16 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "R R: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
-
- } //end test_for2
-}; //MDRange_2D
+ } // end test_for2
+}; // MDRange_2D
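TestMDRange_2D above repeats one pattern many times: build an MDRangePolicy from begin/end points (and optionally a tile size), then dispatch a functor with md_parallel_for and md_parallel_reduce. A condensed, stand-alone sketch of that pattern follows, using a hypothetical FillAndSum functor rather than the test fixture itself (illustrative only, not part of the patch):

#include <Kokkos_Core.hpp>

template < class ExecSpace >
struct FillAndSum {
  Kokkos::View< int**, ExecSpace > v;

  // parallel_for form: set every entry to 1.
  KOKKOS_INLINE_FUNCTION
  void operator()( const int i, const int j ) const { v( i, j ) = 1; }

  // parallel_reduce form: accumulate the entries.
  KOKKOS_INLINE_FUNCTION
  void operator()( const int i, const int j, double & lsum ) const { lsum += v( i, j ); }
};

template < class ExecSpace >
double sum_of_ones( const int N0, const int N1 )
{
  using namespace Kokkos::Experimental;

  typedef MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
  typedef typename range_type::tile_type  tile_type;
  typedef typename range_type::point_type point_type;

  // Rank-2 index range [0,N0) x [0,N1), iterated in 3x3 tiles.
  range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );

  FillAndSum< ExecSpace > functor{ Kokkos::View< int**, ExecSpace >( "v", N0, N1 ) };

  md_parallel_for( range, functor );           // fill with ones

  double sum = 0.0;
  md_parallel_reduce( range, functor, sum );   // expect sum == N0 * N1

  return sum;
}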
template <typename ExecSpace >
struct TestMDRange_3D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType***, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
- using DataType = int ;
- using ViewType = typename Kokkos::View< DataType*** , ExecSpace > ;
- using HostViewType = typename ViewType::HostMirror ;
+ ViewType input_view;
- ViewType input_view ;
+ TestMDRange_3D( const DataType N0, const DataType N1, const DataType N2 ) : input_view( "input_view", N0, N1, N2 ) {}
- TestMDRange_3D( const DataType N0, const DataType N1, const DataType N2 ) : input_view("input_view", N0, N1, N2) {}
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k ) const
+ {
+ input_view( i, j, k ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, double &lsum ) const
+ {
+ lsum += input_view( i, j, k ) * 2;
+ }
+ // tagged operators
+ struct InitTag {};
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , const int j , const int k ) const
+ void operator()( const InitTag &, const int i, const int j, const int k ) const
{
- input_view(i,j,k) = 1;
+ input_view( i, j, k ) = 3;
}
- static void test_for3( const int64_t N0, const int64_t N1, const int64_t N2 )
+ static void test_reduce3( const int N0, const int N1, const int N2 )
{
using namespace Kokkos::Experimental;
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+ } // end test_reduce3
+
+ static void test_for3( const int N0, const int N1, const int N2 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 5, 7 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Flat, Iterate::Default>, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 8, 8, 8 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Flat, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ } // end test_for3
+};
+
+template <typename ExecSpace >
+struct TestMDRange_4D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType****, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_4D( const DataType N0, const DataType N1, const DataType N2, const DataType N3 ) : input_view( "input_view", N0, N1, N2, N3 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l ) const
+ {
+ input_view( i, j, k, l ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l ) const
+ {
+ input_view( i, j, k, l ) = 3;
+ }
+
+ static void test_for4( const int N0, const int N1, const int N2, const int N3 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4> > range_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } } );
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 3, 11, 3, 3 } } );
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf("Defaults +m_tile > m_upper dim2 InitTag op(): Errors in test_for4; mismatches = %d\n\n",counter);
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2}, {2,4,2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2}, {3,5,7} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2}, {8,8,8} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2}, {2,4,2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
- } //end test_for3
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ } // end test_for4
};
-} /* namespace */
-} /* namespace Test */
+template <typename ExecSpace >
+struct TestMDRange_5D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType*****, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_5D( const DataType N0, const DataType N1, const DataType N2, const DataType N3, const DataType N4 ) : input_view( "input_view", N0, N1, N2, N3, N4 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m ) const
+ {
+ input_view( i, j, k, l, m ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l, m ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l, const int m ) const
+ {
+ input_view( i, j, k, l, m ) = 3;
+ }
+
+ static void test_for5( const int N0, const int N1, const int N2, const int N3, const int N4 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } } );
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 3, 3, 3, 3, 7 } } );
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ }
+};
+
+template <typename ExecSpace >
+struct TestMDRange_6D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType******, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_6D( const DataType N0, const DataType N1, const DataType N2, const DataType N3, const DataType N4, const DataType N5 ) : input_view( "input_view", N0, N1, N2, N3, N4, N5 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, const int n ) const
+ {
+ input_view( i, j, k, l, m, n ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, const int n, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l, m, n ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l, const int m, const int n ) const
+ {
+ input_view( i, j, k, l, m, n ) = 3;
+ }
+
+ static void test_for6( const int N0, const int N1, const int N2, const int N3, const int N4, const int N5 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } } );
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 3, 3, 3, 3, 2, 3 } } ); // tile dims of 3,3,3,3,3,3 would be more than CUDA can handle with debugging enabled
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ }
+};
-/*--------------------------------------------------------------------------*/
+} // namespace
+} // namespace Test
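Every test block in the MDRange changes above follows the same launch pattern; as a reading aid, a minimal sketch of that pattern is shown here. It only restates calls already exercised by the tests (Kokkos 2.x experimental MDRange API); the functor, function, and view names and the extents and tile sizes are placeholders, not part of the patch.

    #include <Kokkos_Core.hpp>

    // A 3-D fill functor in the same style as the test functors above.
    template < typename ViewType >
    struct FillOnes {
      ViewType v;
      KOKKOS_INLINE_FUNCTION
      void operator()( const int i, const int j, const int k ) const { v( i, j, k ) = 1; }
    };

    template < typename ExecSpace >
    void example_md_launch( const int N0, const int N1, const int N2 )
    {
      using namespace Kokkos::Experimental;

      typedef Kokkos::View< int***, ExecSpace > view_type;
      // Rank, iteration orders, and index type are compile-time policy parameters.
      typedef MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>,
                             Kokkos::IndexType<int> > range_type;
      typedef typename range_type::tile_type  tile_type;
      typedef typename range_type::point_type point_type;

      view_type data( "data", N0, N1, N2 );

      // Begin corner, end corner, and per-dimension tile sizes.
      range_type range( point_type{ { 0, 0, 0 } },
                        point_type{ { N0, N1, N2 } },
                        tile_type{ { 3, 5, 7 } } );

      FillOnes< view_type > functor{ data };
      md_parallel_for( range, functor );
      // The tests then mirror the view to the host and check every entry equals 1.
    }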
diff --git a/lib/kokkos/core/unit_test/TestMemoryPool.hpp b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
index 868e64e9d..925f0e35e 100644
--- a/lib/kokkos/core/unit_test/TestMemoryPool.hpp
+++ b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
@@ -1,820 +1,820 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_UNITTEST_MEMPOOL_HPP
#define KOKKOS_UNITTEST_MEMPOOL_HPP
#include <stdio.h>
#include <iostream>
#include <cmath>
#include <algorithm>
#include <impl/Kokkos_Timer.hpp>
//#define TESTMEMORYPOOL_PRINT
//#define TESTMEMORYPOOL_PRINT_STATUS
#define STRIDE 1
#ifdef KOKKOS_ENABLE_CUDA
#define STRIDE_ALLOC 32
#else
#define STRIDE_ALLOC 1
#endif
namespace TestMemoryPool {
struct pointer_obj {
uint64_t * ptr;
KOKKOS_INLINE_FUNCTION
pointer_obj() : ptr( 0 ) {}
};
struct pointer_obj2 {
void * ptr;
size_t size;
KOKKOS_INLINE_FUNCTION
pointer_obj2() : ptr( 0 ), size( 0 ) {}
};
template < typename PointerView, typename Allocator >
struct allocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
allocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
// Launch the allocations in parallel; only every STRIDE_ALLOC-th index performs one.
Kokkos::parallel_for( num_ptrs * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE_ALLOC == 0 ) {
m_pointers[i / STRIDE_ALLOC].ptr =
static_cast< uint64_t * >( m_mempool.allocate( m_chunk_size ) );
}
}
};
template < typename PointerView >
struct count_invalid_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
count_invalid_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
// Count failed (NULL) allocations with a parallel reduction.
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += ( m_pointers[i / STRIDE].ptr == 0 );
}
}
};
template < typename PointerView >
struct fill_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
fill_memory( PointerView & ptrs, size_t num_ptrs ) : m_pointers( ptrs )
{
// Fill each allocated chunk with its own index in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
- *m_pointers[i / STRIDE].ptr = i / STRIDE ;
+ *m_pointers[i / STRIDE].ptr = i / STRIDE;
}
}
};
template < typename PointerView >
struct sum_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
sum_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
// Sum the values stored in the chunks with a parallel reduction.
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += *m_pointers[i / STRIDE].ptr;
}
}
};
template < typename PointerView, typename Allocator >
struct deallocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
deallocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
// Return every chunk to the memory pool in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
m_mempool.deallocate( m_pointers[i / STRIDE].ptr, m_chunk_size );
}
}
};
template < typename WorkView, typename PointerView, typename ScalarView,
typename Allocator >
struct allocate_deallocate_memory {
typedef typename WorkView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
WorkView m_work;
PointerView m_pointers;
ScalarView m_ptrs_front;
ScalarView m_ptrs_back;
Allocator m_mempool;
allocate_deallocate_memory( WorkView & w, size_t work_size, PointerView & p,
ScalarView pf, ScalarView pb, Allocator & m )
: m_work( w ), m_pointers( p ), m_ptrs_front( pf ), m_ptrs_back( pb ),
m_mempool( m )
{
// Process the work items (interleaved allocations and deallocations) in parallel.
Kokkos::parallel_for( work_size * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE_ALLOC == 0 ) {
unsigned my_work = m_work[i / STRIDE_ALLOC];
if ( ( my_work & 1 ) == 0 ) {
// Allocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_back(), 1 );
size_t alloc_size = my_work >> 1;
m_pointers[pos].ptr = m_mempool.allocate( alloc_size );
m_pointers[pos].size = alloc_size;
}
else {
// Deallocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_front(), 1 );
m_mempool.deallocate( m_pointers[pos].ptr, m_pointers[pos].size );
}
}
}
};
#define PRECISION 6
#define SHIFTW 24
#define SHIFTW2 12
template < typename F >
void print_results( const std::string & text, F elapsed_time )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< std::endl;
}
template < typename F, typename T >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, T result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
template < typename F >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, const std::string & result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
// This test stresses allocation and deallocation far harder than real-world
// usage in order to expose thread-safety problems: a loop in which all threads
// allocate is followed by a loop in which all threads deallocate.
// All of the allocation requests are for equal-sized chunks that are the base
// chunk size of the memory pool. It also tests initialization of the memory
// pool and breaking large chunks into smaller chunks to fulfill allocation
// requests. It verifies that MemoryPool(), allocate(), and deallocate() work
// correctly.
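// Illustrative numbers (not taken from the test driver): chunk_size = 64 and
// total_size = 65536 give num_chunks = 1024; fill_memory() writes each chunk's
// index into it, so sum_memory() must return 1024 * 1023 / 2 = 523776.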
template < class Device >
bool test_mempool( size_t chunk_size, size_t total_size )
{
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
uint64_t result = 0;
size_t num_chunks = total_size / chunk_size;
bool return_val = true;
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "*** test_mempool() ***" << std::endl
<< std::setw( SHIFTW ) << "chunk_size: " << std::setw( 12 )
<< chunk_size << std::endl
<< std::setw( SHIFTW ) << "total_size: " << std::setw( 12 )
<< total_size << std::endl
<< std::setw( SHIFTW ) << "num_chunks: " << std::setw( 12 )
<< num_chunks << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, 20 );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
return return_val;
}
template < typename T >
T smallest_power2_ge( T val )
{
// Find the most significant nonzero bit.
int first_nonzero_bit = Kokkos::Impl::bit_scan_reverse( val );
- // If val is an integral power of 2, ceil( log2(val) ) is equal to the
+ // If val is an integral power of 2, ceil( log2( val ) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1.
int lg2_size = first_nonzero_bit +
!Kokkos::Impl::is_integral_power_of_two( val );
- return T(1) << T(lg2_size);
+ return T( 1 ) << T( lg2_size );
}
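// Worked example: for val = 65 the most significant nonzero bit is bit 6 and
// 65 is not a power of two, so lg2_size = 7 and the result is 1 << 7 = 128;
// for an exact power of two such as 64 the value itself is returned.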
// This test makes allocation requests for multiple sizes and interleaves
// allocation and deallocation.
//
// There are 3 phases. The first phase does only allocations to build up a
// working state for the allocator. The second phase interleaves allocations
// and deletions. The third phase does only deallocations to undo all the
// allocations from the first phase. By building first to a working state,
// allocations and deallocations can happen in any order for the second phase.
// Each phase operates on multiple chunk sizes.
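// Illustrative walk-through (hypothetical arguments): with num_chunk_sizes = 4
// and base_chunk_size = 64, phase 1 issues equal numbers of 64-, 128-, 256-,
// and 512-byte allocations, phase 2 interleaves the same mix of allocations
// with an equal number of deallocations, and phase 3 deallocates whatever is
// still outstanding.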
template < class Device >
void test_mempool2( unsigned base_chunk_size, size_t num_chunk_sizes,
size_t phase1_size, size_t phase2_size )
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< unsigned *, device_type > work_view;
typedef Kokkos::View< size_t, device_type > scalar_view;
typedef Kokkos::View< pointer_obj2 *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
enum {
MIN_CHUNK_SIZE = 64,
MIN_BASE_CHUNK_SIZE = MIN_CHUNK_SIZE / 2 + 1
};
// Make sure the base chunk size is at least MIN_BASE_CHUNK_SIZE bytes, so
// all the different chunk sizes translate to different block sizes for the
// allocator.
if ( base_chunk_size < MIN_BASE_CHUNK_SIZE ) {
base_chunk_size = MIN_BASE_CHUNK_SIZE;
}
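// Illustrative effect: a requested base_chunk_size of 10 is raised to 33, so
// the doubled chunk sizes 33, 66, 132, ... round up to the distinct block
// sizes 64, 128, 256, ...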
// Get the smallest power of 2 >= the base chunk size. The size must be
// >= MIN_CHUNK_SIZE, though.
unsigned ceil_base_chunk_size = smallest_power2_ge( base_chunk_size );
if ( ceil_base_chunk_size < MIN_CHUNK_SIZE ) {
ceil_base_chunk_size = MIN_CHUNK_SIZE;
}
// Make sure the phase 1 size is a multiple of num_chunk_sizes.
phase1_size = ( ( phase1_size + num_chunk_sizes - 1 ) / num_chunk_sizes ) *
num_chunk_sizes;
- // Make sure the phase 2 size is multiples of (2 * num_chunk_sizes).
+ // Make sure the phase 2 size is a multiple of ( 2 * num_chunk_sizes ).
phase2_size =
( ( phase2_size + 2 * num_chunk_sizes - 1 ) / ( 2 * num_chunk_sizes ) ) *
2 * num_chunk_sizes;
// The phase2 size must be <= twice the phase1 size so that deallocations
// can't happen before allocations.
if ( phase2_size > 2 * phase1_size ) phase2_size = 2 * phase1_size;
size_t phase3_size = phase1_size;
size_t half_phase2_size = phase2_size / 2;
// Each entry in the work views has the following format. The least
// significant bit indicates allocation (0) vs. deallocation (1). For
// allocation, the other bits indicate the desired allocation size.
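// For example, an entry of ( 256 << 1 ) == 512 encodes "allocate 256 bytes",
// while an entry of 1 encodes "deallocate the record at the front of the queue".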
// Initialize the phase 1 work view with an equal number of allocations for
// each chunk size.
work_view phase1_work( "Phase 1 Work", phase1_size );
typename work_view::HostMirror host_phase1_work =
- create_mirror_view(phase1_work);
+ create_mirror_view( phase1_work );
size_t inner_size = phase1_size / num_chunk_sizes;
unsigned chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase1_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
std::random_shuffle( host_phase1_work.ptr_on_device(),
host_phase1_work.ptr_on_device() + phase1_size );
deep_copy( phase1_work, host_phase1_work );
// Initialize the phase 2 work view with half allocations and half
// deallocations with an equal number of allocations for each chunk size.
work_view phase2_work( "Phase 2 Work", phase2_size );
typename work_view::HostMirror host_phase2_work =
- create_mirror_view(phase2_work);
+ create_mirror_view( phase2_work );
inner_size = half_phase2_size / num_chunk_sizes;
chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase2_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
for ( size_t i = half_phase2_size; i < phase2_size; ++i ) {
host_phase2_work[i] = 1;
}
std::random_shuffle( host_phase2_work.ptr_on_device(),
host_phase2_work.ptr_on_device() + phase2_size );
deep_copy( phase2_work, host_phase2_work );
// Initialize the phase 3 work view with all deallocations.
work_view phase3_work( "Phase 3 Work", phase3_size );
typename work_view::HostMirror host_phase3_work =
- create_mirror_view(phase3_work);
+ create_mirror_view( phase3_work );
inner_size = phase3_size / num_chunk_sizes;
for ( size_t i = 0; i < phase3_size; ++i ) host_phase3_work[i] = 1;
deep_copy( phase3_work, host_phase3_work );
// Calculate the amount of memory needed for the allocator. We need to know
// the number of superblocks required for each chunk size and use that to
// calculate the amount of memory for each chunk size.
size_t lg_sb_size = 18;
size_t sb_size = 1 << lg_sb_size;
size_t total_size = 0;
size_t allocs_per_size = phase1_size / num_chunk_sizes +
half_phase2_size / num_chunk_sizes;
chunk_size = ceil_base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
size_t my_size = allocs_per_size * chunk_size;
total_size += ( my_size + sb_size - 1 ) / sb_size * sb_size;
chunk_size *= 2;
}
// Declare the queue to hold the records for allocated memory. An allocation
// adds a record to the back of the queue, and a deallocation removes a
// record from the front of the queue.
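// ( ptrs_back is atomically advanced to append a new record; ptrs_front is
// atomically advanced to retire the oldest one. )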
size_t num_allocations = phase1_size + half_phase2_size;
scalar_view ptrs_front( "Pointers front" );
scalar_view ptrs_back( "Pointers back" );
pointer_view pointers( "pointers", num_allocations );
#ifdef TESTMEMORYPOOL_PRINT
printf( "\n*** test_mempool2() ***\n" );
printf( " num_chunk_sizes: %12zu\n", num_chunk_sizes );
printf( " base_chunk_size: %12u\n", base_chunk_size );
printf( " ceil_base_chunk_size: %12u\n", ceil_base_chunk_size );
printf( " phase1_size: %12zu\n", phase1_size );
printf( " phase2_size: %12zu\n", phase2_size );
printf( " phase3_size: %12zu\n", phase3_size );
printf( " allocs_per_size: %12zu\n", allocs_per_size );
printf( " num_allocations: %12zu\n", num_allocations );
printf( " total_size: %12zu\n", total_size );
fflush( stdout );
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, lg_sb_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase1_work, phase1_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase1: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase2_work, phase2_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase2: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase3_work, phase3_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase3: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
// Tests for correct behavior when the allocator is out of memory.
template < class Device >
void test_memory_exhaustion()
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
// The allocator will have a single superblock, and allocations will all be
// of the same chunk size. The allocation loop will attempt to allocate
// twice the number of chunks as are available in the allocator. The
// deallocation loop will only free the successfully allocated chunks.
size_t chunk_size = 128;
size_t num_chunks = 128;
size_t half_num_chunks = num_chunks / 2;
size_t superblock_size = chunk_size * half_num_chunks;
size_t lg_superblock_size =
Kokkos::Impl::integral_power_of_two( superblock_size );
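// With these sizes: superblock_size = 128 * 64 = 8192 bytes and
// lg_superblock_size = 13, so the single superblock holds at most 64 chunks
// and at most half of the 128 allocation attempts below can succeed.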
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "\n*** test_memory_exhaustion() ***" << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), superblock_size,
lg_superblock_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
// In parallel, the allocations that succeeded were not put contiguously
// into the pointers View. The whole View can still be looped over and
// have deallocate called because deallocate will just do nothing for NULL
// pointers.
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
}
#undef TESTMEMORYPOOL_PRINT
#undef TESTMEMORYPOOL_PRINT_STATUS
#undef STRIDE
#undef STRIDE_ALLOC
#endif
diff --git a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
index 1bb45481c..6f2ca6a61 100644
--- a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
+++ b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
@@ -1,497 +1,528 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-struct SomeTag{};
+struct SomeTag {};
template< class ExecutionSpace >
class TestRangePolicyConstruction {
public:
TestRangePolicyConstruction() {
test_compile_time_parameters();
}
+
private:
void test_compile_time_parameters() {
{
Kokkos::Impl::expand_variadic();
- Kokkos::Impl::expand_variadic(1,2,3);
+ Kokkos::Impl::expand_variadic( 1, 2, 3 );
}
+
{
typedef Kokkos::RangePolicy<> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, ExecutionSpace, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
}
};
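
As a usage note on what these assertions establish: the policy's template arguments (execution space, schedule, index type, work tag) may be listed in any order, and a tag type selects the matching tagged operator() at dispatch time. A hedged sketch, reusing SomeTag from this file; the functor name TagDemo, the helper run_tag_demo, and the bounds are illustrative, and Kokkos is assumed to be initialized before the parallel_for runs:

  struct TagDemo {
    KOKKOS_INLINE_FUNCTION
    void operator()( const SomeTag &, const long i ) const { (void) i; }
  };

  // Equivalent to listing the same arguments in any other order checked above.
  typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>,
                               Kokkos::IndexType<long>, SomeTag > demo_policy_t;

  inline void run_tag_demo()
  {
    // Dispatches to the SomeTag-tagged operator() with a long index.
    Kokkos::parallel_for( demo_policy_t( 0, 100 ), TagDemo() );
  }
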
template< class ExecutionSpace >
class TestTeamPolicyConstruction {
public:
TestTeamPolicyConstruction() {
test_compile_time_parameters();
test_run_time_parameters();
}
+
private:
void test_compile_time_parameters() {
{
typedef Kokkos::TeamPolicy<> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, ExecutionSpace, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
}
- template<class policy_t>
+ template< class policy_t >
void test_run_time_parameters_type() {
int league_size = 131;
- int team_size = 4<policy_t::execution_space::concurrency()?4:policy_t::execution_space::concurrency();
+ int team_size = 4 < policy_t::execution_space::concurrency() ? 4 : policy_t::execution_space::concurrency();
int chunk_size = 4;
int per_team_scratch = 1024;
int per_thread_scratch = 16;
- int scratch_size = per_team_scratch + per_thread_scratch*team_size;
- policy_t p1(league_size,team_size);
- ASSERT_EQ (p1.league_size() , league_size);
- ASSERT_EQ (p1.team_size() , team_size);
- ASSERT_TRUE(p1.chunk_size() > 0);
- ASSERT_EQ (p1.scratch_size(0), 0);
-
- policy_t p2 = p1.set_chunk_size(chunk_size);
- ASSERT_EQ (p1.league_size() , league_size);
- ASSERT_EQ (p1.team_size() , team_size);
- ASSERT_TRUE(p1.chunk_size() > 0);
- ASSERT_EQ (p1.scratch_size(0), 0);
-
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
-
- policy_t p3 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p3.league_size() , league_size);
- ASSERT_EQ (p3.team_size() , team_size);
- ASSERT_EQ (p3.chunk_size() , chunk_size);
- ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
-
- policy_t p4 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p4.league_size() , league_size);
- ASSERT_EQ (p4.team_size() , team_size);
- ASSERT_EQ (p4.chunk_size() , chunk_size);
- ASSERT_EQ (p4.scratch_size(0), per_thread_scratch*team_size);
-
- policy_t p5 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch),Kokkos::PerTeam(per_team_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p5.league_size() , league_size);
- ASSERT_EQ (p5.team_size() , team_size);
- ASSERT_EQ (p5.chunk_size() , chunk_size);
- ASSERT_EQ (p5.scratch_size(0), scratch_size);
-
- policy_t p6 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p6.league_size() , league_size);
- ASSERT_EQ (p6.team_size() , team_size);
- ASSERT_EQ (p6.chunk_size() , chunk_size);
- ASSERT_EQ (p6.scratch_size(0), scratch_size);
-
- policy_t p7 = p3.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p3.league_size() , league_size);
- ASSERT_EQ (p3.team_size() , team_size);
- ASSERT_EQ (p3.chunk_size() , chunk_size);
- ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
- ASSERT_EQ (p7.league_size() , league_size);
- ASSERT_EQ (p7.team_size() , team_size);
- ASSERT_EQ (p7.chunk_size() , chunk_size);
- ASSERT_EQ (p7.scratch_size(0), scratch_size);
-}
+ int scratch_size = per_team_scratch + per_thread_scratch * team_size;
+
+ policy_t p1( league_size, team_size );
+ ASSERT_EQ ( p1.league_size(), league_size );
+ ASSERT_EQ ( p1.team_size(), team_size );
+ ASSERT_TRUE( p1.chunk_size() > 0 );
+ ASSERT_EQ ( p1.scratch_size( 0 ), 0 );
+
+ policy_t p2 = p1.set_chunk_size( chunk_size );
+ ASSERT_EQ ( p1.league_size(), league_size );
+ ASSERT_EQ ( p1.team_size(), team_size );
+ ASSERT_TRUE( p1.chunk_size() > 0 );
+ ASSERT_EQ ( p1.scratch_size( 0 ), 0 );
+
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+
+ policy_t p3 = p2.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p3.league_size(), league_size );
+ ASSERT_EQ ( p3.team_size(), team_size );
+ ASSERT_EQ ( p3.chunk_size(), chunk_size );
+ ASSERT_EQ ( p3.scratch_size( 0 ), per_team_scratch );
+
+ policy_t p4 = p2.set_scratch_size( 0, Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p4.league_size(), league_size );
+ ASSERT_EQ ( p4.team_size(), team_size );
+ ASSERT_EQ ( p4.chunk_size(), chunk_size );
+ ASSERT_EQ ( p4.scratch_size( 0 ), per_thread_scratch * team_size );
+
+ policy_t p5 = p2.set_scratch_size( 0, Kokkos::PerThread( per_thread_scratch ), Kokkos::PerTeam( per_team_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p5.league_size(), league_size );
+ ASSERT_EQ ( p5.team_size(), team_size );
+ ASSERT_EQ ( p5.chunk_size(), chunk_size );
+ ASSERT_EQ ( p5.scratch_size( 0 ), scratch_size );
+
+ policy_t p6 = p2.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ), Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p6.league_size(), league_size );
+ ASSERT_EQ ( p6.team_size(), team_size );
+ ASSERT_EQ ( p6.chunk_size(), chunk_size );
+ ASSERT_EQ ( p6.scratch_size( 0 ), scratch_size );
+
+ policy_t p7 = p3.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ), Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p3.league_size(), league_size );
+ ASSERT_EQ ( p3.team_size(), team_size );
+ ASSERT_EQ ( p3.chunk_size(), chunk_size );
+ ASSERT_EQ ( p3.scratch_size( 0 ), per_team_scratch );
+ ASSERT_EQ ( p7.league_size(), league_size );
+ ASSERT_EQ ( p7.team_size(), team_size );
+ ASSERT_EQ ( p7.chunk_size(), chunk_size );
+ ASSERT_EQ ( p7.scratch_size( 0 ), scratch_size );
+ }
+
void test_run_time_parameters() {
- test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace> >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace,SomeTag > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<ExecutionSpace> >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace, SomeTag > >();
}
};
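
For reference, the run-time behavior that test_run_time_parameters_type() asserts can be summarized in a few lines. A hedged sketch with arbitrary sizes (100, 4, 8, 1024, 16) and a hypothetical helper name; it mirrors the calls in the patched test, assumes the default execution space is initialized, and omits the clamping of team_size to the space's concurrency that the real test performs:

  inline void team_policy_demo()
  {
    typedef Kokkos::TeamPolicy< Kokkos::DefaultExecutionSpace > policy_type;

    policy_type p1( 100, 4 );                // league_size = 100, team_size = 4
    policy_type p2 = p1.set_chunk_size( 8 );
    policy_type p3 = p2.set_scratch_size( 0, Kokkos::PerTeam( 1024 ),
                                             Kokkos::PerThread( 16 ) );

    // As asserted above: p3.league_size() == 100, p3.team_size() == 4,
    // p3.chunk_size() == 8, and p3.scratch_size( 0 ) == 1024 + 16 * 4;
    // the per-thread request is multiplied by the team size before being added.
  }
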
diff --git a/lib/kokkos/core/unit_test/TestQthread.cpp b/lib/kokkos/core/unit_test/TestQthread.cpp
deleted file mode 100644
index a465f39ca..000000000
--- a/lib/kokkos/core/unit_test/TestQthread.cpp
+++ /dev/null
@@ -1,287 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-#include <Kokkos_Qthread.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestTeam.hpp>
-#include <TestRange.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestTaskScheduler.hpp>
-// #include <TestTeamVector.hpp>
-
-namespace Test {
-
-class qthread : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- int threads_count = std::max( 1u , numa_count )
- * std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
- Kokkos::Qthread::initialize( threads_count );
- Kokkos::Qthread::print_configuration( std::cout , true );
- }
-
- static void TearDownTestCase()
- {
- Kokkos::Qthread::finalize();
- }
-};
-
-TEST_F( qthread , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Qthread >() ) );
-}
-
-TEST_F( qthread, view_impl) {
- test_view_impl< Kokkos::Qthread >();
-}
-
-TEST_F( qthread, view_api) {
- TestViewAPI< double , Kokkos::Qthread >();
-}
-
-TEST_F( qthread , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::Qthread >();
-}
-
-TEST_F( qthread , range_tag )
-{
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-}
-
-TEST_F( qthread , team_tag )
-{
- TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
- TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
-}
-
-TEST_F( qthread, long_reduce) {
- TestReduce< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, double_reduce) {
- TestReduce< double , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, team_long_reduce) {
- TestReduceTeam< long , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
-}
-
-TEST_F( qthread, team_double_reduce) {
- TestReduceTeam< double , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
-}
-
-
-TEST_F( qthread , atomics )
-{
- const int loop_count = 1e4 ;
-
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,3) ) );
-
-#if defined( KOKKOS_ENABLE_ASM )
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,3) ) );
-#endif
-
-}
-
-TEST_F( qthread , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Qthread > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Qthread > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Qthread > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , view_aggregate )
-{
- TestViewAggregate< Kokkos::Qthread >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , scan )
-{
- TestScan< Kokkos::Qthread >::test_range( 1 , 1000 );
- TestScan< Kokkos::Qthread >( 1000000 );
- TestScan< Kokkos::Qthread >( 10000000 );
- Kokkos::Qthread::fence();
-}
-
-TEST_F( qthread, team_shared ) {
- TestSharedTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >();
-}
-
-TEST_F( qthread, shmem_size) {
- TestShmemSize< Kokkos::Qthread >();
-}
-
-TEST_F( qthread , team_scan )
-{
- TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10000 );
-}
-
-#if 0 /* disable */
-TEST_F( qthread , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(4) ) );
-}
-#endif
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , task_policy )
-{
- TestTaskScheduler::test_task_dep< Kokkos::Qthread >( 10 );
- for ( long i = 0 ; i < 25 ; ++i ) TestTaskScheduler::test_fib< Kokkos::Qthread >(i);
- for ( long i = 0 ; i < 35 ; ++i ) TestTaskScheduler::test_fib2< Kokkos::Qthread >(i);
-}
-
-TEST_F( qthread , task_team )
-{
- TestTaskScheduler::test_task_team< Kokkos::Qthread >(1000);
-}
-
-//----------------------------------------------------------------------------
-
-} // namespace test
-
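
The removed driver above shows the per-backend pattern these unit-test translation units share: a gtest fixture whose SetUpTestCase/TearDownTestCase bring the execution space up and down, with each TEST_F delegating to the shared test templates. A hedged sketch of the same pattern against the default host backend (the fixture name is a placeholder, and the plain Kokkos::initialize()/finalize() calls stand in for the Qthread- and hwloc-specific setup the deleted file performed):

  #include <gtest/gtest.h>
  #include <Kokkos_Core.hpp>
  #include <TestRange.hpp>

  class default_host : public ::testing::Test {
  protected:
    static void SetUpTestCase()    { Kokkos::initialize(); }
    static void TearDownTestCase() { Kokkos::finalize(); }
  };

  TEST_F( default_host, range_for )
  {
    Test::TestRange< Kokkos::DefaultHostExecutionSpace,
                     Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
  }
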
diff --git a/lib/kokkos/core/unit_test/TestRange.hpp b/lib/kokkos/core/unit_test/TestRange.hpp
index e342e844c..90411a57a 100644
--- a/lib/kokkos/core/unit_test/TestRange.hpp
+++ b/lib/kokkos/core/unit_test/TestRange.hpp
@@ -1,242 +1,248 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template< class ExecSpace, class ScheduleType >
struct TestRange {
+ typedef int value_type; ///< typedef required for the parallel_reduce
- typedef int value_type ; ///< typedef required for the parallel_reduce
-
- typedef Kokkos::View<int*,ExecSpace> view_type ;
+ typedef Kokkos::View< int*, ExecSpace > view_type;
- view_type m_flags ;
+ view_type m_flags;
struct VerifyInitTag {};
struct ResetTag {};
struct VerifyResetTag {};
TestRange( const size_t N )
- : m_flags( Kokkos::ViewAllocateWithoutInitializing("flags"), N )
+ : m_flags( Kokkos::ViewAllocateWithoutInitializing( "flags" ), N )
{}
static void test_for( const size_t N )
- {
- TestRange functor(N);
+ {
+ TestRange functor( N );
- typename view_type::HostMirror host_flags = Kokkos::create_mirror_view( functor.m_flags );
+ typename view_type::HostMirror host_flags = Kokkos::create_mirror_view( functor.m_flags );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyInitTag>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType, VerifyInitTag >( 0, N ), functor );
- Kokkos::deep_copy( host_flags , functor.m_flags );
+ Kokkos::deep_copy( host_flags, functor.m_flags );
- size_t error_count = 0 ;
- for ( size_t i = 0 ; i < N ; ++i ) {
- if ( int(i) != host_flags(i) ) ++error_count ;
- }
- ASSERT_EQ( error_count , size_t(0) );
+ size_t error_count = 0;
+ for ( size_t i = 0; i < N; ++i ) {
+ if ( int( i ) != host_flags( i ) ) ++error_count;
+ }
+ ASSERT_EQ( error_count, size_t( 0 ) );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,ResetTag>(0,N) , functor );
- Kokkos::parallel_for( std::string("TestKernelFor") , Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyResetTag>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType, ResetTag >( 0, N ), functor );
+ Kokkos::parallel_for( std::string( "TestKernelFor" ), Kokkos::RangePolicy< ExecSpace, ScheduleType, VerifyResetTag >( 0, N ), functor );
- Kokkos::deep_copy( host_flags , functor.m_flags );
+ Kokkos::deep_copy( host_flags, functor.m_flags );
- error_count = 0 ;
- for ( size_t i = 0 ; i < N ; ++i ) {
- if ( int(2*i) != host_flags(i) ) ++error_count ;
- }
- ASSERT_EQ( error_count , size_t(0) );
+ error_count = 0;
+ for ( size_t i = 0; i < N; ++i ) {
+ if ( int( 2 * i ) != host_flags( i ) ) ++error_count;
}
+ ASSERT_EQ( error_count, size_t( 0 ) );
+ }
KOKKOS_INLINE_FUNCTION
void operator()( const int i ) const
- { m_flags(i) = i ; }
+ { m_flags( i ) = i; }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyInitTag & , const int i ) const
- { if ( i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
+ void operator()( const VerifyInitTag &, const int i ) const
+ {
+ if ( i != m_flags( i ) ) {
+ printf( "TestRange::test_for error at %d != %d\n", i, m_flags( i ) );
+ }
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const ResetTag & , const int i ) const
- { m_flags(i) = 2 * m_flags(i); }
+ void operator()( const ResetTag &, const int i ) const
+ { m_flags( i ) = 2 * m_flags( i ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyResetTag & , const int i ) const
- { if ( 2 * i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
+ void operator()( const VerifyResetTag &, const int i ) const
+ {
+ if ( 2 * i != m_flags( i ) )
+ {
+ printf( "TestRange::test_for error at %d != %d\n", i, m_flags( i ) );
+ }
+ }
//----------------------------------------
struct OffsetTag {};
static void test_reduce( const size_t N )
- {
- TestRange functor(N);
- int total = 0 ;
+ {
+ TestRange functor( N );
+ int total = 0;
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
- Kokkos::parallel_reduce( "TestKernelReduce" , Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor , total );
- // sum( 0 .. N-1 )
- ASSERT_EQ( size_t((N-1)*(N)/2) , size_t(total) );
+ Kokkos::parallel_reduce( "TestKernelReduce", Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor, total );
+ // sum( 0 .. N-1 )
+ ASSERT_EQ( size_t( ( N - 1 ) * ( N ) / 2 ), size_t( total ) );
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor , total );
- // sum( 1 .. N )
- ASSERT_EQ( size_t((N)*(N+1)/2) , size_t(total) );
- }
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, ScheduleType, OffsetTag>( 0, N ), functor, total );
+ // sum( 1 .. N )
+ ASSERT_EQ( size_t( ( N ) * ( N + 1 ) / 2 ), size_t( total ) );
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , value_type & update ) const
- { update += m_flags(i); }
+ void operator()( const int i, value_type & update ) const
+ { update += m_flags( i ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const OffsetTag & , const int i , value_type & update ) const
- { update += 1 + m_flags(i); }
+ void operator()( const OffsetTag &, const int i, value_type & update ) const
+ { update += 1 + m_flags( i ); }
//----------------------------------------
static void test_scan( const size_t N )
- {
- TestRange functor(N);
+ {
+ TestRange functor( N );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
- Kokkos::parallel_scan( "TestKernelScan" , Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor );
- }
+ Kokkos::parallel_scan( "TestKernelScan", Kokkos::RangePolicy< ExecSpace, ScheduleType, OffsetTag>( 0, N ), functor );
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const OffsetTag & , const int i , value_type & update , bool final ) const
- {
- update += m_flags(i);
+ void operator()( const OffsetTag &, const int i, value_type & update, bool final ) const
+ {
+ update += m_flags( i );
- if ( final ) {
- if ( update != (i*(i+1))/2 ) {
- printf("TestRange::test_scan error %d : %d != %d\n",i,(i*(i+1))/2,m_flags(i));
- }
+ if ( final ) {
+ if ( update != ( i * ( i + 1 ) ) / 2 ) {
+ printf( "TestRange::test_scan error %d : %d != %d\n", i, ( i * ( i + 1 ) ) / 2, m_flags( i ) );
}
}
+ }
- static void test_dynamic_policy( const size_t N ) {
-
-
- typedef Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ static void test_dynamic_policy( const size_t N )
+ {
+ typedef Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
{
- Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
- Kokkos::View<int*,ExecSpace> a("A",N);
-
- Kokkos::parallel_for( policy_t(0,N),
- KOKKOS_LAMBDA (const typename policy_t::member_type& i) {
- for(int k=0; k<(i<N/2?1:10000); k++ )
- a(i)++;
- count(ExecSpace::hardware_thread_id())++;
+ Kokkos::View< size_t*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > count( "Count", ExecSpace::concurrency() );
+ Kokkos::View< int*, ExecSpace > a( "A", N );
+
+ Kokkos::parallel_for( policy_t( 0, N ), KOKKOS_LAMBDA ( const typename policy_t::member_type& i ) {
+ for ( int k = 0; k < ( i < N / 2 ? 1 : 10000 ); k++ ) {
+ a( i )++;
+ }
+ count( ExecSpace::hardware_thread_id() )++;
});
int error = 0;
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
- lsum += ( a(i)!= (i<N/2?1:10000) );
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ lsum += ( a( i ) != ( i < N / 2 ? 1 : 10000 ) );
+ }, error );
+ ASSERT_EQ( error, 0 );
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
+ if ( ( ExecSpace::concurrency() > (int) 1 ) && ( N > static_cast<size_t>( 4 * ExecSpace::concurrency() ) ) ) {
size_t min = N;
size_t max = 0;
- for(int t=0; t<ExecSpace::concurrency(); t++) {
- if(count(t)<min) min = count(t);
- if(count(t)>max) max = count(t);
+ for ( int t = 0; t < ExecSpace::concurrency(); t++ ) {
+ if ( count( t ) < min ) min = count( t );
+ if ( count( t ) > max ) max = count( t );
}
- ASSERT_TRUE(min<max);
- //if(ExecSpace::concurrency()>2)
- // ASSERT_TRUE(2*min<max);
+ ASSERT_TRUE( min < max );
+
+ //if ( ExecSpace::concurrency() > 2 ) {
+ // ASSERT_TRUE( 2 * min < max );
+ //}
}
-
}
{
- Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
- Kokkos::View<int*,ExecSpace> a("A",N);
+ Kokkos::View< size_t*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > count( "Count", ExecSpace::concurrency() );
+ Kokkos::View< int*, ExecSpace> a( "A", N );
int sum = 0;
- Kokkos::parallel_reduce( policy_t(0,N),
- KOKKOS_LAMBDA (const typename policy_t::member_type& i, int& lsum) {
- for(int k=0; k<(i<N/2?1:10000); k++ )
- a(i)++;
- count(ExecSpace::hardware_thread_id())++;
+ Kokkos::parallel_reduce( policy_t( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ for ( int k = 0; k < ( i < N / 2 ? 1 : 10000 ); k++ ) {
+ a( i )++;
+ }
+ count( ExecSpace::hardware_thread_id() )++;
lsum++;
- },sum);
- ASSERT_EQ(sum,N);
+ }, sum );
+ ASSERT_EQ( sum, N );
int error = 0;
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
- lsum += ( a(i)!= (i<N/2?1:10000) );
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ lsum += ( a( i ) != ( i < N / 2 ? 1 : 10000 ) );
+ }, error );
+ ASSERT_EQ( error, 0 );
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
+ if ( ( ExecSpace::concurrency() > (int) 1 ) && ( N > static_cast<size_t>( 4 * ExecSpace::concurrency() ) ) ) {
size_t min = N;
size_t max = 0;
- for(int t=0; t<ExecSpace::concurrency(); t++) {
- if(count(t)<min) min = count(t);
- if(count(t)>max) max = count(t);
+ for ( int t = 0; t < ExecSpace::concurrency(); t++ ) {
+ if ( count( t ) < min ) min = count( t );
+ if ( count( t ) > max ) max = count( t );
}
- ASSERT_TRUE(min<max);
- //if(ExecSpace::concurrency()>2)
- // ASSERT_TRUE(2*min<max);
+ ASSERT_TRUE( min < max );
+
+ //if ( ExecSpace::concurrency() > 2 ) {
+ // ASSERT_TRUE( 2 * min < max );
+ //}
}
}
-
}
};
-} /* namespace */
-} /* namespace Test */
-
-/*--------------------------------------------------------------------------*/
+} // namespace
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestReduce.hpp b/lib/kokkos/core/unit_test/TestReduce.hpp
index 645fc9e31..7e77dadf6 100644
--- a/lib/kokkos/core/unit_test/TestReduce.hpp
+++ b/lib/kokkos/core/unit_test/TestReduce.hpp
@@ -1,1907 +1,2062 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <limits>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class ReduceFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
struct value_type {
- ScalarType value[3] ;
+ ScalarType value[3];
};
- const size_type nwork ;
+ const size_type nwork;
- ReduceFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
+ ReduceFunctor( const size_type & arg_nwork )
+ : nwork( arg_nwork ) {}
ReduceFunctor( const ReduceFunctor & rhs )
: nwork( rhs.nwork ) {}
/*
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
- dst.value[0] = 0 ;
- dst.value[1] = 0 ;
- dst.value[2] = 0 ;
+ dst.value[0] = 0;
+ dst.value[1] = 0;
+ dst.value[2] = 0;
}
*/
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & dst ,
+ void join( volatile value_type & dst,
const volatile value_type & src ) const
{
- dst.value[0] += src.value[0] ;
- dst.value[1] += src.value[1] ;
- dst.value[2] += src.value[2] ;
+ dst.value[0] += src.value[0];
+ dst.value[1] += src.value[1];
+ dst.value[2] += src.value[2];
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , value_type & dst ) const
+ void operator()( size_type iwork, value_type & dst ) const
{
- dst.value[0] += 1 ;
- dst.value[1] += iwork + 1 ;
- dst.value[2] += nwork - iwork ;
+ dst.value[0] += 1;
+ dst.value[1] += iwork + 1;
+ dst.value[2] += nwork - iwork;
}
};
template< class DeviceType >
-class ReduceFunctorFinal : public ReduceFunctor< long , DeviceType > {
+class ReduceFunctorFinal : public ReduceFunctor< long, DeviceType > {
public:
-
- typedef typename ReduceFunctor< long , DeviceType >::value_type value_type ;
+ typedef typename ReduceFunctor< long, DeviceType >::value_type value_type;
ReduceFunctorFinal( const size_t n )
- : ReduceFunctor<long,DeviceType>(n)
- {}
+ : ReduceFunctor< long, DeviceType >( n ) {}
KOKKOS_INLINE_FUNCTION
void final( value_type & dst ) const
{
- dst.value[0] = - dst.value[0] ;
- dst.value[1] = - dst.value[1] ;
- dst.value[2] = - dst.value[2] ;
+ dst.value[0] = -dst.value[0];
+ dst.value[1] = -dst.value[1];
+ dst.value[2] = -dst.value[2];
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class RuntimeReduceFunctor
{
public:
// Required for functor:
- typedef DeviceType execution_space ;
- typedef ScalarType value_type[] ;
- const unsigned value_count ;
-
+ typedef DeviceType execution_space;
+ typedef ScalarType value_type[];
+ const unsigned value_count;
// Unit test details:
- typedef typename execution_space::size_type size_type ;
+ typedef typename execution_space::size_type size_type;
- const size_type nwork ;
+ const size_type nwork;
- RuntimeReduceFunctor( const size_type arg_nwork ,
+ RuntimeReduceFunctor( const size_type arg_nwork,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile ScalarType dst[] ,
+ void join( volatile ScalarType dst[],
const volatile ScalarType src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] += src[i];
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , ScalarType dst[] ) const
+ void operator()( size_type iwork, ScalarType dst[] ) const
{
- const size_type tmp[3] = { 1 , iwork + 1 , nwork - iwork };
+ const size_type tmp[3] = { 1, iwork + 1, nwork - iwork };
- for ( size_type i = 0 ; i < value_count ; ++i ) {
+ for ( size_type i = 0; i < value_count; ++i ) {
dst[i] += tmp[ i % 3 ];
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class RuntimeReduceMinMax
{
public:
// Required for functor:
- typedef DeviceType execution_space ;
- typedef ScalarType value_type[] ;
- const unsigned value_count ;
+ typedef DeviceType execution_space;
+ typedef ScalarType value_type[];
+ const unsigned value_count;
// Unit test details:
- typedef typename execution_space::size_type size_type ;
+ typedef typename execution_space::size_type size_type;
- const size_type nwork ;
- const ScalarType amin ;
- const ScalarType amax ;
+ const size_type nwork;
+ const ScalarType amin;
+ const ScalarType amax;
- RuntimeReduceMinMax( const size_type arg_nwork ,
+ RuntimeReduceMinMax( const size_type arg_nwork,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork )
- , amin( std::numeric_limits<ScalarType>::min() )
- , amax( std::numeric_limits<ScalarType>::max() )
+ , amin( std::numeric_limits< ScalarType >::min() )
+ , amax( std::numeric_limits< ScalarType >::max() )
{}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) {
- dst[i] = i % 2 ? amax : amin ;
+ for ( unsigned i = 0; i < value_count; ++i ) {
+ dst[i] = i % 2 ? amax : amin;
}
}
KOKKOS_INLINE_FUNCTION
- void join( volatile ScalarType dst[] ,
+ void join( volatile ScalarType dst[],
const volatile ScalarType src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) {
+ for ( unsigned i = 0; i < value_count; ++i ) {
dst[i] = i % 2 ? ( dst[i] < src[i] ? dst[i] : src[i] ) // min
: ( dst[i] > src[i] ? dst[i] : src[i] ); // max
}
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , ScalarType dst[] ) const
+ void operator()( size_type iwork, ScalarType dst[] ) const
{
- const ScalarType tmp[2] = { ScalarType(iwork + 1)
- , ScalarType(nwork - iwork) };
+ const ScalarType tmp[2] = { ScalarType( iwork + 1 )
+ , ScalarType( nwork - iwork ) };
- for ( size_type i = 0 ; i < value_count ; ++i ) {
- dst[i] = i % 2 ? ( dst[i] < tmp[i%2] ? dst[i] : tmp[i%2] )
- : ( dst[i] > tmp[i%2] ? dst[i] : tmp[i%2] );
+ for ( size_type i = 0; i < value_count; ++i ) {
+ dst[i] = i % 2 ? ( dst[i] < tmp[i % 2] ? dst[i] : tmp[i % 2] )
+ : ( dst[i] > tmp[i % 2] ? dst[i] : tmp[i % 2] );
}
}
};
template< class DeviceType >
-class RuntimeReduceFunctorFinal : public RuntimeReduceFunctor< long , DeviceType > {
+class RuntimeReduceFunctorFinal : public RuntimeReduceFunctor< long, DeviceType > {
public:
+ typedef RuntimeReduceFunctor< long, DeviceType > base_type;
+ typedef typename base_type::value_type value_type;
+ typedef long scalar_type;
- typedef RuntimeReduceFunctor< long , DeviceType > base_type ;
- typedef typename base_type::value_type value_type ;
- typedef long scalar_type ;
-
- RuntimeReduceFunctorFinal( const size_t theNwork , const size_t count ) : base_type(theNwork,count) {}
+ RuntimeReduceFunctorFinal( const size_t theNwork, const size_t count )
+ : base_type( theNwork, count ) {}
KOKKOS_INLINE_FUNCTION
void final( value_type dst ) const
{
- for ( unsigned i = 0 ; i < base_type::value_count ; ++i ) {
- dst[i] = - dst[i] ;
+ for ( unsigned i = 0; i < base_type::value_count; ++i ) {
+ dst[i] = -dst[i];
}
}
};
+
} // namespace Test
namespace {
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduce( const size_type & nwork )
{
- run_test(nwork);
- run_test_final(nwork);
+ run_test( nwork );
+ run_test_final( nwork );
}
void run_test( const size_type & nwork )
{
- typedef Test::ReduceFunctor< ScalarType , execution_space > functor_type ;
- typedef typename functor_type::value_type value_type ;
+ typedef Test::ReduceFunctor< ScalarType, execution_space > functor_type;
+ typedef typename functor_type::value_type value_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork ), result[i] );
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i].value[j] );
}
}
}
void run_test_final( const size_type & nwork )
{
- typedef Test::ReduceFunctorFinal< execution_space > functor_type ;
- typedef typename functor_type::value_type value_type ;
+ typedef Test::ReduceFunctorFinal< execution_space > functor_type;
+ typedef typename functor_type::value_type value_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , - result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, -result[i].value[j] );
}
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduceDynamic
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduceDynamic( const size_type nwork )
{
- run_test_dynamic(nwork);
- run_test_dynamic_minmax(nwork);
- run_test_dynamic_final(nwork);
+ run_test_dynamic( nwork );
+ run_test_dynamic_minmax( nwork );
+ run_test_dynamic_final( nwork );
}
void run_test_dynamic( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctor< ScalarType, execution_space > functor_type;
enum { Count = 3 };
enum { Repeat = 100 };
- ScalarType result[ Repeat ][ Count ] ;
+ ScalarType result[ Repeat ][ Count ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
}
}
}
void run_test_dynamic_minmax( const size_type nwork )
{
- typedef Test::RuntimeReduceMinMax< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceMinMax< ScalarType, execution_space > functor_type;
enum { Count = 2 };
enum { Repeat = 100 };
- ScalarType result[ Repeat ][ Count ] ;
+ ScalarType result[ Repeat ][ Count ];
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
if ( nwork == 0 )
{
- ScalarType amin( std::numeric_limits<ScalarType>::min() );
- ScalarType amax( std::numeric_limits<ScalarType>::max() );
- const ScalarType correct = (j%2) ? amax : amin;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
- } else {
- const unsigned long correct = j % 2 ? 1 : nwork ;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ ScalarType amin( std::numeric_limits< ScalarType >::min() );
+ ScalarType amax( std::numeric_limits< ScalarType >::max() );
+ const ScalarType correct = ( j % 2 ) ? amax : amin;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
+ }
+ else {
+ const unsigned long correct = j % 2 ? 1 : nwork;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
}
}
}
}
void run_test_dynamic_final( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctorFinal< execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctorFinal< execution_space > functor_type;
enum { Count = 3 };
enum { Repeat = 100 };
- typename functor_type::scalar_type result[ Repeat ][ Count ] ;
+ typename functor_type::scalar_type result[ Repeat ][ Count ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "TestKernelReduce" , nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "TestKernelReduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , - result[i][j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, -result[i][j] );
}
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduceDynamicView
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduceDynamicView( const size_type nwork )
{
- run_test_dynamic_view(nwork);
+ run_test_dynamic_view( nwork );
}
void run_test_dynamic_view( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctor< ScalarType, execution_space > functor_type;
- typedef Kokkos::View< ScalarType* , DeviceType > result_type ;
- typedef typename result_type::HostMirror result_host_type ;
+ typedef Kokkos::View< ScalarType*, DeviceType > result_type;
+ typedef typename result_type::HostMirror result_host_type;
- const unsigned CountLimit = 23 ;
+ const unsigned CountLimit = 23;
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned count = 0 ; count < CountLimit ; ++count ) {
+ for ( unsigned count = 0; count < CountLimit; ++count ) {
- result_type result("result",count);
+ result_type result( "result", count );
result_host_type host_result = Kokkos::create_mirror( result );
// Test result to host pointer:
- std::string str("TestKernelReduce");
- if(count%2==0)
- Kokkos::parallel_reduce( nw , functor_type(nw,count) , host_result.ptr_on_device() );
- else
- Kokkos::parallel_reduce( str , nw , functor_type(nw,count) , host_result.ptr_on_device() );
+ std::string str( "TestKernelReduce" );
+ if ( count % 2 == 0 ) {
+ Kokkos::parallel_reduce( nw, functor_type( nw, count ), host_result.ptr_on_device() );
+ }
+ else {
+ Kokkos::parallel_reduce( str, nw, functor_type( nw, count ), host_result.ptr_on_device() );
+ }
- for ( unsigned j = 0 ; j < count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( host_result(j), (ScalarType) correct );
- host_result(j) = 0 ;
+ for ( unsigned j = 0; j < count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( host_result( j ), (ScalarType) correct );
+ host_result( j ) = 0;
}
}
}
};
-}
+
+} // namespace
// Computes y^T*A*x
-// (modified from kokkos-tutorials/GTC2016/Exercises/ThreeLevelPar )
+// ( modified from kokkos-tutorials/GTC2016/Exercises/ThreeLevelPar )
#if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA )
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestTripleNestedReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
- //------------------------------------
-
- TestTripleNestedReduce( const size_type & nrows , const size_type & ncols
- , const size_type & team_size , const size_type & vector_length )
+ TestTripleNestedReduce( const size_type & nrows, const size_type & ncols
+ , const size_type & team_size, const size_type & vector_length )
{
- run_test( nrows , ncols , team_size, vector_length );
+ run_test( nrows, ncols, team_size, vector_length );
}
- void run_test( const size_type & nrows , const size_type & ncols
+ void run_test( const size_type & nrows, const size_type & ncols
, const size_type & team_size, const size_type & vector_length )
{
//typedef Kokkos::LayoutLeft Layout;
typedef Kokkos::LayoutRight Layout;
- typedef Kokkos::View<ScalarType* , DeviceType> ViewVector;
- typedef Kokkos::View<ScalarType** , Layout , DeviceType> ViewMatrix;
- ViewVector y( "y" , nrows );
- ViewVector x( "x" , ncols );
- ViewMatrix A( "A" , nrows , ncols );
+ typedef Kokkos::View< ScalarType*, DeviceType > ViewVector;
+ typedef Kokkos::View< ScalarType**, Layout, DeviceType > ViewMatrix;
+
+ ViewVector y( "y", nrows );
+ ViewVector x( "x", ncols );
+ ViewMatrix A( "A", nrows, ncols );
typedef Kokkos::RangePolicy<DeviceType> range_policy;
- // Initialize y vector
- Kokkos::parallel_for( range_policy( 0 , nrows ) , KOKKOS_LAMBDA( const int i ) { y( i ) = 1; } );
+ // Initialize y vector.
+ Kokkos::parallel_for( range_policy( 0, nrows ), KOKKOS_LAMBDA ( const int i ) { y( i ) = 1; } );
- // Initialize x vector
- Kokkos::parallel_for( range_policy( 0 , ncols ) , KOKKOS_LAMBDA( const int i ) { x( i ) = 1; } );
+ // Initialize x vector.
+ Kokkos::parallel_for( range_policy( 0, ncols ), KOKKOS_LAMBDA ( const int i ) { x( i ) = 1; } );
- typedef Kokkos::TeamPolicy<DeviceType> team_policy;
- typedef typename Kokkos::TeamPolicy<DeviceType>::member_type member_type;
+ typedef Kokkos::TeamPolicy< DeviceType > team_policy;
+ typedef typename Kokkos::TeamPolicy< DeviceType >::member_type member_type;
- // Initialize A matrix, note 2D indexing computation
- Kokkos::parallel_for( team_policy( nrows , Kokkos::AUTO ) , KOKKOS_LAMBDA( const member_type& teamMember ) {
+ // Initialize A matrix, note 2D indexing computation.
+ Kokkos::parallel_for( team_policy( nrows, Kokkos::AUTO ), KOKKOS_LAMBDA ( const member_type & teamMember ) {
const int j = teamMember.league_rank();
- Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , ncols ) , [&] ( const int i ) {
- A( j , i ) = 1;
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember, ncols ), [&] ( const int i ) {
+ A( j, i ) = 1;
} );
} );
- // Three level parallelism kernel to force caching of vector x
+ // Three level parallelism kernel to force caching of vector x.
ScalarType result = 0.0;
int chunk_size = 128;
- Kokkos::parallel_reduce( team_policy( nrows/chunk_size , team_size , vector_length ) , KOKKOS_LAMBDA ( const member_type& teamMember , double &update ) {
+ Kokkos::parallel_reduce( team_policy( nrows / chunk_size, team_size, vector_length ),
+ KOKKOS_LAMBDA ( const member_type & teamMember, double & update ) {
const int row_start = teamMember.league_rank() * chunk_size;
const int row_end = row_start + chunk_size;
- Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , row_start , row_end ) , [&] ( const int i ) {
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember, row_start, row_end ), [&] ( const int i ) {
ScalarType sum_i = 0.0;
- Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( teamMember , ncols ) , [&] ( const int j , ScalarType &innerUpdate ) {
- innerUpdate += A( i , j ) * x( j );
- } , sum_i );
- Kokkos::single( Kokkos::PerThread( teamMember ) , [&] () {
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( teamMember, ncols ), [&] ( const int j, ScalarType &innerUpdate ) {
+ innerUpdate += A( i, j ) * x( j );
+ }, sum_i );
+ Kokkos::single( Kokkos::PerThread( teamMember ), [&] () {
update += y( i ) * sum_i;
} );
} );
- } , result );
+ }, result );
- const ScalarType solution= ( ScalarType ) nrows * ( ScalarType ) ncols;
- ASSERT_EQ( solution , result );
+ const ScalarType solution = (ScalarType) nrows * (ScalarType) ncols;
+ ASSERT_EQ( solution, result );
}
};
-#else /* #if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA ) */
+#else // #if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA )
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestTripleNestedReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
- TestTripleNestedReduce( const size_type & , const size_type
- , const size_type & , const size_type )
- { }
+ TestTripleNestedReduce( const size_type &, const size_type
+ , const size_type &, const size_type )
+ {}
};
#endif
//--------------------------------------------------------------------------
namespace Test {
+
namespace ReduceCombinatorical {
-template<class Scalar,class Space = Kokkos::HostSpace>
+template< class Scalar, class Space = Kokkos::HostSpace >
struct AddPlus {
public:
- //Required
+ // Required.
typedef AddPlus reducer_type;
typedef Scalar value_type;
- typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
+ typedef Kokkos::View< value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
+ AddPlus( value_type & result_ ) : result( &result_ ) {}
- AddPlus(value_type& result_):result(&result_) {}
-
- //Required
+ // Required.
KOKKOS_INLINE_FUNCTION
- void join(value_type& dest, const value_type& src) const {
+ void join( value_type & dest, const value_type & src ) const {
dest += src + 1;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& dest, const volatile value_type& src) const {
+ void join( volatile value_type & dest, const volatile value_type & src ) const {
dest += src + 1;
}
- //Optional
+ // Optional.
KOKKOS_INLINE_FUNCTION
- void init( value_type& val) const {
+ void init( value_type & val ) const {
val = value_type();
}
result_view_type result_view() const {
return result;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalar;
template<>
-struct FunctorScalar<0>{
- FunctorScalar(Kokkos::View<double> r):result(r) {}
- Kokkos::View<double> result;
+struct FunctorScalar< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalar( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,double& update) const {
- update+=i;
+ void operator()( const int & i, double & update ) const {
+ update += i;
}
};
template<>
-struct FunctorScalar<1>{
- FunctorScalar(Kokkos::View<double> r):result(r) {}
- Kokkos::View<double> result;
-
+struct FunctorScalar< 1 > {
typedef Kokkos::TeamPolicy<>::member_type team_type;
+
+ Kokkos::View< double > result;
+
+ FunctorScalar( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarInit;
template<>
-struct FunctorScalarInit<0> {
- FunctorScalarInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarInit< 0 > {
+ Kokkos::View< double > result;
- Kokkos::View<double> result;
+ FunctorScalarInit( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarInit<1> {
- FunctorScalarInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarFinal;
-
template<>
-struct FunctorScalarFinal<0> {
- FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarFinal< 0 > {
Kokkos::View<double> result;
+
+ FunctorScalarFinal( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
template<>
-struct FunctorScalarFinal<1> {
- FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarFinal< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
- typedef Kokkos::TeamPolicy<>::member_type team_type;
+ FunctorScalarFinal( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team, double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
+
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoin;
template<>
-struct FunctorScalarJoin<0> {
- FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarJoin< 0 > {
Kokkos::View<double> result;
+
+ FunctorScalarJoin( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
};
template<>
-struct FunctorScalarJoin<1> {
- FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoin< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoin( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinFinal;
template<>
-struct FunctorScalarJoinFinal<0> {
- FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinal< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinal( Kokkos::View< double > r ) : result( r ) {}
- Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
template<>
-struct FunctorScalarJoinFinal<1> {
- FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinal< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinal( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinInit;
template<>
-struct FunctorScalarJoinInit<0> {
- FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinInit< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinInit( Kokkos::View< double > r ) : result( r ) {}
- Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarJoinInit<1> {
- FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinFinalInit;
template<>
-struct FunctorScalarJoinFinalInit<0> {
- FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarJoinFinalInit< 0 > {
Kokkos::View<double> result;
+ FunctorScalarJoinFinalInit( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarJoinFinalInit<1> {
- FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinalInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinalInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
+
struct Functor1 {
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,double& update) const {
- update+=i;
+ void operator()( const int & i, double & update ) const {
+ update += i;
}
};
struct Functor2 {
typedef double value_type[];
+
const unsigned value_count;
- Functor2(unsigned n):value_count(n){}
+ Functor2( unsigned n ) : value_count( n ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const unsigned& i,double update[]) const {
- for(unsigned j=0;j<value_count;j++)
- update[j]+=i;
+ void operator()( const unsigned & i, double update[] ) const {
+ for ( unsigned j = 0; j < value_count; j++ ) {
+ update[j] += i;
+ }
}
KOKKOS_INLINE_FUNCTION
void init( double dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile double dst[] ,
+ void join( volatile double dst[],
const volatile double src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] += src[i];
}
};
-}
-}
+} // namespace ReduceCombinatorical
+
+} // namespace Test
namespace Test {
-template<class ExecSpace = Kokkos::DefaultExecutionSpace>
+template< class ExecSpace = Kokkos::DefaultExecutionSpace >
struct TestReduceCombinatoricalInstantiation {
- template<class ... Args>
- static void CallParallelReduce(Args... args) {
- Kokkos::parallel_reduce(args...);
+ template< class ... Args >
+ static void CallParallelReduce( Args... args ) {
+ Kokkos::parallel_reduce( args... );
}
- template<class ... Args>
- static void AddReturnArgument(Args... args) {
- Kokkos::View<double,Kokkos::HostSpace> result_view("ResultView");
- double expected_result = 1000.0*999.0/2.0;
+ template< class ... Args >
+ static void AddReturnArgument( Args... args ) {
+ Kokkos::View< double, Kokkos::HostSpace > result_view( "ResultView" );
+ double expected_result = 1000.0 * 999.0 / 2.0;
double value = 0;
- Kokkos::parallel_reduce(args...,value);
- ASSERT_EQ(expected_result,value);
+ Kokkos::parallel_reduce( args..., value );
+ ASSERT_EQ( expected_result, value );
result_view() = 0;
- CallParallelReduce(args...,result_view);
- ASSERT_EQ(expected_result,result_view());
+ CallParallelReduce( args..., result_view );
+ ASSERT_EQ( expected_result, result_view() );
value = 0;
- CallParallelReduce(args...,Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>(&value));
- ASSERT_EQ(expected_result,value);
+ CallParallelReduce( args..., Kokkos::View< double, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >( &value ) );
+ ASSERT_EQ( expected_result, value );
result_view() = 0;
- const Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> result_view_const_um = result_view;
- CallParallelReduce(args...,result_view_const_um);
- ASSERT_EQ(expected_result,result_view_const_um());
+ const Kokkos::View< double, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_const_um = result_view;
+ CallParallelReduce( args..., result_view_const_um );
+ ASSERT_EQ( expected_result, result_view_const_um() );
value = 0;
- CallParallelReduce(args...,Test::ReduceCombinatorical::AddPlus<double>(value));
- if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<value);
- else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<=value);
- else
- ASSERT_EQ(expected_result,value);
+ CallParallelReduce( args..., Test::ReduceCombinatorical::AddPlus< double >( value ) );
+ if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) && ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result < value );
+ }
+ else if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) || ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result <= value );
+ }
+ else {
+ ASSERT_EQ( expected_result, value );
+ }
value = 0;
- Test::ReduceCombinatorical::AddPlus<double> add(value);
- CallParallelReduce(args...,add);
- if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<value);
- else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<=value);
- else
- ASSERT_EQ(expected_result,value);
+ Test::ReduceCombinatorical::AddPlus< double > add( value );
+ CallParallelReduce( args..., add );
+ if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) && ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result < value );
+ }
+ else if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) || ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result <= value );
+ }
+ else {
+ ASSERT_EQ( expected_result, value );
+ }
}
-
- template<class ... Args>
- static void AddLambdaRange(void*,Args... args) {
- AddReturnArgument(args..., KOKKOS_LAMBDA (const int&i , double& lsum) {
+ template< class ... Args >
+ static void AddLambdaRange( void*, Args... args ) {
+ AddReturnArgument( args..., KOKKOS_LAMBDA ( const int & i, double & lsum ) {
lsum += i;
});
}
- template<class ... Args>
- static void AddLambdaTeam(void*,Args... args) {
- AddReturnArgument(args..., KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team, double& update) {
- update+=1.0/team.team_size()*team.league_rank();
+ template< class ... Args >
+ static void AddLambdaTeam( void*, Args... args ) {
+ AddReturnArgument( args..., KOKKOS_LAMBDA ( const Kokkos::TeamPolicy<>::member_type & team, double & update ) {
+ update += 1.0 / team.team_size() * team.league_rank();
});
}
- template<class ... Args>
- static void AddLambdaRange(Kokkos::InvalidType,Args... args) {
- }
+ template< class ... Args >
+ static void AddLambdaRange( Kokkos::InvalidType, Args... args ) {}
- template<class ... Args>
- static void AddLambdaTeam(Kokkos::InvalidType,Args... args) {
- }
+ template< class ... Args >
+ static void AddLambdaTeam( Kokkos::InvalidType, Args... args ) {}
- template<int ISTEAM, class ... Args>
- static void AddFunctor(Args... args) {
- Kokkos::View<double> result_view("FunctorView");
- auto h_r = Kokkos::create_mirror_view(result_view);
- Test::ReduceCombinatorical::FunctorScalar<ISTEAM> functor(result_view);
- double expected_result = 1000.0*999.0/2.0;
+ template< int ISTEAM, class ... Args >
+ static void AddFunctor( Args... args ) {
+ Kokkos::View< double > result_view( "FunctorView" );
+ auto h_r = Kokkos::create_mirror_view( result_view );
+ Test::ReduceCombinatorical::FunctorScalar< ISTEAM > functor( result_view );
+ double expected_result = 1000.0 * 999.0 / 2.0;
- AddReturnArgument(args..., functor);
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalar<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarInit<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoin<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoinInit<ISTEAM>(result_view));
+ AddReturnArgument( args..., functor );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalar< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarInit< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarJoin< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarJoinInit< ISTEAM >( result_view ) );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarFinal<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarFinal< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinal<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarJoinFinal< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinalInit<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarJoinFinalInit< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
}
- template<class ... Args>
- static void AddFunctorLambdaRange(Args... args) {
- AddFunctor<0,Args...>(args...);
- #ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- AddLambdaRange(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
- #endif
+ template< class ... Args >
+ static void AddFunctorLambdaRange( Args... args ) {
+ AddFunctor< 0, Args... >( args... );
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ AddLambdaRange( typename std::conditional< std::is_same<ExecSpace, Kokkos::DefaultExecutionSpace>::value, void*, Kokkos::InvalidType >::type(), args... );
+#endif
}
- template<class ... Args>
- static void AddFunctorLambdaTeam(Args... args) {
- AddFunctor<1,Args...>(args...);
- #ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- AddLambdaTeam(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
- #endif
+ template< class ... Args >
+ static void AddFunctorLambdaTeam( Args... args ) {
+ AddFunctor< 1, Args... >( args... );
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ AddLambdaTeam( typename std::conditional< std::is_same<ExecSpace, Kokkos::DefaultExecutionSpace>::value, void*, Kokkos::InvalidType >::type(), args... );
+#endif
}
- template<class ... Args>
- static void AddPolicy(Args... args) {
+ template< class ... Args >
+ static void AddPolicy( Args... args ) {
int N = 1000;
- Kokkos::RangePolicy<ExecSpace> policy(0,N);
+ Kokkos::RangePolicy< ExecSpace > policy( 0, N );
- AddFunctorLambdaRange(args...,1000);
- AddFunctorLambdaRange(args...,N);
- AddFunctorLambdaRange(args...,policy);
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace>(0,N));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(0,N).set_chunk_size(10));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N).set_chunk_size(10));
+ AddFunctorLambdaRange( args..., 1000 );
+ AddFunctorLambdaRange( args..., N );
+ AddFunctorLambdaRange( args..., policy );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace >( 0, N ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( 0, N ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Static> >( 0, N ).set_chunk_size( 10 ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( 0, N ).set_chunk_size( 10 ) );
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace>(N,Kokkos::AUTO));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(N,Kokkos::AUTO).set_chunk_size(10));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO).set_chunk_size(10));
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace >( N, Kokkos::AUTO ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( N, Kokkos::AUTO ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Static> >( N, Kokkos::AUTO ).set_chunk_size( 10 ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( N, Kokkos::AUTO ).set_chunk_size( 10 ) );
}
-
static void execute_a() {
AddPolicy();
}
static void execute_b() {
- std::string s("Std::String");
- AddPolicy(s.c_str());
- AddPolicy("Char Constant");
+ std::string s( "Std::String" );
+ AddPolicy( s.c_str() );
+ AddPolicy( "Char Constant" );
}
static void execute_c() {
- std::string s("Std::String");
- AddPolicy(s);
+ std::string s( "Std::String" );
+ AddPolicy( s );
}
};
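For orientation only (this note and sketch are not part of the patch): the combinatorial test above assembles every supported combination of an optional label, a policy (a plain element count, a RangePolicy, or a TeamPolicy), a functor or lambda body, and a result argument. A minimal hedged illustration of two equivalent calls, assuming lambda dispatch is available and using placeholder names, is:

  #include <Kokkos_Core.hpp>

  // Sketch only: sums 0..N-1 twice, once with a count as the policy and once
  // with an explicit RangePolicy; both leave the same value in 'result'.
  void combinatorial_reduce_sketch() {
    const int N = 1000;
    double result = 0.0;

    Kokkos::parallel_reduce( "ReduceSketch", N,
      KOKKOS_LAMBDA( const int i, double & update ) { update += i; },
      result );

    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultExecutionSpace >( 0, N ),
      KOKKOS_LAMBDA( const int i, double & update ) { update += i; },
      result );

    // Either call yields result == 1000.0 * 999.0 / 2.0, the expected_result checked above.
  }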
-template<class Scalar, class ExecSpace = Kokkos::DefaultExecutionSpace>
+template< class Scalar, class ExecSpace = Kokkos::DefaultExecutionSpace >
struct TestReducers {
-
struct SumFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value += values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value += values( i );
}
};
struct ProdFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value *= values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value *= values( i );
}
};
struct MinFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- if(values(i) < value)
- value = values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ if ( values( i ) < value ) value = values( i );
}
};
struct MaxFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- if(values(i) > value)
- value = values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ if ( values( i ) > value ) value = values( i );
}
};
struct MinLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type& value) const {
- if(values(i) < value.val) {
- value.val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MinLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) < value.val ) {
+ value.val = values( i );
value.loc = i;
}
}
};
struct MaxLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type& value) const {
- if(values(i) > value.val) {
- value.val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MaxLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) > value.val ) {
+ value.val = values( i );
value.loc = i;
}
}
};
struct MinMaxLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type& value) const {
- if(values(i) > value.max_val) {
- value.max_val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MinMaxLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) > value.max_val ) {
+ value.max_val = values( i );
value.max_loc = i;
}
- if(values(i) < value.min_val) {
- value.min_val = values(i);
+
+ if ( values( i ) < value.min_val ) {
+ value.min_val = values( i );
value.min_loc = i;
}
}
};
struct BAndFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value & values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value & values( i );
}
};
struct BOrFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value | values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value | values( i );
}
};
struct BXorFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value ^ values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value ^ values( i );
}
};
struct LAndFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value && values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value && values( i );
}
};
struct LOrFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value || values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value || values( i );
}
};
struct LXorFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value ? (!values(i)) : values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value ? ( !values( i ) ) : values( i );
}
};
- static void test_sum(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_sum( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_sum = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100);
- reference_sum += h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100 );
+ reference_sum += h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
SumFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar sum_scalar = init;
- Kokkos::Experimental::Sum<Scalar> reducer_scalar(sum_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(sum_scalar,reference_sum);
+ Kokkos::Experimental::Sum< Scalar > reducer_scalar( sum_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( sum_scalar, reference_sum );
+
Scalar sum_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(sum_scalar_view,reference_sum);
+ ASSERT_EQ( sum_scalar_view, reference_sum );
}
+
{
Scalar sum_scalar_init = init;
- Kokkos::Experimental::Sum<Scalar> reducer_scalar_init(sum_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(sum_scalar_init,reference_sum);
+ Kokkos::Experimental::Sum< Scalar > reducer_scalar_init( sum_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( sum_scalar_init, reference_sum );
+
Scalar sum_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(sum_scalar_init_view,reference_sum);
+ ASSERT_EQ( sum_scalar_init_view, reference_sum );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> sum_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > sum_view( "View" );
sum_view() = init;
- Kokkos::Experimental::Sum<Scalar> reducer_view(sum_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Sum< Scalar > reducer_view( sum_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar sum_view_scalar = sum_view();
- ASSERT_EQ(sum_view_scalar,reference_sum);
+ ASSERT_EQ( sum_view_scalar, reference_sum );
+
Scalar sum_view_view = reducer_view.result_view()();
- ASSERT_EQ(sum_view_view,reference_sum);
+ ASSERT_EQ( sum_view_view, reference_sum );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> sum_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > sum_view_init( "View" );
sum_view_init() = init;
- Kokkos::Experimental::Sum<Scalar> reducer_view_init(sum_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Sum< Scalar > reducer_view_init( sum_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar sum_view_init_scalar = sum_view_init();
- ASSERT_EQ(sum_view_init_scalar,reference_sum);
+ ASSERT_EQ( sum_view_init_scalar, reference_sum );
+
Scalar sum_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(sum_view_init_view,reference_sum);
+ ASSERT_EQ( sum_view_init_view, reference_sum );
}
}
- static void test_prod(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_prod( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_prod = 1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%4+1);
- reference_prod *= h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 4 + 1 );
+ reference_prod *= h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
ProdFunctor f;
f.values = values;
Scalar init = 1;
- if(std::is_arithmetic<Scalar>::value)
+ if ( std::is_arithmetic< Scalar >::value )
{
Scalar prod_scalar = init;
- Kokkos::Experimental::Prod<Scalar> reducer_scalar(prod_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(prod_scalar,reference_prod);
+ Kokkos::Experimental::Prod< Scalar > reducer_scalar( prod_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( prod_scalar, reference_prod );
+
Scalar prod_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(prod_scalar_view,reference_prod);
+ ASSERT_EQ( prod_scalar_view, reference_prod );
}
+
{
Scalar prod_scalar_init = init;
- Kokkos::Experimental::Prod<Scalar> reducer_scalar_init(prod_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(prod_scalar_init,reference_prod);
+ Kokkos::Experimental::Prod< Scalar > reducer_scalar_init( prod_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( prod_scalar_init, reference_prod );
+
Scalar prod_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(prod_scalar_init_view,reference_prod);
+ ASSERT_EQ( prod_scalar_init_view, reference_prod );
}
- if(std::is_arithmetic<Scalar>::value)
+ if ( std::is_arithmetic< Scalar >::value )
{
- Kokkos::View<Scalar,Kokkos::HostSpace> prod_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > prod_view( "View" );
prod_view() = init;
- Kokkos::Experimental::Prod<Scalar> reducer_view(prod_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Prod< Scalar > reducer_view( prod_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar prod_view_scalar = prod_view();
- ASSERT_EQ(prod_view_scalar,reference_prod);
+ ASSERT_EQ( prod_view_scalar, reference_prod );
+
Scalar prod_view_view = reducer_view.result_view()();
- ASSERT_EQ(prod_view_view,reference_prod);
+ ASSERT_EQ( prod_view_view, reference_prod );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> prod_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > prod_view_init( "View" );
prod_view_init() = init;
- Kokkos::Experimental::Prod<Scalar> reducer_view_init(prod_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Prod< Scalar > reducer_view_init( prod_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar prod_view_init_scalar = prod_view_init();
- ASSERT_EQ(prod_view_init_scalar,reference_prod);
+ ASSERT_EQ( prod_view_init_scalar, reference_prod );
+
Scalar prod_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(prod_view_init_view,reference_prod);
+ ASSERT_EQ( prod_view_init_view, reference_prod );
}
}
- static void test_min(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_min = std::numeric_limits<Scalar>::max();
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)<reference_min)
- reference_min = h_values(i);
+ static void test_min( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) < reference_min ) reference_min = h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MinFunctor f;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::max();
+ Scalar init = std::numeric_limits< Scalar >::max();
{
Scalar min_scalar = init;
- Kokkos::Experimental::Min<Scalar> reducer_scalar(min_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(min_scalar,reference_min);
+ Kokkos::Experimental::Min< Scalar > reducer_scalar( min_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( min_scalar, reference_min );
+
Scalar min_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(min_scalar_view,reference_min);
+ ASSERT_EQ( min_scalar_view, reference_min );
}
+
{
Scalar min_scalar_init = init;
- Kokkos::Experimental::Min<Scalar> reducer_scalar_init(min_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(min_scalar_init,reference_min);
+ Kokkos::Experimental::Min< Scalar > reducer_scalar_init( min_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( min_scalar_init, reference_min );
+
Scalar min_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(min_scalar_init_view,reference_min);
+ ASSERT_EQ( min_scalar_init_view, reference_min );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> min_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > min_view( "View" );
min_view() = init;
- Kokkos::Experimental::Min<Scalar> reducer_view(min_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Min< Scalar > reducer_view( min_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar min_view_scalar = min_view();
- ASSERT_EQ(min_view_scalar,reference_min);
+ ASSERT_EQ( min_view_scalar, reference_min );
+
Scalar min_view_view = reducer_view.result_view()();
- ASSERT_EQ(min_view_view,reference_min);
+ ASSERT_EQ( min_view_view, reference_min );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> min_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > min_view_init( "View" );
min_view_init() = init;
- Kokkos::Experimental::Min<Scalar> reducer_view_init(min_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Min< Scalar > reducer_view_init( min_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar min_view_init_scalar = min_view_init();
- ASSERT_EQ(min_view_init_scalar,reference_min);
+ ASSERT_EQ( min_view_init_scalar, reference_min );
+
Scalar min_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(min_view_init_view,reference_min);
+ ASSERT_EQ( min_view_init_view, reference_min );
}
}
- static void test_max(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000+1);
- if(h_values(i)>reference_max)
- reference_max = h_values(i);
+ static void test_max( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 + 1 );
+
+ if ( h_values( i ) > reference_max ) reference_max = h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MaxFunctor f;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::min();
+ Scalar init = std::numeric_limits< Scalar >::min();
{
Scalar max_scalar = init;
- Kokkos::Experimental::Max<Scalar> reducer_scalar(max_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(max_scalar,reference_max);
+ Kokkos::Experimental::Max< Scalar > reducer_scalar( max_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( max_scalar, reference_max );
+
Scalar max_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(max_scalar_view,reference_max);
+ ASSERT_EQ( max_scalar_view, reference_max );
}
+
{
Scalar max_scalar_init = init;
- Kokkos::Experimental::Max<Scalar> reducer_scalar_init(max_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(max_scalar_init,reference_max);
+ Kokkos::Experimental::Max< Scalar > reducer_scalar_init( max_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( max_scalar_init, reference_max );
+
Scalar max_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(max_scalar_init_view,reference_max);
+ ASSERT_EQ( max_scalar_init_view, reference_max );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> max_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > max_view( "View" );
max_view() = init;
- Kokkos::Experimental::Max<Scalar> reducer_view(max_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Max< Scalar > reducer_view( max_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar max_view_scalar = max_view();
- ASSERT_EQ(max_view_scalar,reference_max);
+ ASSERT_EQ( max_view_scalar, reference_max );
+
Scalar max_view_view = reducer_view.result_view()();
- ASSERT_EQ(max_view_view,reference_max);
+ ASSERT_EQ( max_view_view, reference_max );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> max_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > max_view_init( "View" );
max_view_init() = init;
- Kokkos::Experimental::Max<Scalar> reducer_view_init(max_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Max< Scalar > reducer_view_init( max_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar max_view_init_scalar = max_view_init();
- ASSERT_EQ(max_view_init_scalar,reference_max);
+ ASSERT_EQ( max_view_init_scalar, reference_max );
+
Scalar max_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(max_view_init_view,reference_max);
+ ASSERT_EQ( max_view_init_view, reference_max );
}
}
- static void test_minloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_min = std::numeric_limits<Scalar>::max();
+ static void test_minloc( int N ) {
+ typedef typename Kokkos::Experimental::MinLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
int reference_loc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)<reference_min) {
- reference_min = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) < reference_min ) {
+ reference_min = h_values( i );
reference_loc = i;
- } else if (h_values(i) == reference_min) {
- // make min unique
- h_values(i) += std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_min ) {
+ // Make min unique.
+ h_values( i ) += std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MinLocFunctor f;
- typedef typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::max();
-
+ Scalar init = std::numeric_limits< Scalar >::max();
{
value_type min_scalar;
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar(min_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(min_scalar.val,reference_min);
- ASSERT_EQ(min_scalar.loc,reference_loc);
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_scalar( min_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( min_scalar.val, reference_min );
+ ASSERT_EQ( min_scalar.loc, reference_loc );
+
value_type min_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(min_scalar_view.val,reference_min);
- ASSERT_EQ(min_scalar_view.loc,reference_loc);
+ ASSERT_EQ( min_scalar_view.val, reference_min );
+ ASSERT_EQ( min_scalar_view.loc, reference_loc );
}
+
{
value_type min_scalar_init;
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar_init(min_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(min_scalar_init.val,reference_min);
- ASSERT_EQ(min_scalar_init.loc,reference_loc);
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_scalar_init( min_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( min_scalar_init.val, reference_min );
+ ASSERT_EQ( min_scalar_init.loc, reference_loc );
+
value_type min_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(min_scalar_init_view.val,reference_min);
- ASSERT_EQ(min_scalar_init_view.loc,reference_loc);
+ ASSERT_EQ( min_scalar_init_view.val, reference_min );
+ ASSERT_EQ( min_scalar_init_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> min_view("View");
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_view(min_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > min_view( "View" );
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_view( min_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type min_view_scalar = min_view();
- ASSERT_EQ(min_view_scalar.val,reference_min);
- ASSERT_EQ(min_view_scalar.loc,reference_loc);
+ ASSERT_EQ( min_view_scalar.val, reference_min );
+ ASSERT_EQ( min_view_scalar.loc, reference_loc );
+
value_type min_view_view = reducer_view.result_view()();
- ASSERT_EQ(min_view_view.val,reference_min);
- ASSERT_EQ(min_view_view.loc,reference_loc);
+ ASSERT_EQ( min_view_view.val, reference_min );
+ ASSERT_EQ( min_view_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> min_view_init("View");
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_view_init(min_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > min_view_init( "View" );
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_view_init( min_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type min_view_init_scalar = min_view_init();
- ASSERT_EQ(min_view_init_scalar.val,reference_min);
- ASSERT_EQ(min_view_init_scalar.loc,reference_loc);
+ ASSERT_EQ( min_view_init_scalar.val, reference_min );
+ ASSERT_EQ( min_view_init_scalar.loc, reference_loc );
+
value_type min_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(min_view_init_view.val,reference_min);
- ASSERT_EQ(min_view_init_view.loc,reference_loc);
+ ASSERT_EQ( min_view_init_view.val, reference_min );
+ ASSERT_EQ( min_view_init_view.loc, reference_loc );
}
}
- static void test_maxloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
+ static void test_maxloc( int N ) {
+ typedef typename Kokkos::Experimental::MaxLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
int reference_loc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)>reference_max) {
- reference_max = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) > reference_max ) {
+ reference_max = h_values( i );
reference_loc = i;
- } else if (h_values(i) == reference_max) {
- // make max unique
- h_values(i) -= std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_max ) {
+ // Make max unique.
+ h_values( i ) -= std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MaxLocFunctor f;
- typedef typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::min();
-
+ Scalar init = std::numeric_limits< Scalar >::min();
{
value_type max_scalar;
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar(max_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(max_scalar.val,reference_max);
- ASSERT_EQ(max_scalar.loc,reference_loc);
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_scalar( max_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( max_scalar.val, reference_max );
+ ASSERT_EQ( max_scalar.loc, reference_loc );
+
value_type max_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(max_scalar_view.val,reference_max);
- ASSERT_EQ(max_scalar_view.loc,reference_loc);
+ ASSERT_EQ( max_scalar_view.val, reference_max );
+ ASSERT_EQ( max_scalar_view.loc, reference_loc );
}
+
{
value_type max_scalar_init;
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar_init(max_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(max_scalar_init.val,reference_max);
- ASSERT_EQ(max_scalar_init.loc,reference_loc);
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_scalar_init( max_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( max_scalar_init.val, reference_max );
+ ASSERT_EQ( max_scalar_init.loc, reference_loc );
+
value_type max_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(max_scalar_init_view.val,reference_max);
- ASSERT_EQ(max_scalar_init_view.loc,reference_loc);
+ ASSERT_EQ( max_scalar_init_view.val, reference_max );
+ ASSERT_EQ( max_scalar_init_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> max_view("View");
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view(max_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > max_view( "View" );
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_view( max_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type max_view_scalar = max_view();
- ASSERT_EQ(max_view_scalar.val,reference_max);
- ASSERT_EQ(max_view_scalar.loc,reference_loc);
+ ASSERT_EQ( max_view_scalar.val, reference_max );
+ ASSERT_EQ( max_view_scalar.loc, reference_loc );
+
value_type max_view_view = reducer_view.result_view()();
- ASSERT_EQ(max_view_view.val,reference_max);
- ASSERT_EQ(max_view_view.loc,reference_loc);
+ ASSERT_EQ( max_view_view.val, reference_max );
+ ASSERT_EQ( max_view_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> max_view_init("View");
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view_init(max_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > max_view_init( "View" );
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_view_init( max_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type max_view_init_scalar = max_view_init();
- ASSERT_EQ(max_view_init_scalar.val,reference_max);
- ASSERT_EQ(max_view_init_scalar.loc,reference_loc);
+ ASSERT_EQ( max_view_init_scalar.val, reference_max );
+ ASSERT_EQ( max_view_init_scalar.loc, reference_loc );
+
value_type max_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(max_view_init_view.val,reference_max);
- ASSERT_EQ(max_view_init_view.loc,reference_loc);
+ ASSERT_EQ( max_view_init_view.val, reference_max );
+ ASSERT_EQ( max_view_init_view.loc, reference_loc );
}
}
- static void test_minmaxloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
- Scalar reference_min = std::numeric_limits<Scalar>::max();
+ static void test_minmaxloc( int N ) {
+ typedef typename Kokkos::Experimental::MinMaxLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
int reference_minloc = -1;
int reference_maxloc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
}
- for(int i=0; i<N; i++) {
- if(h_values(i)>reference_max) {
- reference_max = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( h_values( i ) > reference_max ) {
+ reference_max = h_values( i );
reference_maxloc = i;
- } else if (h_values(i) == reference_max) {
- // make max unique
- h_values(i) -= std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_max ) {
+ // Make max unique.
+ h_values( i ) -= std::numeric_limits< Scalar >::epsilon();
}
}
- for(int i=0; i<N; i++) {
- if(h_values(i)<reference_min) {
- reference_min = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( h_values( i ) < reference_min ) {
+ reference_min = h_values( i );
reference_minloc = i;
- } else if (h_values(i) == reference_min) {
- // make min unique
- h_values(i) += std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_min ) {
+ // Make min unique.
+ h_values( i ) += std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+
+ Kokkos::deep_copy( values, h_values );
MinMaxLocFunctor f;
- typedef typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init_min = std::numeric_limits<Scalar>::max();
- Scalar init_max = std::numeric_limits<Scalar>::min();
-
+ Scalar init_min = std::numeric_limits< Scalar >::max();
+ Scalar init_max = std::numeric_limits< Scalar >::min();
{
value_type minmax_scalar;
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar(minmax_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(minmax_scalar.min_val,reference_min);
- for(int i=0; i<N; i++) {
- if((i == minmax_scalar.min_loc) && (h_values(i)==reference_min))
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_scalar( minmax_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( minmax_scalar.min_val, reference_min );
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( ( i == minmax_scalar.min_loc ) && ( h_values( i ) == reference_min ) ) {
reference_minloc = i;
+ }
}
- ASSERT_EQ(minmax_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar.max_val,reference_max);
- for(int i=0; i<N; i++) {
- if((i == minmax_scalar.max_loc) && (h_values(i)==reference_max))
+
+ ASSERT_EQ( minmax_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar.max_val, reference_max );
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( ( i == minmax_scalar.max_loc ) && ( h_values( i ) == reference_max ) ) {
reference_maxloc = i;
+ }
}
- ASSERT_EQ(minmax_scalar.max_loc,reference_maxloc);
+
+ ASSERT_EQ( minmax_scalar.max_loc, reference_maxloc );
+
value_type minmax_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(minmax_scalar_view.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_view.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_scalar_view.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_view.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_view.max_loc, reference_maxloc );
}
+
{
value_type minmax_scalar_init;
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar_init(minmax_scalar_init,init_min,init_max);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(minmax_scalar_init.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_init.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_init.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_init.max_loc,reference_maxloc);
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_scalar_init( minmax_scalar_init, init_min, init_max );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( minmax_scalar_init.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_init.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_init.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_init.max_loc, reference_maxloc );
+
value_type minmax_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(minmax_scalar_init_view.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_init_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_init_view.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_init_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_scalar_init_view.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_init_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_init_view.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_init_view.max_loc, reference_maxloc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> minmax_view("View");
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view(minmax_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > minmax_view( "View" );
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_view( minmax_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type minmax_view_scalar = minmax_view();
- ASSERT_EQ(minmax_view_scalar.min_val,reference_min);
- ASSERT_EQ(minmax_view_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_scalar.max_val,reference_max);
- ASSERT_EQ(minmax_view_scalar.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_scalar.min_val, reference_min );
+ ASSERT_EQ( minmax_view_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_scalar.max_val, reference_max );
+ ASSERT_EQ( minmax_view_scalar.max_loc, reference_maxloc );
+
value_type minmax_view_view = reducer_view.result_view()();
- ASSERT_EQ(minmax_view_view.min_val,reference_min);
- ASSERT_EQ(minmax_view_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_view.max_val,reference_max);
- ASSERT_EQ(minmax_view_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_view.min_val, reference_min );
+ ASSERT_EQ( minmax_view_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_view.max_val, reference_max );
+ ASSERT_EQ( minmax_view_view.max_loc, reference_maxloc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> minmax_view_init("View");
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view_init(minmax_view_init,init_min,init_max);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > minmax_view_init( "View" );
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_view_init( minmax_view_init, init_min, init_max );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type minmax_view_init_scalar = minmax_view_init();
- ASSERT_EQ(minmax_view_init_scalar.min_val,reference_min);
- ASSERT_EQ(minmax_view_init_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_init_scalar.max_val,reference_max);
- ASSERT_EQ(minmax_view_init_scalar.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_init_scalar.min_val, reference_min );
+ ASSERT_EQ( minmax_view_init_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_init_scalar.max_val, reference_max );
+ ASSERT_EQ( minmax_view_init_scalar.max_loc, reference_maxloc );
+
value_type minmax_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(minmax_view_init_view.min_val,reference_min);
- ASSERT_EQ(minmax_view_init_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_init_view.max_val,reference_max);
- ASSERT_EQ(minmax_view_init_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_init_view.min_val, reference_min );
+ ASSERT_EQ( minmax_view_init_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_init_view.max_val, reference_max );
+ ASSERT_EQ( minmax_view_init_view.max_loc, reference_maxloc );
}
}
- static void test_BAnd(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_band = Scalar() | (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000+1);
- reference_band = reference_band & h_values(i);
+ static void test_BAnd( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_band = Scalar() | ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 + 1 );
+ reference_band = reference_band & h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BAndFunctor f;
f.values = values;
- Scalar init = Scalar() | (~Scalar());
+ Scalar init = Scalar() | ( ~Scalar() );
{
Scalar band_scalar = init;
- Kokkos::Experimental::BAnd<Scalar> reducer_scalar(band_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(band_scalar,reference_band);
+ Kokkos::Experimental::BAnd< Scalar > reducer_scalar( band_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( band_scalar, reference_band );
Scalar band_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(band_scalar_view,reference_band);
+
+ ASSERT_EQ( band_scalar_view, reference_band );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> band_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > band_view( "View" );
band_view() = init;
- Kokkos::Experimental::BAnd<Scalar> reducer_view(band_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BAnd< Scalar > reducer_view( band_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar band_view_scalar = band_view();
- ASSERT_EQ(band_view_scalar,reference_band);
+ ASSERT_EQ( band_view_scalar, reference_band );
+
Scalar band_view_view = reducer_view.result_view()();
- ASSERT_EQ(band_view_view,reference_band);
+ ASSERT_EQ( band_view_view, reference_band );
}
}
- static void test_BOr(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_bor = Scalar() & (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)((rand()%100000+1)*2);
- reference_bor = reference_bor | h_values(i);
+ static void test_BOr( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_bor = Scalar() & ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( ( rand() % 100000 + 1 ) * 2 );
+ reference_bor = reference_bor | h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BOrFunctor f;
f.values = values;
- Scalar init = Scalar() & (~Scalar());
+ Scalar init = Scalar() & ( ~Scalar() );
{
Scalar bor_scalar = init;
- Kokkos::Experimental::BOr<Scalar> reducer_scalar(bor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(bor_scalar,reference_bor);
+ Kokkos::Experimental::BOr< Scalar > reducer_scalar( bor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( bor_scalar, reference_bor );
+
Scalar bor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(bor_scalar_view,reference_bor);
+ ASSERT_EQ( bor_scalar_view, reference_bor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> bor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > bor_view( "View" );
bor_view() = init;
- Kokkos::Experimental::BOr<Scalar> reducer_view(bor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BOr< Scalar > reducer_view( bor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar bor_view_scalar = bor_view();
- ASSERT_EQ(bor_view_scalar,reference_bor);
+ ASSERT_EQ( bor_view_scalar, reference_bor );
+
Scalar bor_view_view = reducer_view.result_view()();
- ASSERT_EQ(bor_view_view,reference_bor);
+ ASSERT_EQ( bor_view_view, reference_bor );
}
}
- static void test_BXor(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_bxor = Scalar() & (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)((rand()%100000+1)*2);
- reference_bxor = reference_bxor ^ h_values(i);
+ static void test_BXor( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_bxor = Scalar() & ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( ( rand() % 100000 + 1 ) * 2 );
+ reference_bxor = reference_bxor ^ h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BXorFunctor f;
f.values = values;
- Scalar init = Scalar() & (~Scalar());
+ Scalar init = Scalar() & ( ~Scalar() );
{
Scalar bxor_scalar = init;
- Kokkos::Experimental::BXor<Scalar> reducer_scalar(bxor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(bxor_scalar,reference_bxor);
+ Kokkos::Experimental::BXor< Scalar > reducer_scalar( bxor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( bxor_scalar, reference_bxor );
+
Scalar bxor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(bxor_scalar_view,reference_bxor);
+ ASSERT_EQ( bxor_scalar_view, reference_bxor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> bxor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > bxor_view( "View" );
bxor_view() = init;
- Kokkos::Experimental::BXor<Scalar> reducer_view(bxor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BXor< Scalar > reducer_view( bxor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar bxor_view_scalar = bxor_view();
- ASSERT_EQ(bxor_view_scalar,reference_bxor);
+ ASSERT_EQ( bxor_view_scalar, reference_bxor );
+
Scalar bxor_view_view = reducer_view.result_view()();
- ASSERT_EQ(bxor_view_view,reference_bxor);
+ ASSERT_EQ( bxor_view_view, reference_bxor );
}
}
- static void test_LAnd(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LAnd( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_land = 1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_land = reference_land && h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_land = reference_land && h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LAndFunctor f;
f.values = values;
Scalar init = 1;
{
Scalar land_scalar = init;
- Kokkos::Experimental::LAnd<Scalar> reducer_scalar(land_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(land_scalar,reference_land);
+ Kokkos::Experimental::LAnd< Scalar > reducer_scalar( land_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( land_scalar, reference_land );
+
Scalar land_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(land_scalar_view,reference_land);
+ ASSERT_EQ( land_scalar_view, reference_land );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> land_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > land_view( "View" );
land_view() = init;
- Kokkos::Experimental::LAnd<Scalar> reducer_view(land_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LAnd< Scalar > reducer_view( land_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar land_view_scalar = land_view();
- ASSERT_EQ(land_view_scalar,reference_land);
+ ASSERT_EQ( land_view_scalar, reference_land );
+
Scalar land_view_view = reducer_view.result_view()();
- ASSERT_EQ(land_view_view,reference_land);
+ ASSERT_EQ( land_view_view, reference_land );
}
}
- static void test_LOr(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LOr( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_lor = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_lor = reference_lor || h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_lor = reference_lor || h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LOrFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lor_scalar = init;
- Kokkos::Experimental::LOr<Scalar> reducer_scalar(lor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(lor_scalar,reference_lor);
+ Kokkos::Experimental::LOr< Scalar > reducer_scalar( lor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( lor_scalar, reference_lor );
+
Scalar lor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(lor_scalar_view,reference_lor);
+ ASSERT_EQ( lor_scalar_view, reference_lor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> lor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > lor_view( "View" );
lor_view() = init;
- Kokkos::Experimental::LOr<Scalar> reducer_view(lor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LOr< Scalar > reducer_view( lor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar lor_view_scalar = lor_view();
- ASSERT_EQ(lor_view_scalar,reference_lor);
+ ASSERT_EQ( lor_view_scalar, reference_lor );
+
Scalar lor_view_view = reducer_view.result_view()();
- ASSERT_EQ(lor_view_view,reference_lor);
+ ASSERT_EQ( lor_view_view, reference_lor );
}
}
- static void test_LXor(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LXor( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_lxor = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_lxor = reference_lxor ? (!h_values(i)) : h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_lxor = reference_lxor ? ( !h_values( i ) ) : h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LXorFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lxor_scalar = init;
- Kokkos::Experimental::LXor<Scalar> reducer_scalar(lxor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(lxor_scalar,reference_lxor);
+ Kokkos::Experimental::LXor< Scalar > reducer_scalar( lxor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( lxor_scalar, reference_lxor );
+
Scalar lxor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(lxor_scalar_view,reference_lxor);
+ ASSERT_EQ( lxor_scalar_view, reference_lxor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> lxor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > lxor_view( "View" );
lxor_view() = init;
- Kokkos::Experimental::LXor<Scalar> reducer_view(lxor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LXor< Scalar > reducer_view( lxor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar lxor_view_scalar = lxor_view();
- ASSERT_EQ(lxor_view_scalar,reference_lxor);
+ ASSERT_EQ( lxor_view_scalar, reference_lxor );
+
Scalar lxor_view_view = reducer_view.result_view()();
- ASSERT_EQ(lxor_view_view,reference_lxor);
+ ASSERT_EQ( lxor_view_view, reference_lxor );
}
}
static void execute_float() {
- test_sum(10001);
- test_prod(35);
- test_min(10003);
- test_minloc(10003);
- test_max(10007);
- test_maxloc(10007);
- test_minmaxloc(10007);
+ test_sum( 10001 );
+ test_prod( 35 );
+ test_min( 10003 );
+ test_minloc( 10003 );
+ test_max( 10007 );
+ test_maxloc( 10007 );
+ test_minmaxloc( 10007 );
}
static void execute_integer() {
- test_sum(10001);
- test_prod(35);
- test_min(10003);
- test_minloc(10003);
- test_max(10007);
- test_maxloc(10007);
- test_minmaxloc(10007);
- test_BAnd(35);
- test_BOr(35);
- test_BXor(35);
- test_LAnd(35);
- test_LOr(35);
- test_LXor(35);
+ test_sum( 10001 );
+ test_prod( 35 );
+ test_min( 10003 );
+ test_minloc( 10003 );
+ test_max( 10007 );
+ test_maxloc( 10007 );
+ test_minmaxloc( 10007 );
+ test_BAnd( 35 );
+ test_BOr( 35 );
+ test_BXor( 35 );
+ test_LAnd( 35 );
+ test_LOr( 35 );
+ test_LXor( 35 );
}
static void execute_basic() {
- test_sum(10001);
- test_prod(35);
+ test_sum( 10001 );
+ test_prod( 35 );
}
};
-}
-
-/*--------------------------------------------------------------------------*/
+} // namespace Test
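As an aside (not part of the patch): every TestReducers case above follows the same pattern, a functor that accumulates into a thread-local value plus a built-in reducer object that owns the result. A minimal sketch of that pattern with the Sum reducer, assuming Kokkos is already initialized and using placeholder names, is:

  #include <Kokkos_Core.hpp>

  // Sketch only: functor form of the reduction body used by the tests above.
  struct SumSketch {
    Kokkos::View< const double*, Kokkos::DefaultExecutionSpace > values;

    KOKKOS_INLINE_FUNCTION
    void operator()( const int & i, double & value ) const {
      value += values( i );   // per-thread partial sum
    }
  };

  void sum_reducer_sketch( int N ) {
    Kokkos::View< double*, Kokkos::DefaultExecutionSpace > values( "Values", N );
    Kokkos::deep_copy( values, 1.0 );   // fill with ones so the expected sum is N

    SumSketch f;
    f.values = values;

    double sum = 0;
    Kokkos::Experimental::Sum< double > reducer( sum );   // reducer wraps the scalar result
    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultExecutionSpace >( 0, N ),
                             f, reducer );
    // 'sum' now holds the reduction result; reducer.result_view()() returns the same value.
  }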
diff --git a/lib/kokkos/core/unit_test/TestScan.hpp b/lib/kokkos/core/unit_test/TestScan.hpp
index 1a9811a85..547e03497 100644
--- a/lib/kokkos/core/unit_test/TestScan.hpp
+++ b/lib/kokkos/core/unit_test/TestScan.hpp
@@ -1,117 +1,116 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-/*--------------------------------------------------------------------------*/
-
#include <stdio.h>
namespace Test {
-template< class Device , class WorkSpec = size_t >
+template< class Device, class WorkSpec = size_t >
struct TestScan {
+ typedef Device execution_space;
+ typedef long int value_type;
- typedef Device execution_space ;
- typedef long int value_type ;
-
- Kokkos::View<int,Device,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ Kokkos::View< int, Device, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator()( const int iwork , value_type & update , const bool final_pass ) const
+ void operator()( const int iwork, value_type & update, const bool final_pass ) const
{
- const value_type n = iwork + 1 ;
- const value_type imbalance = ( (1000 <= n) && (0 == n % 1000) ) ? 1000 : 0 ;
+ const value_type n = iwork + 1;
+ const value_type imbalance = ( ( 1000 <= n ) && ( 0 == n % 1000 ) ) ? 1000 : 0;
// Insert an artificial load imbalance
- for ( value_type i = 0 ; i < imbalance ; ++i ) { ++update ; }
+ for ( value_type i = 0; i < imbalance; ++i ) { ++update; }
- update += n - imbalance ;
+ update += n - imbalance;
if ( final_pass ) {
const value_type answer = n & 1 ? ( n * ( ( n + 1 ) / 2 ) ) : ( ( n / 2 ) * ( n + 1 ) );
if ( answer != update ) {
errors()++;
- if(errors()<20)
- printf("TestScan(%d,%ld) != %ld\n",iwork,update,answer);
+
+ if ( errors() < 20 ) {
+ printf( "TestScan(%d,%ld) != %ld\n", iwork, update, answer );
+ }
}
}
}
KOKKOS_INLINE_FUNCTION
- void init( value_type & update ) const { update = 0 ; }
+ void init( value_type & update ) const { update = 0; }
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & update ,
+ void join( volatile value_type & update,
volatile const value_type & input ) const
- { update += input ; }
+ { update += input; }
TestScan( const WorkSpec & N )
- {
- Kokkos::View<int,Device > errors_a("Errors");
- Kokkos::deep_copy(errors_a,0);
- errors = errors_a;
- parallel_scan( N , *this );
- }
+ {
+ Kokkos::View< int, Device > errors_a( "Errors" );
+ Kokkos::deep_copy( errors_a, 0 );
+ errors = errors_a;
+
+ parallel_scan( N , *this );
+ }
TestScan( const WorkSpec & Start , const WorkSpec & N )
- {
- typedef Kokkos::RangePolicy<execution_space> exec_policy ;
+ {
+ typedef Kokkos::RangePolicy< execution_space > exec_policy ;
- Kokkos::View<int,Device > errors_a("Errors");
- Kokkos::deep_copy(errors_a,0);
- errors = errors_a;
+ Kokkos::View< int, Device > errors_a( "Errors" );
+ Kokkos::deep_copy( errors_a, 0 );
+ errors = errors_a;
- parallel_scan( exec_policy( Start , N ) , *this );
- }
+ parallel_scan( exec_policy( Start , N ) , *this );
+ }
- static void test_range( const WorkSpec & begin , const WorkSpec & end )
- {
- for ( WorkSpec i = begin ; i < end ; ++i ) {
- (void) TestScan( i );
- }
+ static void test_range( const WorkSpec & begin, const WorkSpec & end )
+ {
+ for ( WorkSpec i = begin; i < end; ++i ) {
+ (void) TestScan( i );
}
+ }
};
-}
-
+} // namespace Test
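
The TestScan changes above are likewise formatting only; the functor still exercises the Kokkos::parallel_scan contract, in which the running value is carried in 'update' and may only be written to the output when the trailing bool argument is true. A minimal lambda-based sketch of that contract, restricted to the public API (names below are illustrative):

#include <Kokkos_Core.hpp>

// Exclusive prefix sum: write the running total before adding this element,
// and only when 'final' is true.
void scan_sketch( const int N )
{
  Kokkos::View< long* > in( "in", N ), out( "out", N );
  Kokkos::deep_copy( in, 1 );
  Kokkos::parallel_scan( "scan_sketch", N,
    KOKKOS_LAMBDA( const int i, long & update, const bool final ) {
      if ( final ) { out( i ) = update; }   // write before adding => exclusive scan
      update += in( i );
    } );
}
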
diff --git a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
index 291f9f60e..6eca6bb38 100644
--- a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
+++ b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
@@ -1,215 +1,210 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
struct SharedAllocDestroy {
+ volatile int * count;
- volatile int * count ;
-
- SharedAllocDestroy() = default ;
+ SharedAllocDestroy() = default;
SharedAllocDestroy( int * arg ) : count( arg ) {}
void destroy_shared_allocation()
- {
- Kokkos::atomic_increment( count );
- }
-
+ {
+ Kokkos::atomic_increment( count );
+ }
};
-template< class MemorySpace , class ExecutionSpace >
+template< class MemorySpace, class ExecutionSpace >
void test_shared_alloc()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ typedef const Kokkos::Impl::SharedAllocationHeader Header;
+ typedef Kokkos::Impl::SharedAllocationTracker Tracker;
+ typedef Kokkos::Impl::SharedAllocationRecord< void, void > RecordBase;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace, void > RecordMemS;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace, SharedAllocDestroy > RecordFull;
- typedef const Kokkos::Impl::SharedAllocationHeader Header ;
- typedef Kokkos::Impl::SharedAllocationTracker Tracker ;
- typedef Kokkos::Impl::SharedAllocationRecord< void , void > RecordBase ;
- typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , void > RecordMemS ;
- typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , SharedAllocDestroy > RecordFull ;
-
- static_assert( sizeof(Tracker) == sizeof(int*), "SharedAllocationTracker has wrong size!" );
+ static_assert( sizeof( Tracker ) == sizeof( int* ), "SharedAllocationTracker has wrong size!" );
- MemorySpace s ;
+ MemorySpace s;
- const size_t N = 1200 ;
- const size_t size = 8 ;
+ const size_t N = 1200;
+ const size_t size = 8;
RecordMemS * rarray[ N ];
Header * harray[ N ];
- RecordMemS ** const r = rarray ;
- Header ** const h = harray ;
+ RecordMemS ** const r = rarray;
+ Header ** const h = harray;
+
+ Kokkos::RangePolicy< ExecutionSpace > range( 0, N );
- Kokkos::RangePolicy< ExecutionSpace > range(0,N);
-
- //----------------------------------------
{
- // Since always executed on host space, leave [=]
- Kokkos::parallel_for( range , [=]( size_t i ){
- char name[64] ;
- sprintf(name,"test_%.2d",int(i));
+ // Since always executed on host space, leave [=]
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ char name[64];
+ sprintf( name, "test_%.2d", int( i ) );
- r[i] = RecordMemS::allocate( s , name , size * ( i + 1 ) );
+ r[i] = RecordMemS::allocate( s, name, size * ( i + 1 ) );
h[i] = Header::get_header( r[i]->data() );
- ASSERT_EQ( r[i]->use_count() , 0 );
+ ASSERT_EQ( r[i]->use_count(), 0 );
- for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
+ for ( size_t j = 0; j < ( i / 10 ) + 1; ++j ) RecordBase::increment( r[i] );
- ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
- ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
+ ASSERT_EQ( r[i]->use_count(), ( i / 10 ) + 1 );
+ ASSERT_EQ( r[i], RecordMemS::get_record( r[i]->data() ) );
});
// Sanity check for the whole set of allocation records to which this record belongs.
RecordBase::is_sane( r[0] );
- // RecordMemS::print_records( std::cout , s , true );
+ // RecordMemS::print_records( std::cout, s, true );
- Kokkos::parallel_for( range , [=]( size_t i ){
- while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ while ( 0 != ( r[i] = static_cast< RecordMemS * >( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
}
- //----------------------------------------
+
{
- int destroy_count = 0 ;
- SharedAllocDestroy counter( & destroy_count );
+ int destroy_count = 0;
+ SharedAllocDestroy counter( &destroy_count );
- Kokkos::parallel_for( range , [=]( size_t i ){
- char name[64] ;
- sprintf(name,"test_%.2d",int(i));
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ char name[64];
+ sprintf( name, "test_%.2d", int( i ) );
- RecordFull * rec = RecordFull::allocate( s , name , size * ( i + 1 ) );
+ RecordFull * rec = RecordFull::allocate( s, name, size * ( i + 1 ) );
- rec->m_destroy = counter ;
+ rec->m_destroy = counter;
- r[i] = rec ;
+ r[i] = rec;
h[i] = Header::get_header( r[i]->data() );
- ASSERT_EQ( r[i]->use_count() , 0 );
+ ASSERT_EQ( r[i]->use_count(), 0 );
- for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
+ for ( size_t j = 0; j < ( i / 10 ) + 1; ++j ) RecordBase::increment( r[i] );
- ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
- ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
+ ASSERT_EQ( r[i]->use_count(), ( i / 10 ) + 1 );
+ ASSERT_EQ( r[i], RecordMemS::get_record( r[i]->data() ) );
});
RecordBase::is_sane( r[0] );
- Kokkos::parallel_for( range , [=]( size_t i ){
- while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ while ( 0 != ( r[i] = static_cast< RecordMemS * >( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
- ASSERT_EQ( destroy_count , int(N) );
+ ASSERT_EQ( destroy_count, int( N ) );
}
- //----------------------------------------
{
- int destroy_count = 0 ;
+ int destroy_count = 0;
{
- RecordFull * rec = RecordFull::allocate( s , "test" , size );
+ RecordFull * rec = RecordFull::allocate( s, "test", size );
- // ... Construction of the allocated { rec->data() , rec->size() }
+ // ... Construction of the allocated { rec->data(), rec->size() }
- // Copy destruction function object into the allocation record
+ // Copy destruction function object into the allocation record.
rec->m_destroy = SharedAllocDestroy( & destroy_count );
- ASSERT_EQ( rec->use_count() , 0 );
+ ASSERT_EQ( rec->use_count(), 0 );
- // Start tracking, increments the use count from 0 to 1
- Tracker track ;
+ // Start tracking, increments the use count from 0 to 1.
+ Tracker track;
track.assign_allocated_record_to_uninitialized( rec );
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
+
+ // Verify construction / destruction increment.
+ for ( size_t i = 0; i < N; ++i ) {
+ ASSERT_EQ( rec->use_count(), 1 );
- // Verify construction / destruction increment
- for ( size_t i = 0 ; i < N ; ++i ) {
- ASSERT_EQ( rec->use_count() , 1 );
{
- Tracker local_tracker ;
+ Tracker local_tracker;
local_tracker.assign_allocated_record_to_uninitialized( rec );
- ASSERT_EQ( rec->use_count() , 2 );
- ASSERT_EQ( local_tracker.use_count() , 2 );
+ ASSERT_EQ( rec->use_count(), 2 );
+ ASSERT_EQ( local_tracker.use_count(), 2 );
}
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
}
- Kokkos::parallel_for( range , [=]( size_t i ){
- Tracker local_tracker ;
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ Tracker local_tracker;
local_tracker.assign_allocated_record_to_uninitialized( rec );
- ASSERT_GT( rec->use_count() , 1 );
+ ASSERT_GT( rec->use_count(), 1 );
});
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
// Destruction of 'track' object deallocates the 'rec' and invokes the destroy function object.
}
- ASSERT_EQ( destroy_count , 1 );
+ ASSERT_EQ( destroy_count, 1 );
}
#endif /* #if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST ) */
}
-
-}
-
+} // namespace Test
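
TestSharedAlloc drives the internal SharedAllocationRecord / SharedAllocationTracker machinery directly, incrementing and decrementing record use counts by hand. The same reference counting is visible through the public interface as Kokkos::View::use_count(); a minimal host-side sketch, assuming nothing beyond that method:

#include <Kokkos_Core.hpp>
#include <cassert>

void use_count_sketch()
{
  Kokkos::View< double* > a( "a", 100 );
  assert( a.use_count() == 1 );
  {
    Kokkos::View< double* > b = a;   // shallow copy: both track the same allocation record
    assert( a.use_count() == 2 );
  }                                  // 'b' destroyed: the count drops back
  assert( a.use_count() == 1 );      // the allocation is freed with the last tracker
}
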
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.cpp b/lib/kokkos/core/unit_test/TestSynchronic.cpp
deleted file mode 100644
index dc1abbd8b..000000000
--- a/lib/kokkos/core/unit_test/TestSynchronic.cpp
+++ /dev/null
@@ -1,449 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-//#undef _WIN32_WINNT
-//#define _WIN32_WINNT 0x0602
-
-#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || \
- defined(__APPLE__) || defined(__ARM_ARCH_8A) || defined(_CRAYC)
-
-// Skip for now
-
-#else
-
-#include <gtest/gtest.h>
-
-#ifdef USEOMP
-#include <omp.h>
-#endif
-
-#include <iostream>
-#include <sstream>
-#include <algorithm>
-#include <string>
-#include <vector>
-#include <map>
-#include <cstring>
-#include <ctime>
-
-//#include <details/config>
-//#undef __SYNCHRONIC_COMPATIBLE
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <impl/Kokkos_Synchronic_n3998.hpp>
-
-#include "TestSynchronic.hpp"
-
-// Uncomment to allow test to dump output
-//#define VERBOSE_TEST
-
-namespace Test {
-
-unsigned next_table[] =
- {
- 0, 1, 2, 3, //0-3
- 4, 4, 6, 6, //4-7
- 8, 8, 8, 8, //8-11
- 12, 12, 12, 12, //12-15
- 16, 16, 16, 16, //16-19
- 16, 16, 16, 16, //20-23
- 24, 24, 24, 24, //24-27
- 24, 24, 24, 24, //28-31
- 32, 32, 32, 32, //32-35
- 32, 32, 32, 32, //36-39
- 40, 40, 40, 40, //40-43
- 40, 40, 40, 40, //44-47
- 48, 48, 48, 48, //48-51
- 48, 48, 48, 48, //52-55
- 56, 56, 56, 56, //56-59
- 56, 56, 56, 56, //60-63
- };
-
-//change this if you want to allow oversubscription of the system, by default only the range {1-(system size)} is tested
-#define FOR_GAUNTLET(x) for(unsigned x = (std::min)(std::thread::hardware_concurrency()*8,unsigned(sizeof(next_table)/sizeof(unsigned))); x; x = next_table[x-1])
-
-//set this to override the benchmark of barriers to use OMP barriers instead of n3998 std::barrier
-//#define USEOMP
-
-#if defined(__SYNCHRONIC_COMPATIBLE)
- #define PREFIX "futex-"
-#else
- #define PREFIX "backoff-"
-#endif
-
-//this test uses a custom Mersenne twister to eliminate implementation variation
-MersenneTwister mt;
-
-int dummya = 1, dummyb =1;
-
-int dummy1 = 1;
-std::atomic<int> dummy2(1);
-std::atomic<int> dummy3(1);
-
-double time_item(int const count = (int)1E8) {
-
- clock_t const start = clock();
-
- for(int i = 0;i < count; ++i)
- mt.integer();
-
- clock_t const end = clock();
- double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
-
- return elapsed_seconds / count;
-}
-double time_nil(int const count = (int)1E08) {
-
- clock_t const start = clock();
-
- dummy3 = count;
- for(int i = 0;i < (int)1E6; ++i) {
- if(dummy1) {
- // Do some work while holding the lock
- int workunits = dummy3;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
- for (int j = 1; j < workunits; j++)
- dummy1 &= j; // Do one work unit
- dummy2.fetch_add(dummy1,std::memory_order_relaxed);
- }
- }
-
- clock_t const end = clock();
- double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
-
- return elapsed_seconds / count;
-}
-
-
-template <class mutex_type>
-void testmutex_inner(mutex_type& m, std::atomic<int>& t,std::atomic<int>& wc,std::atomic<int>& wnc, int const num_iterations,
- int const num_items_critical, int const num_items_noncritical, MersenneTwister& mtc, MersenneTwister& mtnc, bool skip) {
-
- for(int k = 0; k < num_iterations; ++k) {
-
- if(num_items_noncritical) {
- // Do some work without holding the lock
- int workunits = num_items_noncritical;//(int) (mtnc.poissonInterval((float)num_items_noncritical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- mtnc.integer(); // Do one work unit
- wnc.fetch_add(workunits,std::memory_order_relaxed);
- }
-
- t.fetch_add(1,std::memory_order_relaxed);
-
- if(!skip) {
- std::unique_lock<mutex_type> l(m);
- if(num_items_critical) {
- // Do some work while holding the lock
- int workunits = num_items_critical;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- mtc.integer(); // Do one work unit
- wc.fetch_add(workunits,std::memory_order_relaxed);
- }
- }
- }
-}
-template <class mutex_type>
-void testmutex_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double critical_fraction, double critical_duration) {
-
- std::ostringstream truename;
- truename << name << " (f=" << critical_fraction << ",d=" << critical_duration << ")";
-
- std::vector<double>& data = results[truename.str()];
-
- double const workItemTime = time_item() ,
- nilTime = time_nil();
-
- int const num_items_critical = (critical_duration <= 0 ? 0 : (std::max)( int(critical_duration / workItemTime + 0.5), int(100 * nilTime / workItemTime + 0.5))),
- num_items_noncritical = (num_items_critical <= 0 ? 0 : int( ( 1 - critical_fraction ) * num_items_critical / critical_fraction + 0.5 ));
-
- FOR_GAUNTLET(num_threads) {
-
- //Kokkos::Impl::portable_sleep(std::chrono::microseconds(2000000));
-
- int const num_iterations = (num_items_critical + num_items_noncritical != 0) ?
-#ifdef __SYNCHRONIC_JUST_YIELD
- int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
-#else
- int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
-#endif
-#ifdef WIN32
- int( 1 / workItemTime / (20 * num_threads * num_threads) );
-#else
- int( 1 / workItemTime / (200 * num_threads * num_threads) );
-#endif
-
-#ifdef VERBOSE_TEST
- std::cerr << "running " << truename.str() << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\n" << std::flush;
-#endif
-
-
- std::atomic<int> t[2], wc[2], wnc[2];
-
- clock_t start[2], end[2];
- for(int pass = 0; pass < 2; ++pass) {
-
- t[pass] = 0;
- wc[pass] = 0;
- wnc[pass] = 0;
-
- srand(num_threads);
- std::vector<MersenneTwister> randomsnc(num_threads),
- randomsc(num_threads);
-
- mutex_type m;
-
- start[pass] = clock();
-#ifdef USEOMP
- omp_set_num_threads(num_threads);
- std::atomic<int> _j(0);
- #pragma omp parallel
- {
- int const j = _j.fetch_add(1,std::memory_order_relaxed);
- testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
- num_threads = omp_get_num_threads();
- }
-#else
- std::vector<std::thread*> threads(num_threads);
- for(unsigned j = 0; j < num_threads; ++j)
- threads[j] = new std::thread([&,j](){
- testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
- }
- );
- for(unsigned j = 0; j < num_threads; ++j) {
- threads[j]->join();
- delete threads[j];
- }
-#endif
- end[pass] = clock();
- }
- if(t[0] != t[1]) throw std::string("mismatched iteration counts");
- if(wnc[0] != wnc[1]) throw std::string("mismatched work item counts");
-
- double elapsed_seconds_0 = (end[0] - start[0]) / double(CLOCKS_PER_SEC),
- elapsed_seconds_1 = (end[1] - start[1]) / double(CLOCKS_PER_SEC);
- double time = (elapsed_seconds_1 - elapsed_seconds_0 - wc[1]*workItemTime) / num_iterations;
-
- data.push_back(time);
-#ifdef VERBOSE_TEST
- std::cerr << truename.str() << " : " << num_threads << "," << elapsed_seconds_1 / num_iterations << " - " << elapsed_seconds_0 / num_iterations << " - " << wc[1]*workItemTime/num_iterations << " = " << time << " \n";
-#endif
- }
-}
-
-template <class barrier_type>
-void testbarrier_inner(barrier_type& b, int const num_threads, int const j, std::atomic<int>& t,std::atomic<int>& w,
- int const num_iterations_odd, int const num_iterations_even,
- int const num_items_noncritical, MersenneTwister& arg_mt, bool skip) {
-
- for(int k = 0; k < (std::max)(num_iterations_even,num_iterations_odd); ++k) {
-
- if(k >= (~j & 0x1 ? num_iterations_odd : num_iterations_even )) {
- if(!skip)
- b.arrive_and_drop();
- break;
- }
-
- if(num_items_noncritical) {
- // Do some work without holding the lock
- int workunits = (int) (arg_mt.poissonInterval((float)num_items_noncritical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- arg_mt.integer(); // Do one work unit
- w.fetch_add(workunits,std::memory_order_relaxed);
- }
-
- t.fetch_add(1,std::memory_order_relaxed);
-
- if(!skip) {
- int const thiscount = (std::min)(k+1,num_iterations_odd)*((num_threads>>1)+(num_threads&1)) + (std::min)(k+1,num_iterations_even)*(num_threads>>1);
- if(t.load(std::memory_order_relaxed) > thiscount) {
- std::cerr << "FAILURE: some threads have run ahead of the barrier (" << t.load(std::memory_order_relaxed) << ">" << thiscount << ").\n";
- EXPECT_TRUE(false);
- }
-#ifdef USEOMP
- #pragma omp barrier
-#else
- b.arrive_and_wait();
-#endif
- if(t.load(std::memory_order_relaxed) < thiscount) {
- std::cerr << "FAILURE: some threads have fallen behind the barrier (" << t.load(std::memory_order_relaxed) << "<" << thiscount << ").\n";
- EXPECT_TRUE(false);
- }
- }
- }
-}
-template <class barrier_type>
-void testbarrier_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double barrier_frequency, double phase_duration, bool randomIterations = false) {
-
- std::vector<double>& data = results[name];
-
- double const workItemTime = time_item();
- int const num_items_noncritical = int( phase_duration / workItemTime + 0.5 );
-
- FOR_GAUNTLET(num_threads) {
-
- int const num_iterations = int( barrier_frequency );
-#ifdef VERBOSE_TEST
- std::cerr << "running " << name << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\r" << std::flush;
-#endif
-
- srand(num_threads);
-
- MersenneTwister local_mt;
- int const num_iterations_odd = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations,
- num_iterations_even = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations;
-
- std::atomic<int> t[2], w[2];
- std::chrono::time_point<std::chrono::high_resolution_clock> start[2], end[2];
- for(int pass = 0; pass < 2; ++pass) {
-
- t[pass] = 0;
- w[pass] = 0;
-
- srand(num_threads);
- std::vector<MersenneTwister> randoms(num_threads);
-
- barrier_type b(num_threads);
-
- start[pass] = std::chrono::high_resolution_clock::now();
-#ifdef USEOMP
- omp_set_num_threads(num_threads);
- std::atomic<int> _j(0);
- #pragma omp parallel
- {
- int const j = _j.fetch_add(1,std::memory_order_relaxed);
- testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
- num_threads = omp_get_num_threads();
- }
-#else
- std::vector<std::thread*> threads(num_threads);
- for(unsigned j = 0; j < num_threads; ++j)
- threads[j] = new std::thread([&,j](){
- testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
- });
- for(unsigned j = 0; j < num_threads; ++j) {
- threads[j]->join();
- delete threads[j];
- }
-#endif
- end[pass] = std::chrono::high_resolution_clock::now();
- }
-
- if(t[0] != t[1]) throw std::string("mismatched iteration counts");
- if(w[0] != w[1]) throw std::string("mismatched work item counts");
-
- int const phases = (std::max)(num_iterations_odd, num_iterations_even);
-
- std::chrono::duration<double> elapsed_seconds_0 = end[0]-start[0],
- elapsed_seconds_1 = end[1]-start[1];
- double const time = (elapsed_seconds_1.count() - elapsed_seconds_0.count()) / phases;
-
- data.push_back(time);
-#ifdef VERBOSE_TEST
- std::cerr << name << " : " << num_threads << "," << elapsed_seconds_1.count() / phases << " - " << elapsed_seconds_0.count() / phases << " = " << time << " \n";
-#endif
- }
-}
-
-template <class... T>
-struct mutex_tester;
-template <class F>
-struct mutex_tester<F> {
- static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
- testmutex_outer<F>(results, *name, critical_fraction, critical_duration);
- }
-};
-template <class F, class... T>
-struct mutex_tester<F,T...> {
- static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
- mutex_tester<F>::run(results, name, critical_fraction, critical_duration);
- mutex_tester<T...>::run(results, ++name, critical_fraction, critical_duration);
- }
-};
-
-TEST( synchronic, main )
-{
- //warm up
- time_item();
-
- //measure up
-#ifdef VERBOSE_TEST
- std::cerr << "measuring work item speed...\r";
- std::cerr << "work item speed is " << time_item() << " per item, nil is " << time_nil() << "\n";
-#endif
- try {
-
- std::pair<double,double> testpoints[] = { {1, 0}, /*{1E-1, 10E-3}, {5E-1, 2E-6}, {3E-1, 50E-9},*/ };
- for(auto x : testpoints ) {
-
- std::map<std::string,std::vector<double>> results;
-
- //testbarrier_outer<std::barrier>(results, PREFIX"bar 1khz 100us", 1E3, x.second);
-
- std::string const names[] = {
- PREFIX"tkt", PREFIX"mcs", PREFIX"ttas", PREFIX"std"
-#ifdef WIN32
- ,PREFIX"srw"
-#endif
- };
-
- //run -->
-
- mutex_tester<
- ticket_mutex, mcs_mutex, ttas_mutex, std::mutex
-#ifdef WIN32
- ,srw_mutex
-#endif
- >::run(results, names, x.first, x.second);
-
- //<-- run
-
-#ifdef VERBOSE_TEST
- std::cout << "threads";
- for(auto & i : results)
- std::cout << ",\"" << i.first << '\"';
- std::cout << std::endl;
- int j = 0;
- FOR_GAUNTLET(num_threads) {
- std::cout << num_threads;
- for(auto & i : results)
- std::cout << ',' << i.second[j];
- std::cout << std::endl;
- ++j;
- }
-#endif
- }
- }
- catch(std::string & e) {
- std::cerr << "EXCEPTION : " << e << std::endl;
- EXPECT_TRUE( false );
- }
-}
-
-} // namespace Test
-
-#endif
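
The deleted TestSynchronic.cpp benchmarked mutex and barrier implementations by running each workload twice, once with synchronization skipped and once for real, then subtracting the timings. A stripped-down sketch of that two-pass idea using only the standard library (the names below are illustrative, not taken from the removed file):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Pass 0 runs the loop without the lock, pass 1 with it; the difference
// approximates the per-iteration locking overhead.
double lock_overhead_sketch( const unsigned num_threads, const int iterations )
{
  std::mutex m;
  double elapsed[2];
  for ( int pass = 0; pass < 2; ++pass ) {
    std::atomic< long > work( 0 );
    auto body = [&]() {
      for ( int k = 0; k < iterations; ++k ) {
        if ( pass == 1 ) { std::lock_guard< std::mutex > l( m ); work.fetch_add( 1 ); }
        else             { work.fetch_add( 1 ); }
      }
    };
    const auto start = std::chrono::high_resolution_clock::now();
    std::vector< std::thread > threads;
    for ( unsigned j = 0; j < num_threads; ++j ) threads.emplace_back( body );
    for ( auto & t : threads ) t.join();
    elapsed[ pass ] = std::chrono::duration< double >(
      std::chrono::high_resolution_clock::now() - start ).count();
  }
  return ( elapsed[1] - elapsed[0] ) / iterations;
}
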
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.hpp b/lib/kokkos/core/unit_test/TestSynchronic.hpp
deleted file mode 100644
index f4341b978..000000000
--- a/lib/kokkos/core/unit_test/TestSynchronic.hpp
+++ /dev/null
@@ -1,241 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef TEST_SYNCHRONIC_HPP
-#define TEST_SYNCHRONIC_HPP
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <mutex>
-#include <cmath>
-
-namespace Test {
-
-template <bool truly>
-struct dumb_mutex {
-
- dumb_mutex () : locked(0) {
- }
-
- void lock() {
- while(1) {
- bool state = false;
- if (locked.compare_exchange_weak(state,true,std::memory_order_acquire)) {
- break;
- }
- while (locked.load(std::memory_order_relaxed)) {
- if (!truly) {
- Kokkos::Impl::portable_yield();
- }
- }
- }
- }
-
- void unlock() {
- locked.store(false,std::memory_order_release);
- }
-
-private :
- std::atomic<bool> locked;
-};
-
-#ifdef WIN32
-#include <winsock2.h>
-#include <windows.h>
-#include <synchapi.h>
-struct srw_mutex {
-
- srw_mutex () {
- InitializeSRWLock(&_lock);
- }
-
- void lock() {
- AcquireSRWLockExclusive(&_lock);
- }
- void unlock() {
- ReleaseSRWLockExclusive(&_lock);
- }
-
-private :
- SRWLOCK _lock;
-};
-#endif
-
-struct ttas_mutex {
-
- ttas_mutex() : locked(false) {
- }
-
- ttas_mutex(const ttas_mutex&) = delete;
- ttas_mutex& operator=(const ttas_mutex&) = delete;
-
- void lock() {
- for(int i = 0;; ++i) {
- bool state = false;
- if(locked.compare_exchange_weak(state,true,std::memory_order_relaxed,Kokkos::Impl::notify_none))
- break;
- locked.expect_update(true);
- }
- std::atomic_thread_fence(std::memory_order_acquire);
- }
- void unlock() {
- locked.store(false,std::memory_order_release);
- }
-
-private :
- Kokkos::Impl::synchronic<bool> locked;
-};
-
-struct ticket_mutex {
-
- ticket_mutex() : active(0), queue(0) {
- }
-
- ticket_mutex(const ticket_mutex&) = delete;
- ticket_mutex& operator=(const ticket_mutex&) = delete;
-
- void lock() {
- int const me = queue.fetch_add(1, std::memory_order_relaxed);
- while(me != active.load_when_equal(me, std::memory_order_acquire))
- ;
- }
-
- void unlock() {
- active.fetch_add(1,std::memory_order_release);
- }
-private :
- Kokkos::Impl::synchronic<int> active;
- std::atomic<int> queue;
-};
-
-struct mcs_mutex {
-
- mcs_mutex() : head(nullptr) {
- }
-
- mcs_mutex(const mcs_mutex&) = delete;
- mcs_mutex& operator=(const mcs_mutex&) = delete;
-
- struct unique_lock {
-
- unique_lock(mcs_mutex & arg_m) : m(arg_m), next(nullptr), ready(false) {
-
- unique_lock * const h = m.head.exchange(this,std::memory_order_acquire);
- if(__builtin_expect(h != nullptr,0)) {
- h->next.store(this,std::memory_order_seq_cst,Kokkos::Impl::notify_one);
- while(!ready.load_when_not_equal(false,std::memory_order_acquire))
- ;
- }
- }
-
- unique_lock(const unique_lock&) = delete;
- unique_lock& operator=(const unique_lock&) = delete;
-
- ~unique_lock() {
- unique_lock * h = this;
- if(__builtin_expect(!m.head.compare_exchange_strong(h,nullptr,std::memory_order_release, std::memory_order_relaxed),0)) {
- unique_lock * n = next.load(std::memory_order_relaxed);
- while(!n)
- n = next.load_when_not_equal(n,std::memory_order_relaxed);
- n->ready.store(true,std::memory_order_release,Kokkos::Impl::notify_one);
- }
- }
-
- private:
- mcs_mutex & m;
- Kokkos::Impl::synchronic<unique_lock*> next;
- Kokkos::Impl::synchronic<bool> ready;
- };
-
-private :
- std::atomic<unique_lock*> head;
-};
-
-}
-
-namespace std {
-template<>
-struct unique_lock<Test::mcs_mutex> : Test::mcs_mutex::unique_lock {
- unique_lock(Test::mcs_mutex & arg_m) : Test::mcs_mutex::unique_lock(arg_m) {
- }
- unique_lock(const unique_lock&) = delete;
- unique_lock& operator=(const unique_lock&) = delete;
-};
-
-}
-
-/* #include <cmath> */
-#include <stdlib.h>
-
-namespace Test {
-
-//-------------------------------------
-// MersenneTwister
-//-------------------------------------
-#define MT_IA 397
-#define MT_LEN 624
-
-class MersenneTwister
-{
- volatile unsigned long m_buffer[MT_LEN][64/sizeof(unsigned long)];
- volatile int m_index;
-
-public:
- MersenneTwister() {
- for (int i = 0; i < MT_LEN; i++)
- m_buffer[i][0] = rand();
- m_index = 0;
- for (int i = 0; i < MT_LEN * 100; i++)
- integer();
- }
- unsigned long integer() {
- // Indices
- int i = m_index;
- int i2 = m_index + 1; if (i2 >= MT_LEN) i2 = 0; // wrap-around
- int j = m_index + MT_IA; if (j >= MT_LEN) j -= MT_LEN; // wrap-around
-
- // Twist
- unsigned long s = (m_buffer[i][0] & 0x80000000) | (m_buffer[i2][0] & 0x7fffffff);
- unsigned long r = m_buffer[j][0] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
- m_buffer[m_index][0] = r;
- m_index = i2;
-
- // Swizzle
- r ^= (r >> 11);
- r ^= (r << 7) & 0x9d2c5680UL;
- r ^= (r << 15) & 0xefc60000UL;
- r ^= (r >> 18);
- return r;
- }
- float poissonInterval(float ooLambda) {
- return -logf(1.0f - integer() * 2.3283e-10f) * ooLambda;
- }
-};
-
-} // namespace Test
-
-#endif //TEST_HPP
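
The removed TestSynchronic.hpp defined several lock flavors (test-and-test-and-set, ticket, MCS) on top of the Kokkos::Impl::synchronic primitives that are retired along with it. The ticket-lock idea itself survives on plain std::atomic; a minimal sketch, not tied to any Kokkos internals:

#include <atomic>
#include <thread>

// Ticket lock: threads take a ticket and spin until the 'active' counter
// reaches their number, which serves lock requests in FIFO order.
struct ticket_mutex_sketch {
  std::atomic< int > active{ 0 };   // ticket currently being served
  std::atomic< int > queue{ 0 };    // next ticket to hand out

  void lock() {
    const int me = queue.fetch_add( 1, std::memory_order_relaxed );
    while ( active.load( std::memory_order_acquire ) != me ) {
      std::this_thread::yield();    // back off instead of futex-style waiting
    }
  }

  void unlock() {
    active.fetch_add( 1, std::memory_order_release );
  }
};
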
diff --git a/lib/kokkos/core/unit_test/TestTaskScheduler.hpp b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
index 113455398..57e47d4ba 100644
--- a/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
+++ b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
@@ -1,551 +1,561 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
#ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP
#define KOKKOS_UNITTEST_TASKSCHEDULER_HPP
#include <stdio.h>
#include <iostream>
#include <cmath>
#if defined( KOKKOS_ENABLE_TASKDAG )
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
namespace TestTaskScheduler {
namespace {
inline
long eval_fib( long n )
{
- constexpr long mask = 0x03 ;
+ constexpr long mask = 0x03;
- long fib[4] = { 0 , 1 , 1 , 2 };
+ long fib[4] = { 0, 1, 1, 2 };
- for ( long i = 2 ; i <= n ; ++i ) {
+ for ( long i = 2; i <= n; ++i ) {
fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
}
-
+
return fib[ n & mask ];
}
}
template< typename Space >
struct TestFib
{
- typedef Kokkos::TaskScheduler<Space> policy_type ;
- typedef Kokkos::Future<long,Space> future_type ;
- typedef long value_type ;
+ typedef Kokkos::TaskScheduler< Space > sched_type;
+ typedef Kokkos::Future< long, Space > future_type;
+ typedef long value_type;
- policy_type policy ;
- future_type fib_m1 ;
- future_type fib_m2 ;
- const value_type n ;
+ sched_type sched;
+ future_type fib_m1;
+ future_type fib_m2;
+ const value_type n;
KOKKOS_INLINE_FUNCTION
- TestFib( const policy_type & arg_policy , const value_type arg_n )
- : policy(arg_policy)
- , fib_m1() , fib_m2()
- , n( arg_n )
- {}
+ TestFib( const sched_type & arg_sched, const value_type arg_n )
+ : sched( arg_sched ), fib_m1(), fib_m2(), n( arg_n ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & , value_type & result )
- {
+ void operator()( typename sched_type::member_type &, value_type & result )
+ {
#if 0
- printf( "\nTestFib(%ld) %d %d\n"
- , n
- , int( ! fib_m1.is_null() )
- , int( ! fib_m2.is_null() )
- );
+ printf( "\nTestFib(%ld) %d %d\n", n, int( !fib_m1.is_null() ), int( !fib_m2.is_null() ) );
#endif
- if ( n < 2 ) {
- result = n ;
- }
- else if ( ! fib_m2.is_null() && ! fib_m1.is_null() ) {
- result = fib_m1.get() + fib_m2.get();
- }
- else {
-
- // Spawn new children and respawn myself to sum their results:
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
-
- fib_m2 = policy.task_spawn( TestFib(policy,n-2)
- , Kokkos::TaskSingle
- , Kokkos::TaskHighPriority );
+ if ( n < 2 ) {
+ result = n;
+ }
+ else if ( !fib_m2.is_null() && !fib_m1.is_null() ) {
+ result = fib_m1.get() + fib_m2.get();
+ }
+ else {
+ // Spawn new children and respawn myself to sum their results.
+ // Spawn lower value at higher priority as it has a shorter
+ // path to completion.
- fib_m1 = policy.task_spawn( TestFib(policy,n-1)
- , Kokkos::TaskSingle );
+ fib_m2 = Kokkos::task_spawn( Kokkos::TaskSingle( sched, Kokkos::TaskPriority::High )
+ , TestFib( sched, n - 2 ) );
- Kokkos::Future<Space> dep[] = { fib_m1 , fib_m2 };
+ fib_m1 = Kokkos::task_spawn( Kokkos::TaskSingle( sched )
+ , TestFib( sched, n - 1 ) );
- Kokkos::Future<Space> fib_all = policy.when_all( 2 , dep );
+ Kokkos::Future< Space > dep[] = { fib_m1, fib_m2 };
+ Kokkos::Future< Space > fib_all = Kokkos::when_all( dep, 2 );
- if ( ! fib_m2.is_null() && ! fib_m1.is_null() && ! fib_all.is_null() ) {
- // High priority to retire this branch
- policy.respawn( this , Kokkos::TaskHighPriority , fib_all );
- }
- else {
+ if ( !fib_m2.is_null() && !fib_m1.is_null() && !fib_all.is_null() ) {
+ // High priority to retire this branch.
+ Kokkos::respawn( this, fib_all, Kokkos::TaskPriority::High );
+ }
+ else {
#if 1
- printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , n
- , policy.allocation_capacity()
- , policy.allocated_task_count_max()
- , policy.allocated_task_count_accum()
- );
+ printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , n
+ , sched.allocation_capacity()
+ , sched.allocated_task_count_max()
+ , sched.allocated_task_count_accum()
+ );
#endif
- Kokkos::abort("TestFib insufficient memory");
- }
+ Kokkos::abort( "TestFib insufficient memory" );
+
}
}
+ }
- static void run( int i , size_t MemoryCapacity = 16000 )
- {
- typedef typename policy_type::memory_space memory_space ;
+ static void run( int i, size_t MemoryCapacity = 16000 )
+ {
+ typedef typename sched_type::memory_space memory_space;
- enum { Log2_SuperBlockSize = 12 };
+ enum { Log2_SuperBlockSize = 12 };
- policy_type root_policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+ sched_type root_sched( memory_space(), MemoryCapacity, Log2_SuperBlockSize );
- future_type f = root_policy.host_spawn( TestFib(root_policy,i) , Kokkos::TaskSingle );
- Kokkos::wait( root_policy );
- ASSERT_EQ( eval_fib(i) , f.get() );
+ future_type f = Kokkos::host_spawn( Kokkos::TaskSingle( root_sched )
+ , TestFib( root_sched, i ) );
+
+ Kokkos::wait( root_sched );
+
+ ASSERT_EQ( eval_fib( i ), f.get() );
#if 0
- fprintf( stdout , "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , i
- , int(root_policy.template spawn_allocation_size<TestFib>())
- , int(root_policy.when_all_allocation_size(2))
- , root_policy.allocation_capacity()
- , root_policy.allocated_task_count_max()
- , root_policy.allocated_task_count_accum()
- );
- fflush( stdout );
+ fprintf( stdout, "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , i
+ , int(root_sched.template spawn_allocation_size<TestFib>())
+ , int(root_sched.when_all_allocation_size(2))
+ , root_sched.allocation_capacity()
+ , root_sched.allocated_task_count_max()
+ , root_sched.allocated_task_count_accum()
+ );
+ fflush( stdout );
#endif
- }
-
+ }
};
} // namespace TestTaskScheduler
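// Editorial sketch (not part of the patch): the TestFib hunk above replaces the
// old member-spawn calls ( policy.task_spawn( functor, Kokkos::TaskSingle, ... ) )
// with the free-function interface shown in the '+' lines. Assuming the
// signatures used in the patch, driving the Fibonacci task from the host looks
// roughly like:
//
//   sched_type sched( typename sched_type::memory_space(), MemoryCapacity );
//   future_type f = Kokkos::host_spawn( Kokkos::TaskSingle( sched ),
//                                       TestFib( sched, n ) );
//   Kokkos::wait( sched );
//   const long fib_n = f.get();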
//----------------------------------------------------------------------------
namespace TestTaskScheduler {
template< class Space >
struct TestTaskDependence {
+ typedef Kokkos::TaskScheduler< Space > sched_type;
+ typedef Kokkos::Future< Space > future_type;
+ typedef Kokkos::View< long, Space > accum_type;
+ typedef void value_type;
- typedef Kokkos::TaskScheduler<Space> policy_type ;
- typedef Kokkos::Future<Space> future_type ;
- typedef Kokkos::View<long,Space> accum_type ;
- typedef void value_type ;
-
- policy_type m_policy ;
- accum_type m_accum ;
- long m_count ;
+ sched_type m_sched;
+ accum_type m_accum;
+ long m_count;
KOKKOS_INLINE_FUNCTION
TestTaskDependence( long n
- , const policy_type & arg_policy
- , const accum_type & arg_accum )
- : m_policy( arg_policy )
+ , const sched_type & arg_sched
+ , const accum_type & arg_accum )
+ : m_sched( arg_sched )
, m_accum( arg_accum )
- , m_count( n )
- {}
+ , m_count( n ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & )
- {
- enum { CHUNK = 8 };
- const int n = CHUNK < m_count ? CHUNK : m_count ;
+ void operator()( typename sched_type::member_type & )
+ {
+ enum { CHUNK = 8 };
+ const int n = CHUNK < m_count ? CHUNK : m_count;
- if ( 1 < m_count ) {
- future_type f[ CHUNK ] ;
+ if ( 1 < m_count ) {
+ future_type f[ CHUNK ];
- const int inc = ( m_count + n - 1 ) / n ;
+ const int inc = ( m_count + n - 1 ) / n;
- for ( int i = 0 ; i < n ; ++i ) {
- long begin = i * inc ;
- long count = begin + inc < m_count ? inc : m_count - begin ;
- f[i] = m_policy.task_spawn( TestTaskDependence(count,m_policy,m_accum) , Kokkos::TaskSingle );
- }
+ for ( int i = 0; i < n; ++i ) {
+ long begin = i * inc;
+ long count = begin + inc < m_count ? inc : m_count - begin;
+ f[i] = Kokkos::task_spawn( Kokkos::TaskSingle( m_sched )
+ , TestTaskDependence( count, m_sched, m_accum ) );
+ }
- m_count = 0 ;
+ m_count = 0;
- m_policy.respawn( this , m_policy.when_all( n , f ) );
- }
- else if ( 1 == m_count ) {
- Kokkos::atomic_increment( & m_accum() );
- }
+ Kokkos::respawn( this, Kokkos::when_all( f, n ) );
+ }
+ else if ( 1 == m_count ) {
+ Kokkos::atomic_increment( & m_accum() );
}
+ }
static void run( int n )
- {
- typedef typename policy_type::memory_space memory_space ;
+ {
+ typedef typename sched_type::memory_space memory_space;
- // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool
- enum { MemoryCapacity = 16000 };
- enum { Log2_SuperBlockSize = 12 };
- policy_type policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+ // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool.
+ enum { MemoryCapacity = 16000 };
+ enum { Log2_SuperBlockSize = 12 };
+ sched_type sched( memory_space(), MemoryCapacity, Log2_SuperBlockSize );
- accum_type accum("accum");
+ accum_type accum( "accum" );
- typename accum_type::HostMirror host_accum =
- Kokkos::create_mirror_view( accum );
+ typename accum_type::HostMirror host_accum = Kokkos::create_mirror_view( accum );
- policy.host_spawn( TestTaskDependence(n,policy,accum) , Kokkos::TaskSingle );
+ Kokkos::host_spawn( Kokkos::TaskSingle( sched ), TestTaskDependence( n, sched, accum ) );
- Kokkos::wait( policy );
+ Kokkos::wait( sched );
- Kokkos::deep_copy( host_accum , accum );
+ Kokkos::deep_copy( host_accum, accum );
- ASSERT_EQ( host_accum() , n );
- }
+ ASSERT_EQ( host_accum(), n );
+ }
};
} // namespace TestTaskScheduler
//----------------------------------------------------------------------------
namespace TestTaskScheduler {
template< class ExecSpace >
struct TestTaskTeam {
-
//enum { SPAN = 8 };
enum { SPAN = 33 };
//enum { SPAN = 1 };
- typedef void value_type ;
- typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
- typedef Kokkos::Future<ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
+ typedef void value_type;
+ typedef Kokkos::TaskScheduler< ExecSpace > sched_type;
+ typedef Kokkos::Future< ExecSpace > future_type;
+ typedef Kokkos::View< long*, ExecSpace > view_type;
- policy_type policy ;
- future_type future ;
+ sched_type sched;
+ future_type future;
- view_type parfor_result ;
- view_type parreduce_check ;
- view_type parscan_result ;
- view_type parscan_check ;
- const long nvalue ;
+ view_type parfor_result;
+ view_type parreduce_check;
+ view_type parscan_result;
+ view_type parscan_check;
+ const long nvalue;
KOKKOS_INLINE_FUNCTION
- TestTaskTeam( const policy_type & arg_policy
- , const view_type & arg_parfor_result
- , const view_type & arg_parreduce_check
- , const view_type & arg_parscan_result
- , const view_type & arg_parscan_check
- , const long arg_nvalue )
- : policy(arg_policy)
+ TestTaskTeam( const sched_type & arg_sched
+ , const view_type & arg_parfor_result
+ , const view_type & arg_parreduce_check
+ , const view_type & arg_parscan_result
+ , const view_type & arg_parscan_check
+ , const long arg_nvalue )
+ : sched( arg_sched )
, future()
, parfor_result( arg_parfor_result )
, parreduce_check( arg_parreduce_check )
, parscan_result( arg_parscan_result )
, parscan_check( arg_parscan_check )
- , nvalue( arg_nvalue )
- {}
+ , nvalue( arg_nvalue ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & member )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
- future = policy.task_spawn
- ( TestTaskTeam( policy ,
- parfor_result ,
- parreduce_check,
- parscan_result,
- parscan_check,
- begin - 1 )
- , Kokkos::TaskTeam );
-
- assert( ! future.is_null() );
-
- policy.respawn( this , future );
- }
- return ;
- }
+ void operator()( typename sched_type::member_type & member )
+ {
+ const long end = nvalue + 1;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0;
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parfor_result[i] = i ; }
- );
-
- // test parallel_reduce without join
-
- long tot = 0;
- long expected = (begin+end-1)*(end-begin)*0.5;
-
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] = expected-tot ; }
- );
-
- // test parallel_reduce with join
-
- tot = 0;
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , [&]( long& val1, const long& val2) { val1 += val2; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] += expected-tot ; }
- );
-
- // test parallel_scan
-
- // Exclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- if ( final ) { parscan_result[i] = val; }
- val += i;
- }
- );
+ if ( 0 < begin && future.is_null() ) {
if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] = (i*(i-1)-begin*(begin-1))*0.5-parscan_result[i];
- }
+ future = Kokkos::task_spawn( Kokkos::TaskTeam( sched )
+ , TestTaskTeam( sched
+ , parfor_result
+ , parreduce_check
+ , parscan_result
+ , parscan_check
+ , begin - 1 )
+ );
+
+ assert( !future.is_null() );
+
+ Kokkos::respawn( this, future );
}
- // Inclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- val += i;
- if ( final ) { parscan_result[i] = val; }
- }
- );
- if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] += (i*(i+1)-begin*(begin-1))*0.5-parscan_result[i];
- }
+ return;
+ }
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parfor_result[i] = i; }
+ );
+
+ // Test parallel_reduce without join.
+
+ long tot = 0;
+ long expected = ( begin + end - 1 ) * ( end - begin ) * 0.5;
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & res ) { res += parfor_result[i]; }
+ , tot
+ );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parreduce_check[i] = expected - tot; }
+ );
+
+ // Test parallel_reduce with join.
+
+ tot = 0;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & res ) { res += parfor_result[i]; }
+#if 0
+ , Kokkos::Sum( tot )
+#else
+ , [] ( long & dst, const long & src ) { dst += src; }
+ , tot
+#endif
+ );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parreduce_check[i] += expected - tot; }
+ );
+
+ // Test parallel_scan.
+
+ // Exclusive scan.
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & val, const bool final )
+ {
+ if ( final ) { parscan_result[i] = val; }
+
+ val += i;
+ });
+
+ // Wait for 'parscan_result' before testing it.
+ member.team_barrier();
+
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin; i < end; ++i ) {
+ parscan_check[i] = ( i * ( i - 1 ) - begin * ( begin - 1 ) ) * 0.5 - parscan_result[i];
}
- // ThreadVectorRange check
- /*
- long result = 0;
- expected = (begin+end-1)*(end-begin)*0.5;
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , 0 , 1 )
- , [&] ( const int i , long & outerUpdate ) {
- long sum_j = 0.0;
- Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( member , end - begin )
- , [&] ( const int j , long &innerUpdate ) {
- innerUpdate += begin+j;
- } , sum_j );
- outerUpdate += sum_j ;
- } , result );
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) {
- parreduce_check[i] += result-expected ;
- }
- );
- */
}
- static void run( long n )
+ // Don't overwrite 'parscan_result' until it has been tested.
+ member.team_barrier();
+
+ // Inclusive scan.
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & val, const bool final )
{
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- // const unsigned memory_capacity = 100000 ; // fails with SPAN=1 for serial and OMP
- const unsigned memory_capacity = 400000 ;
-
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
-
- view_type root_parfor_result("parfor_result",n+1);
- view_type root_parreduce_check("parreduce_check",n+1);
- view_type root_parscan_result("parscan_result",n+1);
- view_type root_parscan_check("parscan_check",n+1);
-
- typename view_type::HostMirror
- host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
- typename view_type::HostMirror
- host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
- typename view_type::HostMirror
- host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
- typename view_type::HostMirror
- host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
-
- future_type f = root_policy.host_spawn(
- TestTaskTeam( root_policy ,
- root_parfor_result ,
- root_parreduce_check ,
- root_parscan_result,
- root_parscan_check,
- n ) ,
- Kokkos::TaskTeam );
-
- Kokkos::wait( root_policy );
-
- Kokkos::deep_copy( host_parfor_result , root_parfor_result );
- Kokkos::deep_copy( host_parreduce_check , root_parreduce_check );
- Kokkos::deep_copy( host_parscan_result , root_parscan_result );
- Kokkos::deep_copy( host_parscan_check , root_parscan_check );
-
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i ;
- if ( host_parfor_result(i) != answer ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
- << host_parfor_result(i) << " != " << answer << std::endl ;
- }
- if ( host_parreduce_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
- << host_parreduce_check(i) << " != 0" << std::endl ;
- }
- if ( host_parscan_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
- << host_parscan_check(i) << " != 0" << std::endl ;
- }
+ val += i;
+
+ if ( final ) { parscan_result[i] = val; }
+ });
+
+ // Wait for 'parscan_result' before testing it.
+ member.team_barrier();
+
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin; i < end; ++i ) {
+ parscan_check[i] += ( i * ( i + 1 ) - begin * ( begin - 1 ) ) * 0.5 - parscan_result[i];
}
}
+
+ // ThreadVectorRange check.
+/*
+ long result = 0;
+ expected = ( begin + end - 1 ) * ( end - begin ) * 0.5;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, 0, 1 )
+ , [&] ( const int i, long & outerUpdate )
+ {
+ long sum_j = 0.0;
+
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( member, end - begin )
+ , [&] ( const int j, long & innerUpdate )
+ {
+ innerUpdate += begin + j;
+ }, sum_j );
+
+ outerUpdate += sum_j;
+ }, result );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i )
+ {
+ parreduce_check[i] += result - expected;
+ });
+*/
+ }
+
+ static void run( long n )
+ {
+ //const unsigned memory_capacity = 10000; // Causes memory pool infinite loop.
+ //const unsigned memory_capacity = 100000; // Fails with SPAN=1 for serial and OMP.
+ const unsigned memory_capacity = 400000;
+
+ sched_type root_sched( typename sched_type::memory_space(), memory_capacity );
+
+ view_type root_parfor_result( "parfor_result", n + 1 );
+ view_type root_parreduce_check( "parreduce_check", n + 1 );
+ view_type root_parscan_result( "parscan_result", n + 1 );
+ view_type root_parscan_check( "parscan_check", n + 1 );
+
+ typename view_type::HostMirror
+ host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
+ typename view_type::HostMirror
+ host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
+ typename view_type::HostMirror
+ host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
+ typename view_type::HostMirror
+ host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
+
+ future_type f = Kokkos::host_spawn( Kokkos::TaskTeam( root_sched )
+ , TestTaskTeam( root_sched
+ , root_parfor_result
+ , root_parreduce_check
+ , root_parscan_result
+ , root_parscan_check
+ , n )
+ );
+
+ Kokkos::wait( root_sched );
+
+ Kokkos::deep_copy( host_parfor_result, root_parfor_result );
+ Kokkos::deep_copy( host_parreduce_check, root_parreduce_check );
+ Kokkos::deep_copy( host_parscan_result, root_parscan_result );
+ Kokkos::deep_copy( host_parscan_check, root_parscan_check );
+
+ for ( long i = 0; i <= n; ++i ) {
+ const long answer = i;
+
+ if ( host_parfor_result( i ) != answer ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
+ << host_parfor_result( i ) << " != " << answer << std::endl;
+ }
+
+ if ( host_parreduce_check( i ) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
+ << host_parreduce_check( i ) << " != 0" << std::endl;
+ }
+
+ if ( host_parscan_check( i ) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
+ << host_parscan_check( i ) << " != 0" << std::endl;
+ }
+ }
+ }
};
template< class ExecSpace >
struct TestTaskTeamValue {
-
enum { SPAN = 8 };
- typedef long value_type ;
- typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
- typedef Kokkos::Future<value_type,ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
+ typedef long value_type;
+ typedef Kokkos::TaskScheduler< ExecSpace > sched_type;
+ typedef Kokkos::Future< value_type, ExecSpace > future_type;
+ typedef Kokkos::View< long*, ExecSpace > view_type;
- policy_type policy ;
- future_type future ;
+ sched_type sched;
+ future_type future;
- view_type result ;
- const long nvalue ;
+ view_type result;
+ const long nvalue;
KOKKOS_INLINE_FUNCTION
- TestTaskTeamValue( const policy_type & arg_policy
- , const view_type & arg_result
- , const long arg_nvalue )
- : policy(arg_policy)
+ TestTaskTeamValue( const sched_type & arg_sched
+ , const view_type & arg_result
+ , const long arg_nvalue )
+ : sched( arg_sched )
, future()
, result( arg_result )
- , nvalue( arg_nvalue )
- {}
+ , nvalue( arg_nvalue ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type const & member
+ void operator()( typename sched_type::member_type const & member
, value_type & final )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
+ {
+ const long end = nvalue + 1;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0;
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
-
- future = policy.task_spawn
- ( TestTaskTeamValue( policy , result , begin - 1 )
- , Kokkos::TaskTeam );
+ if ( 0 < begin && future.is_null() ) {
+ if ( member.team_rank() == 0 ) {
+ future = sched.task_spawn( TestTaskTeamValue( sched, result, begin - 1 )
+ , Kokkos::TaskTeam );
- assert( ! future.is_null() );
+ assert( !future.is_null() );
- policy.respawn( this , future );
- }
- return ;
+ sched.respawn( this , future );
}
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { result[i] = i + 1 ; }
- );
+ return;
+ }
- if ( member.team_rank() == 0 ) {
- final = result[nvalue] ;
- }
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { result[i] = i + 1; }
+ );
- Kokkos::memory_fence();
+ if ( member.team_rank() == 0 ) {
+ final = result[nvalue];
}
+ Kokkos::memory_fence();
+ }
+
static void run( long n )
- {
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- const unsigned memory_capacity = 100000 ;
+ {
+ //const unsigned memory_capacity = 10000; // Causes memory pool infinite loop.
+ const unsigned memory_capacity = 100000;
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
+ sched_type root_sched( typename sched_type::memory_space()
+ , memory_capacity );
- view_type root_result("result",n+1);
+ view_type root_result( "result", n + 1 );
- typename view_type::HostMirror
- host_result = Kokkos::create_mirror_view( root_result );
+ typename view_type::HostMirror host_result = Kokkos::create_mirror_view( root_result );
- future_type fv = root_policy.host_spawn
- ( TestTaskTeamValue( root_policy, root_result, n ) , Kokkos::TaskTeam );
+ future_type fv = root_sched.host_spawn( TestTaskTeamValue( root_sched, root_result, n )
+ , Kokkos::TaskTeam );
- Kokkos::wait( root_policy );
+ Kokkos::wait( root_sched );
- Kokkos::deep_copy( host_result , root_result );
+ Kokkos::deep_copy( host_result, root_result );
- if ( fv.get() != n + 1 ) {
- std::cerr << "TestTaskTeamValue ERROR future = "
- << fv.get() << " != " << n + 1 << std::endl ;
- }
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i + 1 ;
- if ( host_result(i) != answer ) {
- std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
- << host_result(i) << " != " << answer << std::endl ;
- }
+ if ( fv.get() != n + 1 ) {
+ std::cerr << "TestTaskTeamValue ERROR future = "
+ << fv.get() << " != " << n + 1 << std::endl;
+ }
+
+ for ( long i = 0; i <= n; ++i ) {
+ const long answer = i + 1;
+
+ if ( host_result( i ) != answer ) {
+ std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
+ << host_result( i ) << " != " << answer << std::endl;
}
}
+ }
};
-} // namespace TestTaskScheduler
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP */
+} // namespace TestTaskScheduler
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+#endif // #ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP
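For reference, the task-team pattern exercised by TestTaskTeam / TestTaskTeamValue above reduces to the minimal standalone sketch below. It is not part of the patch: the functor name, memory-pool capacity, and problem size are illustrative, and a Kokkos build with KOKKOS_ENABLE_TASKDAG is assumed. The calls themselves (TaskScheduler, host_spawn with TaskTeam, wait, create_mirror_view, deep_copy) are the same ones used by the tests above.

#include <Kokkos_Core.hpp>
#include <iostream>

struct FillTask {
  typedef void value_type;
  typedef Kokkos::TaskScheduler< Kokkos::DefaultExecutionSpace > sched_type;
  typedef Kokkos::View< long*, Kokkos::DefaultExecutionSpace > view_type;

  view_type result;
  long n;

  KOKKOS_INLINE_FUNCTION
  FillTask( const view_type & arg_result, long arg_n )
    : result( arg_result ), n( arg_n ) {}

  // Every thread of the spawned team executes this; TeamThreadRange splits the work.
  KOKKOS_INLINE_FUNCTION
  void operator()( sched_type::member_type const & member )
  {
    Kokkos::parallel_for( Kokkos::TeamThreadRange( member, n ),
                          [&] ( const long i ) { result( i ) = i; } );
  }
};

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    typedef FillTask::sched_type sched_type;
    typedef FillTask::view_type  view_type;

    const long n = 100;

    // Scheduler backed by an explicit memory-pool capacity, as in the test above.
    sched_type sched( sched_type::memory_space(), 400000 );
    view_type result( "result", n );

    // Spawn a single task that runs as a whole team, then block until it is done.
    Kokkos::host_spawn( Kokkos::TaskTeam( sched ), FillTask( result, n ) );
    Kokkos::wait( sched );

    // Mirror the device data back to the host for checking, exactly as the test does.
    view_type::HostMirror host_result = Kokkos::create_mirror_view( result );
    Kokkos::deep_copy( host_result, result );

    std::cout << "result(10) = " << host_result( 10 ) << std::endl;
  }
  Kokkos::finalize();
  return 0;
}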
diff --git a/lib/kokkos/core/unit_test/TestTeam.hpp b/lib/kokkos/core/unit_test/TestTeam.hpp
index bcf4d3a17..11a523921 100644
--- a/lib/kokkos/core/unit_test/TestTeam.hpp
+++ b/lib/kokkos/core/unit_test/TestTeam.hpp
@@ -1,923 +1,947 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template< class ExecSpace, class ScheduleType >
struct TestTeamPolicy {
+ typedef typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type team_member;
+ typedef Kokkos::View< int**, ExecSpace > view_type;
- typedef typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type team_member ;
- typedef Kokkos::View<int**,ExecSpace> view_type ;
-
- view_type m_flags ;
+ view_type m_flags;
TestTeamPolicy( const size_t league_size )
- : m_flags( Kokkos::ViewAllocateWithoutInitializing("flags")
- , Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( *this )
- , league_size )
- {}
+ : m_flags( Kokkos::ViewAllocateWithoutInitializing( "flags" ),
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( *this ),
+ league_size ) {}
struct VerifyInitTag {};
KOKKOS_INLINE_FUNCTION
void operator()( const team_member & member ) const
- {
- const int tid = member.team_rank() + member.team_size() * member.league_rank();
+ {
+ const int tid = member.team_rank() + member.team_size() * member.league_rank();
- m_flags( member.team_rank() , member.league_rank() ) = tid ;
- }
+ m_flags( member.team_rank(), member.league_rank() ) = tid;
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyInitTag & , const team_member & member ) const
- {
- const int tid = member.team_rank() + member.team_size() * member.league_rank();
+ void operator()( const VerifyInitTag &, const team_member & member ) const
+ {
+ const int tid = member.team_rank() + member.team_size() * member.league_rank();
- if ( tid != m_flags( member.team_rank() , member.league_rank() ) ) {
- printf("TestTeamPolicy member(%d,%d) error %d != %d\n"
- , member.league_rank() , member.team_rank()
- , tid , m_flags( member.team_rank() , member.league_rank() ) );
- }
+ if ( tid != m_flags( member.team_rank(), member.league_rank() ) ) {
+ printf( "TestTeamPolicy member(%d,%d) error %d != %d\n",
+ member.league_rank(), member.team_rank(),
+ tid, m_flags( member.team_rank(), member.league_rank() ) );
}
+ }
- // included for test_small_league_size
- TestTeamPolicy()
- : m_flags()
- {}
+ // Included for test_small_league_size.
+ TestTeamPolicy() : m_flags() {}
+
+ // Included for test_small_league_size.
+ struct NoOpTag {};
- // included for test_small_league_size
- struct NoOpTag {} ;
KOKKOS_INLINE_FUNCTION
- void operator()( const NoOpTag & , const team_member & member ) const
- {}
+ void operator()( const NoOpTag &, const team_member & member ) const {}
static void test_small_league_size() {
-
int bs = 8; // batch size (number of elements per batch)
int ns = 16; // total number of "problems" to process
- // calculate total scratch memory space size
+ // Calculate total scratch memory space size.
const int level = 0;
int mem_size = 960;
- const int num_teams = ns/bs;
- const Kokkos::TeamPolicy< ExecSpace, NoOpTag > policy(num_teams, Kokkos::AUTO());
+ const int num_teams = ns / bs;
+ const Kokkos::TeamPolicy< ExecSpace, NoOpTag > policy( num_teams, Kokkos::AUTO() );
- Kokkos::parallel_for ( policy.set_scratch_size(level, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0))
- , TestTeamPolicy()
- );
+ Kokkos::parallel_for( policy.set_scratch_size( level, Kokkos::PerTeam( mem_size ), Kokkos::PerThread( 0 ) ),
+ TestTeamPolicy() );
}
static void test_for( const size_t league_size )
- {
- TestTeamPolicy functor( league_size );
+ {
+ TestTeamPolicy functor( league_size );
- const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
+ const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
- Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor );
- Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace , VerifyInitTag >( league_size , team_size ) , functor );
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size, team_size ), functor );
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace, VerifyInitTag >( league_size, team_size ), functor );
- test_small_league_size();
- }
+ test_small_league_size();
+ }
struct ReduceTag {};
- typedef long value_type ;
+ typedef long value_type;
KOKKOS_INLINE_FUNCTION
- void operator()( const team_member & member , value_type & update ) const
- {
- update += member.team_rank() + member.team_size() * member.league_rank();
- }
+ void operator()( const team_member & member, value_type & update ) const
+ {
+ update += member.team_rank() + member.team_size() * member.league_rank();
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const ReduceTag & , const team_member & member , value_type & update ) const
- {
- update += 1 + member.team_rank() + member.team_size() * member.league_rank();
- }
+ void operator()( const ReduceTag &, const team_member & member, value_type & update ) const
+ {
+ update += 1 + member.team_rank() + member.team_size() * member.league_rank();
+ }
static void test_reduce( const size_t league_size )
- {
- TestTeamPolicy functor( league_size );
+ {
+ TestTeamPolicy functor( league_size );
- const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
- const long N = team_size * league_size ;
+ const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
+ const long N = team_size * league_size;
- long total = 0 ;
+ long total = 0;
- Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor , total );
- ASSERT_EQ( size_t((N-1)*(N))/2 , size_t(total) );
+ Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size, team_size ), functor, total );
+ ASSERT_EQ( size_t( ( N - 1 ) * ( N ) ) / 2, size_t( total ) );
- Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace , ReduceTag >( league_size , team_size ) , functor , total );
- ASSERT_EQ( (size_t(N)*size_t(N+1))/2 , size_t(total) );
- }
+ Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace, ReduceTag >( league_size, team_size ), functor, total );
+ ASSERT_EQ( ( size_t( N ) * size_t( N + 1 ) ) / 2, size_t( total ) );
+ }
};
-}
-}
+} // namespace
+
+} // namespace Test
/*--------------------------------------------------------------------------*/
namespace Test {
-template< typename ScalarType , class DeviceType, class ScheduleType >
+template< typename ScalarType, class DeviceType, class ScheduleType >
class ReduceTeamFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef typename execution_space::size_type size_type;
struct value_type {
- ScalarType value[3] ;
+ ScalarType value[3];
};
- const size_type nwork ;
+ const size_type nwork;
ReduceTeamFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
- ReduceTeamFunctor( const ReduceTeamFunctor & rhs )
- : nwork( rhs.nwork ) {}
+ ReduceTeamFunctor( const ReduceTeamFunctor & rhs ) : nwork( rhs.nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
- dst.value[0] = 0 ;
- dst.value[1] = 0 ;
- dst.value[2] = 0 ;
+ dst.value[0] = 0;
+ dst.value[1] = 0;
+ dst.value[2] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & dst ,
- const volatile value_type & src ) const
+ void join( volatile value_type & dst, const volatile value_type & src ) const
{
- dst.value[0] += src.value[0] ;
- dst.value[1] += src.value[1] ;
- dst.value[2] += src.value[2] ;
+ dst.value[0] += src.value[0];
+ dst.value[1] += src.value[1];
+ dst.value[2] += src.value[2];
}
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type ind , value_type & dst ) const
+ void operator()( const typename policy_type::member_type ind, value_type & dst ) const
{
const int thread_rank = ind.team_rank() + ind.team_size() * ind.league_rank();
const int thread_size = ind.team_size() * ind.league_size();
- const int chunk = ( nwork + thread_size - 1 ) / thread_size ;
+ const int chunk = ( nwork + thread_size - 1 ) / thread_size;
- size_type iwork = chunk * thread_rank ;
- const size_type iwork_end = iwork + chunk < nwork ? iwork + chunk : nwork ;
+ size_type iwork = chunk * thread_rank;
+ const size_type iwork_end = iwork + chunk < nwork ? iwork + chunk : nwork;
- for ( ; iwork < iwork_end ; ++iwork ) {
- dst.value[0] += 1 ;
- dst.value[1] += iwork + 1 ;
- dst.value[2] += nwork - iwork ;
+ for ( ; iwork < iwork_end; ++iwork ) {
+ dst.value[0] += 1;
+ dst.value[1] += iwork + 1;
+ dst.value[2] += nwork - iwork;
}
}
};
} // namespace Test
namespace {
-template< typename ScalarType , class DeviceType, class ScheduleType >
+template< typename ScalarType, class DeviceType, class ScheduleType >
class TestReduceTeam
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef typename execution_space::size_type size_type;
- TestReduceTeam( const size_type & nwork )
- {
- run_test(nwork);
- }
+ TestReduceTeam( const size_type & nwork ) { run_test( nwork ); }
void run_test( const size_type & nwork )
{
- typedef Test::ReduceTeamFunctor< ScalarType , execution_space , ScheduleType> functor_type ;
- typedef typename functor_type::value_type value_type ;
- typedef Kokkos::View< value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::ReduceTeamFunctor< ScalarType, execution_space, ScheduleType> functor_type;
+ typedef typename functor_type::value_type value_type;
+ typedef Kokkos::View< value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- const unsigned team_size = policy_type::team_size_recommended( functor_type(nwork) );
- const unsigned league_size = ( nwork + team_size - 1 ) / team_size ;
+ const unsigned team_size = policy_type::team_size_recommended( functor_type( nwork ) );
+ const unsigned league_size = ( nwork + team_size - 1 ) / team_size;
- policy_type team_exec( league_size , team_size );
+ policy_type team_exec( league_size, team_size );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
+ for ( unsigned i = 0; i < Repeat; ++i ) {
result_type tmp( & result[i] );
- Kokkos::parallel_reduce( team_exec , functor_type(nwork) , tmp );
+ Kokkos::parallel_reduce( team_exec, functor_type( nwork ), tmp );
}
execution_space::fence();
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i].value[j] );
}
}
}
};
-}
+} // namespace
/*--------------------------------------------------------------------------*/
namespace Test {
template< class DeviceType, class ScheduleType >
class ScanTeamFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef long int value_type;
- typedef long int value_type ;
- Kokkos::View< value_type , execution_space > accum ;
- Kokkos::View< value_type , execution_space > total ;
+ Kokkos::View< value_type, execution_space > accum;
+ Kokkos::View< value_type, execution_space > total;
- ScanTeamFunctor() : accum("accum"), total("total") {}
+ ScanTeamFunctor() : accum( "accum" ), total( "total" ) {}
KOKKOS_INLINE_FUNCTION
- void init( value_type & error ) const { error = 0 ; }
+ void init( value_type & error ) const { error = 0; }
KOKKOS_INLINE_FUNCTION
- void join( value_type volatile & error ,
- value_type volatile const & input ) const
- { if ( input ) error = 1 ; }
+ void join( value_type volatile & error, value_type volatile const & input ) const
+ { if ( input ) error = 1; }
struct JoinMax {
- typedef long int value_type ;
+ typedef long int value_type;
+
KOKKOS_INLINE_FUNCTION
- void join( value_type volatile & dst
- , value_type volatile const & input ) const
- { if ( dst < input ) dst = input ; }
+ void join( value_type volatile & dst, value_type volatile const & input ) const
+ { if ( dst < input ) dst = input; }
};
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type ind , value_type & error ) const
+ void operator()( const typename policy_type::member_type ind, value_type & error ) const
{
if ( 0 == ind.league_rank() && 0 == ind.team_rank() ) {
const long int thread_count = ind.league_size() * ind.team_size();
- total() = ( thread_count * ( thread_count + 1 ) ) / 2 ;
+ total() = ( thread_count * ( thread_count + 1 ) ) / 2;
}
// Team max:
- const int long m = ind.team_reduce( (long int) ( ind.league_rank() + ind.team_rank() ) , JoinMax() );
+ const int long m = ind.team_reduce( (long int) ( ind.league_rank() + ind.team_rank() ), JoinMax() );
if ( m != ind.league_rank() + ( ind.team_size() - 1 ) ) {
- printf("ScanTeamFunctor[%d.%d of %d.%d] reduce_max_answer(%ld) != reduce_max(%ld)\n"
- , ind.league_rank(), ind.team_rank()
- , ind.league_size(), ind.team_size()
- , (long int)(ind.league_rank() + ( ind.team_size() - 1 )) , m );
+ printf( "ScanTeamFunctor[%d.%d of %d.%d] reduce_max_answer(%ld) != reduce_max(%ld)\n",
+ ind.league_rank(), ind.team_rank(),
+ ind.league_size(), ind.team_size(),
+ (long int) ( ind.league_rank() + ( ind.team_size() - 1 ) ), m );
}
// Scan:
const long int answer =
- ( ind.league_rank() + 1 ) * ind.team_rank() +
- ( ind.team_rank() * ( ind.team_rank() + 1 ) ) / 2 ;
+ ( ind.league_rank() + 1 ) * ind.team_rank() + ( ind.team_rank() * ( ind.team_rank() + 1 ) ) / 2;
const long int result =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
const long int result2 =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
if ( answer != result || answer != result2 ) {
- printf("ScanTeamFunctor[%d.%d of %d.%d] answer(%ld) != scan_first(%ld) or scan_second(%ld)\n",
- ind.league_rank(), ind.team_rank(),
- ind.league_size(), ind.team_size(),
- answer,result,result2);
- error = 1 ;
+ printf( "ScanTeamFunctor[%d.%d of %d.%d] answer(%ld) != scan_first(%ld) or scan_second(%ld)\n",
+ ind.league_rank(), ind.team_rank(),
+ ind.league_size(), ind.team_size(),
+ answer, result, result2 );
+
+ error = 1;
}
const long int thread_rank = ind.team_rank() +
ind.team_size() * ind.league_rank();
- ind.team_scan( 1 + thread_rank , accum.ptr_on_device() );
+ ind.team_scan( 1 + thread_rank, accum.ptr_on_device() );
}
};
template< class DeviceType, class ScheduleType >
class TestScanTeam
{
public:
- typedef DeviceType execution_space ;
- typedef long int value_type ;
-
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef Test::ScanTeamFunctor<DeviceType, ScheduleType> functor_type ;
+ typedef DeviceType execution_space;
+ typedef long int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef Test::ScanTeamFunctor<DeviceType, ScheduleType> functor_type;
- //------------------------------------
-
- TestScanTeam( const size_t nteam )
- {
- run_test(nteam);
- }
+ TestScanTeam( const size_t nteam ) { run_test( nteam ); }
void run_test( const size_t nteam )
{
- typedef Kokkos::View< long int , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
- const unsigned REPEAT = 100000 ;
+ typedef Kokkos::View< long int, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
+
+ const unsigned REPEAT = 100000;
unsigned Repeat;
- if ( nteam == 0 )
- {
+
+ if ( nteam == 0 ) {
Repeat = 1;
- } else {
- Repeat = ( REPEAT + nteam - 1 ) / nteam ; //error here
}
+ else {
+ Repeat = ( REPEAT + nteam - 1 ) / nteam; // Error here.
+ }
+
+ functor_type functor;
- functor_type functor ;
+ policy_type team_exec( nteam, policy_type::team_size_max( functor ) );
- policy_type team_exec( nteam , policy_type::team_size_max( functor ) );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ long int accum = 0;
+ long int total = 0;
+ long int error = 0;
+ Kokkos::deep_copy( functor.accum, total );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- long int accum = 0 ;
- long int total = 0 ;
- long int error = 0 ;
- Kokkos::deep_copy( functor.accum , total );
- Kokkos::parallel_reduce( team_exec , functor , result_type( & error ) );
+ Kokkos::parallel_reduce( team_exec, functor, result_type( & error ) );
DeviceType::fence();
- Kokkos::deep_copy( accum , functor.accum );
- Kokkos::deep_copy( total , functor.total );
- ASSERT_EQ( error , 0 );
- ASSERT_EQ( total , accum );
+ Kokkos::deep_copy( accum, functor.accum );
+ Kokkos::deep_copy( total, functor.total );
+
+ ASSERT_EQ( error, 0 );
+ ASSERT_EQ( total, accum );
}
execution_space::fence();
}
};
} // namespace Test
/*--------------------------------------------------------------------------*/
namespace Test {
template< class ExecSpace, class ScheduleType >
struct SharedTeamFunctor {
- typedef ExecSpace execution_space ;
- typedef int value_type ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef ExecSpace execution_space;
+ typedef int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
enum { SHARED_COUNT = 1000 };
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< int*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
- // Tell how much shared memory will be required by this functor:
+ // Tell how much shared memory will be required by this functor.
inline
unsigned team_shmem_size( int team_size ) const
{
return shared_int_array_type::shmem_size( SHARED_COUNT ) +
shared_int_array_type::shmem_size( SHARED_COUNT );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type & ind , value_type & update ) const
+ void operator()( const typename policy_type::member_type & ind, value_type & update ) const
{
- const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
- const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
-
- if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
- (shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_COUNT));
- ++update; // failure to allocate is an error
+ const shared_int_array_type shared_A( ind.team_shmem(), SHARED_COUNT );
+ const shared_int_array_type shared_B( ind.team_shmem(), SHARED_COUNT );
+
+ if ( ( shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0 ) ||
+ ( shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0 ) )
+ {
+ printf ("member( %d/%d , %d/%d ) Failed to allocate shared memory of size %lu\n"
+ , ind.league_rank()
+ , ind.league_size()
+ , ind.team_rank()
+ , ind.team_size()
+ , static_cast<unsigned long>( SHARED_COUNT )
+ );
+
+ ++update; // Failure to allocate is an error.
}
else {
- for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
+ for ( int i = ind.team_rank(); i < SHARED_COUNT; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
- for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
+ for ( int i = 0; i < SHARED_COUNT; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
+
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
}
}
}
}
};
-}
+} // namespace Test
namespace {
template< class ExecSpace, class ScheduleType >
struct TestSharedTeam {
-
- TestSharedTeam()
- { run(); }
+ TestSharedTeam() { run(); }
void run()
{
- typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
- typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor;
+ typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
- const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
+ const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
- typename Functor::value_type error_count = 0 ;
+ typename Functor::value_type error_count = 0;
- Kokkos::parallel_reduce( team_exec , Functor() , result_type( & error_count ) );
+ Kokkos::parallel_reduce( team_exec, Functor(), result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
-}
+
+} // namespace
namespace Test {
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
template< class MemorySpace, class ExecSpace, class ScheduleType >
struct TestLambdaSharedTeam {
-
- TestLambdaSharedTeam()
- { run(); }
+ TestLambdaSharedTeam() { run(); }
void run()
{
- typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
- //typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
- typedef Kokkos::View< typename Functor::value_type , MemorySpace, Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::SharedTeamFunctor< ExecSpace, ScheduleType > Functor;
+ //typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
+ typedef Kokkos::View< typename Functor::value_type, MemorySpace, Kokkos::MemoryUnmanaged > result_type;
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< int*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
const int SHARED_COUNT = 1000;
int team_size = 1;
+
#ifdef KOKKOS_ENABLE_CUDA
- if(std::is_same<ExecSpace,Kokkos::Cuda>::value)
- team_size = 128;
+ if ( std::is_same< ExecSpace, Kokkos::Cuda >::value ) team_size = 128;
#endif
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size);
- team_exec = team_exec.set_scratch_size(0,Kokkos::PerTeam(SHARED_COUNT*2*sizeof(int)));
- typename Functor::value_type error_count = 0 ;
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
+ team_exec = team_exec.set_scratch_size( 0, Kokkos::PerTeam( SHARED_COUNT * 2 * sizeof( int ) ) );
+
+ typename Functor::value_type error_count = 0;
- Kokkos::parallel_reduce( team_exec , KOKKOS_LAMBDA
- ( const typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type & ind , int & update ) {
+ Kokkos::parallel_reduce( team_exec, KOKKOS_LAMBDA
+ ( const typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type & ind, int & update )
+ {
+ const shared_int_array_type shared_A( ind.team_shmem(), SHARED_COUNT );
+ const shared_int_array_type shared_B( ind.team_shmem(), SHARED_COUNT );
- const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
- const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
+ if ( ( shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0 ) ||
+ ( shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0 ) )
+ {
+ printf( "Failed to allocate shared memory of size %lu\n",
+ static_cast<unsigned long>( SHARED_COUNT ) );
- if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
- (shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_COUNT));
- ++update; // failure to allocate is an error
- } else {
- for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
+ ++update; // Failure to allocate is an error.
+ }
+ else {
+ for ( int i = ind.team_rank(); i < SHARED_COUNT; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
- for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
+ for ( int i = 0; i < SHARED_COUNT; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
+
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
}
}
}
}, result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
#endif
-}
+
+} // namespace Test
namespace Test {
template< class ExecSpace, class ScheduleType >
struct ScratchTeamFunctor {
- typedef ExecSpace execution_space ;
- typedef int value_type ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef ExecSpace execution_space;
+ typedef int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
enum { SHARED_TEAM_COUNT = 100 };
enum { SHARED_THREAD_COUNT = 10 };
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<size_t*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< size_t*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type & ind , value_type & update ) const
+ void operator()( const typename policy_type::member_type & ind, value_type & update ) const
{
- const shared_int_array_type scratch_ptr( ind.team_scratch(1) , 3*ind.team_size() );
- const shared_int_array_type scratch_A( ind.team_scratch(1) , SHARED_TEAM_COUNT );
- const shared_int_array_type scratch_B( ind.thread_scratch(1) , SHARED_THREAD_COUNT );
-
- if ((scratch_ptr.ptr_on_device () == NULL ) ||
- (scratch_A. ptr_on_device () == NULL && SHARED_TEAM_COUNT > 0) ||
- (scratch_B. ptr_on_device () == NULL && SHARED_THREAD_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_TEAM_COUNT));
- ++update; // failure to allocate is an error
+ const shared_int_array_type scratch_ptr( ind.team_scratch( 1 ), 3 * ind.team_size() );
+ const shared_int_array_type scratch_A( ind.team_scratch( 1 ), SHARED_TEAM_COUNT );
+ const shared_int_array_type scratch_B( ind.thread_scratch( 1 ), SHARED_THREAD_COUNT );
+
+ if ( ( scratch_ptr.ptr_on_device () == NULL ) ||
+ ( scratch_A. ptr_on_device () == NULL && SHARED_TEAM_COUNT > 0 ) ||
+ ( scratch_B. ptr_on_device () == NULL && SHARED_THREAD_COUNT > 0 ) )
+ {
+ printf( "Failed to allocate shared memory of size %lu\n",
+ static_cast<unsigned long>( SHARED_TEAM_COUNT ) );
+
+ ++update; // Failure to allocate is an error.
}
else {
- Kokkos::parallel_for(Kokkos::TeamThreadRange(ind,0,(int)SHARED_TEAM_COUNT),[&] (const int &i) {
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( ind, 0, (int) SHARED_TEAM_COUNT ), [&] ( const int & i ) {
scratch_A[i] = i + ind.league_rank();
});
- for(int i=0; i<SHARED_THREAD_COUNT; i++)
- scratch_B[i] = 10000*ind.league_rank() + 100*ind.team_rank() + i;
+
+ for ( int i = 0; i < SHARED_THREAD_COUNT; i++ ) {
+ scratch_B[i] = 10000 * ind.league_rank() + 100 * ind.team_rank() + i;
+ }
scratch_ptr[ind.team_rank()] = (size_t) scratch_A.ptr_on_device();
scratch_ptr[ind.team_rank() + ind.team_size()] = (size_t) scratch_B.ptr_on_device();
ind.team_barrier();
- for( int i = 0; i<SHARED_TEAM_COUNT; i++) {
- if(scratch_A[i] != size_t(i + ind.league_rank()))
- ++update;
+ for ( int i = 0; i < SHARED_TEAM_COUNT; i++ ) {
+ if ( scratch_A[i] != size_t( i + ind.league_rank() ) ) ++update;
}
- for( int i = 0; i < ind.team_size(); i++) {
- if(scratch_ptr[0]!=scratch_ptr[i]) ++update;
+
+ for ( int i = 0; i < ind.team_size(); i++ ) {
+ if ( scratch_ptr[0] != scratch_ptr[i] ) ++update;
}
- if(scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()] <
- SHARED_THREAD_COUNT*sizeof(size_t))
+
+ if ( scratch_ptr[1 + ind.team_size()] - scratch_ptr[0 + ind.team_size()] < SHARED_THREAD_COUNT * sizeof( size_t ) ) {
++update;
- for( int i = 1; i < ind.team_size(); i++) {
- if((scratch_ptr[i+ind.team_size()] - scratch_ptr[i-1+ind.team_size()]) !=
- (scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()])) ++update;
+ }
+ for ( int i = 1; i < ind.team_size(); i++ ) {
+ if ( ( scratch_ptr[i + ind.team_size()] - scratch_ptr[i - 1 + ind.team_size()] ) !=
+ ( scratch_ptr[1 + ind.team_size()] - scratch_ptr[0 + ind.team_size()] ) )
+ {
+ ++update;
+ }
}
}
}
};
-}
+} // namespace Test
namespace {
template< class ExecSpace, class ScheduleType >
struct TestScratchTeam {
-
- TestScratchTeam()
- { run(); }
+ TestScratchTeam() { run(); }
void run()
{
- typedef Test::ScratchTeamFunctor<ExecSpace, ScheduleType> Functor ;
- typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::ScratchTeamFunctor<ExecSpace, ScheduleType> Functor;
+ typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
+
+ typename Functor::value_type error_count = 0;
+
+ int team_scratch_size = Functor::shared_int_array_type::shmem_size( Functor::SHARED_TEAM_COUNT ) +
+ Functor::shared_int_array_type::shmem_size( 3 * team_size );
- typename Functor::value_type error_count = 0 ;
+ int thread_scratch_size = Functor::shared_int_array_type::shmem_size( Functor::SHARED_THREAD_COUNT );
- int team_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_TEAM_COUNT) +
- Functor::shared_int_array_type::shmem_size(3*team_size);
- int thread_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_THREAD_COUNT);
- Kokkos::parallel_reduce( team_exec.set_scratch_size(0,Kokkos::PerTeam(team_scratch_size),
- Kokkos::PerThread(thread_scratch_size)) ,
- Functor() , result_type( & error_count ) );
+ Kokkos::parallel_reduce( team_exec.set_scratch_size( 0, Kokkos::PerTeam( team_scratch_size ),
+ Kokkos::PerThread( thread_scratch_size ) ),
+ Functor(), result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
-}
+
+} // namespace
namespace Test {
-template< class ExecSpace>
+
+template< class ExecSpace >
KOKKOS_INLINE_FUNCTION
-int test_team_mulit_level_scratch_loop_body(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team1(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread1(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team2(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread2(team.thread_scratch(0),16);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team1(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread1(team.thread_scratch(1),16000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team2(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread2(team.thread_scratch(1),16000);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team3(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread3(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team3(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread3(team.thread_scratch(1),16000);
+int test_team_mulit_level_scratch_loop_body( const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team ) {
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team1( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread1( team.thread_scratch( 0 ), 16 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team2( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread2( team.thread_scratch( 0 ), 16 );
+
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team1( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread1( team.thread_scratch( 1 ), 16000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team2( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread2( team.thread_scratch( 1 ), 16000 );
+
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team3( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread3( team.thread_scratch( 0 ), 16 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team3( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread3( team.thread_scratch( 1 ), 16000 );
// The explicit types for 0 and 128 are here to test TeamThreadRange accepting different
// types for begin and end.
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,int(0),unsigned(128)), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, int( 0 ), unsigned( 128 ) ), [&] ( const int & i )
{
- a_team1(i) = 1000000 + i;
- a_team2(i) = 2000000 + i;
- a_team3(i) = 3000000 + i;
+ a_team1( i ) = 1000000 + i + team.league_rank() * 100000;
+ a_team2( i ) = 2000000 + i + team.league_rank() * 100000;
+ a_team3( i ) = 3000000 + i + team.league_rank() * 100000;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16 ), [&] ( const int & i )
{
- a_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- a_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- a_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ a_thread1( i ) = 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ a_thread2( i ) = 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ a_thread3( i ) = 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128000 ), [&] ( const int & i )
{
- b_team1(i) = 1000000 + i;
- b_team2(i) = 2000000 + i;
- b_team3(i) = 3000000 + i;
+ b_team1( i ) = 1000000 + i + team.league_rank() * 100000;
+ b_team2( i ) = 2000000 + i + team.league_rank() * 100000;
+ b_team3( i ) = 3000000 + i + team.league_rank() * 100000;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16000 ), [&] ( const int & i )
{
- b_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- b_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- b_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ b_thread1( i ) = 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ b_thread2( i ) = 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ b_thread3( i ) = 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
});
team.team_barrier();
+
int error = 0;
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128 ), [&] ( const int & i )
{
- if(a_team1(i) != 1000000 + i) error++;
- if(a_team2(i) != 2000000 + i) error++;
- if(a_team3(i) != 3000000 + i) error++;
+ if ( a_team1( i ) != 1000000 + i + team.league_rank() * 100000 ) error++;
+ if ( a_team2( i ) != 2000000 + i + team.league_rank() * 100000 ) error++;
+ if ( a_team3( i ) != 3000000 + i + team.league_rank() * 100000 ) error++;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16 ), [&] ( const int & i )
{
- if(a_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ if ( a_thread1( i ) != 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( a_thread2( i ) != 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( a_thread3( i ) != 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128000 ), [&] ( const int & i )
{
- if(b_team1(i) != 1000000 + i) error++;
- if(b_team2(i) != 2000000 + i) error++;
- if(b_team3(i) != 3000000 + i) error++;
+ if ( b_team1( i ) != 1000000 + i + team.league_rank() * 100000 ) error++;
+ if ( b_team2( i ) != 2000000 + i + team.league_rank() * 100000 ) error++;
+ if ( b_team3( i ) != 3000000 + i + team.league_rank() * 100000 ) error++;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16000 ), [&] ( const int & i )
{
- if(b_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(b_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if( b_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ if ( b_thread1( i ) != 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( b_thread2( i ) != 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( b_thread3( i ) != 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
});
return error;
}
struct TagReduce {};
struct TagFor {};
template< class ExecSpace, class ScheduleType >
struct ClassNoShmemSizeFunction {
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ typedef typename Kokkos::TeamPolicy< ExecSpace, ScheduleType >::member_type member_type;
+
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator()( const TagFor &, const member_type & team ) const {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
}
KOKKOS_INLINE_FUNCTION
- void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
- error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator() ( const TagReduce &, const member_type & team, int & error ) const {
+ error += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
}
void run() {
- Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
+ Kokkos::View< int, ExecSpace > d_errors = Kokkos::View< int, ExecSpace >( "Errors" );
errors = d_errors;
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
+
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
{
- Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this);
- Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors = Kokkos::create_mirror_view(d_errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+ Kokkos::TeamPolicy< TagFor, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_for( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ), *this );
+ Kokkos::fence();
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( d_errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
}
{
- int error = 0;
- Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this,error);
- Kokkos::fence();
- ASSERT_EQ(error,0);
+ int error = 0;
+ Kokkos::TeamPolicy< TagReduce, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_reduce( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ), *this, error );
+ Kokkos::fence();
+
+ ASSERT_EQ( error, 0 );
}
};
};
template< class ExecSpace, class ScheduleType >
struct ClassWithShmemSizeFunction {
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ typedef typename Kokkos::TeamPolicy< ExecSpace, ScheduleType >::member_type member_type;
+
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator()( const TagFor &, const member_type & team ) const {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
}
KOKKOS_INLINE_FUNCTION
- void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
- error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator() ( const TagReduce &, const member_type & team, int & error ) const {
+ error += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
}
void run() {
- Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
+ Kokkos::View< int, ExecSpace > d_errors = Kokkos::View< int, ExecSpace >( "Errors" );
errors = d_errors;
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
+
{
- Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this);
- Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(d_errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+ Kokkos::TeamPolicy< TagFor, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_for( policy.set_scratch_size( 1, Kokkos::PerTeam( per_team1 ),
+ Kokkos::PerThread( per_thread1 ) ),
+ *this );
+ Kokkos::fence();
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( d_errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
}
{
- int error = 0;
- Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_reduce(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this,error);
- Kokkos::fence();
- ASSERT_EQ(error,0);
+ int error = 0;
+ Kokkos::TeamPolicy< TagReduce, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_reduce( policy.set_scratch_size( 1, Kokkos::PerTeam( per_team1 ),
+ Kokkos::PerThread( per_thread1 ) ),
+ *this, error );
+ Kokkos::fence();
+
+ ASSERT_EQ( error, 0 );
}
};
- unsigned team_shmem_size(int team_size) const {
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ unsigned team_shmem_size( int team_size ) const {
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
return per_team0 + team_size * per_thread0;
}
};
template< class ExecSpace, class ScheduleType >
void test_team_mulit_level_scratch_test_lambda() {
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
- Kokkos::View<int,ExecSpace> d_errors("Errors");
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ Kokkos::View< int, ExecSpace > d_errors( "Errors" );
errors = d_errors;
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
+
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
+ Kokkos::TeamPolicy< ExecSpace, ScheduleType > policy( 10, 8, 16 );
- Kokkos::TeamPolicy<ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ Kokkos::parallel_for( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ),
+ KOKKOS_LAMBDA ( const typename Kokkos::TeamPolicy< ExecSpace >::member_type & team )
+ {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
});
Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
int error = 0;
- Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team, int& count) {
- count += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ),
+ KOKKOS_LAMBDA ( const typename Kokkos::TeamPolicy< ExecSpace >::member_type & team, int & count )
+ {
+ count += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
+ }, error );
+ ASSERT_EQ( error, 0 );
Kokkos::fence();
#endif
}
-
-}
+} // namespace Test
namespace {
+
template< class ExecSpace, class ScheduleType >
struct TestMultiLevelScratchTeam {
-
- TestMultiLevelScratchTeam()
- { run(); }
+ TestMultiLevelScratchTeam() { run(); }
void run()
{
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- Test::test_team_mulit_level_scratch_test_lambda<ExecSpace, ScheduleType>();
+ Test::test_team_mulit_level_scratch_test_lambda< ExecSpace, ScheduleType >();
#endif
- Test::ClassNoShmemSizeFunction<ExecSpace, ScheduleType> c1;
+ Test::ClassNoShmemSizeFunction< ExecSpace, ScheduleType > c1;
c1.run();
- Test::ClassWithShmemSizeFunction<ExecSpace, ScheduleType> c2;
+ Test::ClassWithShmemSizeFunction< ExecSpace, ScheduleType > c2;
c2.run();
-
}
};
-}
+
+} // namespace
namespace Test {
template< class ExecSpace >
struct TestShmemSize {
-
TestShmemSize() { run(); }
void run()
{
typedef Kokkos::View< long***, ExecSpace > view_type;
size_t d1 = 5;
size_t d2 = 6;
size_t d3 = 7;
size_t size = view_type::shmem_size( d1, d2, d3 );
- ASSERT_EQ( size, d1 * d2 * d3 * sizeof(long) );
+ ASSERT_EQ( size, d1 * d2 * d3 * sizeof( long ) );
}
};
-}
-/*--------------------------------------------------------------------------*/
+} // namespace Test
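
The multi-level scratch tests above combine three pieces of the team API: View::shmem_size() to compute the bytes an unmanaged scratch view needs, TeamPolicy::set_scratch_size() to request per-team and per-thread scratch at levels 0 and 1, and unmanaged views constructed on the team's scratch handles inside the kernel. The following is a minimal sketch of that pattern, not part of the diff; the function name scratch_example is hypothetical, and it assumes lambda dispatch (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA) is available, as in the lambda test above.

// Illustrative sketch only; scratch_example is a hypothetical name.
#include <Kokkos_Core.hpp>

using ExecSpace   = Kokkos::DefaultExecutionSpace;
using ScratchView = Kokkos::View< double*, ExecSpace::scratch_memory_space,
                                  Kokkos::MemoryUnmanaged >;

void scratch_example() {
  // Bytes needed for the unmanaged scratch views, per level.
  const int per_team0   = ScratchView::shmem_size( 128 );     // level 0, per team
  const int per_thread0 = ScratchView::shmem_size( 16 );      // level 0, per thread
  const int per_team1   = ScratchView::shmem_size( 128000 );  // level 1, per team

  Kokkos::TeamPolicy< ExecSpace > policy( 10, Kokkos::AUTO );

  Kokkos::parallel_for(
    policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) )
          .set_scratch_size( 1, Kokkos::PerTeam( per_team1 ) ),
    KOKKOS_LAMBDA ( const Kokkos::TeamPolicy< ExecSpace >::member_type & team )
  {
    // Unmanaged views placed in the requested scratch areas.
    ScratchView thread_small( team.thread_scratch( 0 ), 16 );
    ScratchView team_small  ( team.team_scratch( 0 ),   128 );
    ScratchView team_large  ( team.team_scratch( 1 ),   128000 );

    thread_small( 0 ) = team.team_rank();            // private to this thread
    Kokkos::single( Kokkos::PerTeam( team ), [&] ()
    {
      team_small( 0 ) = team.league_rank();          // shared by the whole team
      team_large( 0 ) = team.league_size();
    } );
    team.team_barrier();
  } );
  Kokkos::fence();
}
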
diff --git a/lib/kokkos/core/unit_test/TestTeamVector.hpp b/lib/kokkos/core/unit_test/TestTeamVector.hpp
index d9b06c29e..8d16ac66d 100644
--- a/lib/kokkos/core/unit_test/TestTeamVector.hpp
+++ b/lib/kokkos/core/unit_test/TestTeamVector.hpp
@@ -1,673 +1,745 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <iostream>
#include <cstdlib>
namespace TestTeamVector {
struct my_complex {
- double re,im;
+ double re, im;
int dummy;
+
KOKKOS_INLINE_FUNCTION
my_complex() {
re = 0.0;
im = 0.0;
dummy = 0;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex(const my_complex& src) {
+ my_complex( const my_complex & src ) {
re = src.re;
im = src.im;
dummy = src.dummy;
}
KOKKOS_INLINE_FUNCTION
- my_complex(const volatile my_complex& src) {
+ my_complex & operator=( const my_complex & src ) {
re = src.re;
im = src.im;
dummy = src.dummy;
+ return *this ;
}
KOKKOS_INLINE_FUNCTION
- my_complex(const double& val) {
+ my_complex( const volatile my_complex & src ) {
+ re = src.re;
+ im = src.im;
+ dummy = src.dummy;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ my_complex( const double & val ) {
re = val;
im = 0.0;
dummy = 0;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator += (const my_complex& src) {
+ my_complex & operator+=( const my_complex & src ) {
re += src.re;
im += src.im;
dummy += src.dummy;
return *this;
}
KOKKOS_INLINE_FUNCTION
- void operator += (const volatile my_complex& src) volatile {
+ void operator+=( const volatile my_complex & src ) volatile {
re += src.re;
im += src.im;
dummy += src.dummy;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator *= (const my_complex& src) {
- double re_tmp = re*src.re - im*src.im;
+ my_complex & operator*=( const my_complex & src ) {
+ double re_tmp = re * src.re - im * src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
- void operator *= (const volatile my_complex& src) volatile {
- double re_tmp = re*src.re - im*src.im;
+ void operator*=( const volatile my_complex & src ) volatile {
+ double re_tmp = re * src.re - im * src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
}
+
KOKKOS_INLINE_FUNCTION
- bool operator == (const my_complex& src) {
- return (re == src.re) && (im == src.im) && ( dummy == src.dummy );
+ bool operator==( const my_complex & src ) {
+ return ( re == src.re ) && ( im == src.im ) && ( dummy == src.dummy );
}
+
KOKKOS_INLINE_FUNCTION
- bool operator != (const my_complex& src) {
- return (re != src.re) || (im != src.im) || ( dummy != src.dummy );
+ bool operator!=( const my_complex & src ) {
+ return ( re != src.re ) || ( im != src.im ) || ( dummy != src.dummy );
}
+
KOKKOS_INLINE_FUNCTION
- bool operator != (const double& val) {
- return (re != val) ||
- (im != 0) || (dummy != 0);
+ bool operator!=( const double & val ) {
+ return ( re != val ) || ( im != 0 ) || ( dummy != 0 );
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator= (const int& val) {
+ my_complex & operator=( const int & val ) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator= (const double& val) {
+ my_complex & operator=( const double & val ) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
operator double() {
return re;
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
typedef typename shared_int::size_type size_type;
- const size_type shmemSize = team.team_size () * 13;
- shared_int values = shared_int (team.team_shmem (), shmemSize);
+ const size_type shmemSize = team.team_size() * 13;
+ shared_int values = shared_int( team.team_shmem(), shmemSize );
- if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
- printf ("FAILED to allocate shared memory of size %u\n",
- static_cast<unsigned int> (shmemSize));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < shmemSize ) {
+ printf( "FAILED to allocate shared memory of size %u\n",
+ static_cast<unsigned int>( shmemSize ) );
}
else {
+ // Initialize shared memory.
+ values( team.team_rank() ) = 0;
- // Initialize shared memory
- values(team.team_rank ()) = 0;
-
- // Accumulate value into per thread shared memory
- // This is non blocking
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ // Accumulate value into per thread shared memory.
+ // This is non blocking.
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i )
{
- values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
+ values( team.team_rank() ) += i - team.league_rank() + team.league_size() + team.team_size();
});
- // Wait for all memory to be written
- team.team_barrier ();
- // One thread per team executes the comparison
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+
+ // Wait for all memory to be written.
+ team.team_barrier();
+
+ // One thread per team executes the comparison.
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
- Scalar test = 0;
- Scalar value = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- for (int i = 0; i < team.team_size (); ++i) {
- value += values(i);
- }
- if (test != value) {
- printf ("FAILED team_parallel_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
- flag() = 1;
- }
+ Scalar test = 0;
+ Scalar value = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ for ( int i = 0; i < team.team_size(); ++i ) {
+ value += values( i );
+ }
+
+ if ( test != value ) {
+ printf ( "FAILED team_parallel_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+ flag() = 1;
+ }
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_reduce {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
{
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- },value);
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ }, value );
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
- {
- Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- if (test != value) {
- if(team.league_rank() == 0)
- printf ("FAILED team_parallel_reduce %i %i %f %f %lu\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
- flag() = 1;
- }
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
+ {
+ Scalar test = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ if ( test != value ) {
+ if ( team.league_rank() == 0 ) {
+ printf( "FAILED team_parallel_reduce %i %i %f %f %lu\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ), sizeof( Scalar ) );
+ }
+
+ flag() = 1;
+ }
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_reduce_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_reduce_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val)
- {
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- , [&] (volatile Scalar& val, const volatile Scalar& src)
- {val+=src;}
- , value
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
+ {
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ },
+ [] ( volatile Scalar & val, const volatile Scalar & src ) { val += src; },
+ value
);
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
- Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- if (test != value) {
- printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
- flag() = 1;
- }
+ Scalar test = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
+ flag() = 1;
+ }
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_vector_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
typedef typename shared_int::size_type size_type;
- const size_type shmemSize = team.team_size () * 13;
- shared_int values = shared_int (team.team_shmem (), shmemSize);
+ const size_type shmemSize = team.team_size() * 13;
+ shared_int values = shared_int( team.team_shmem(), shmemSize );
- if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
- printf ("FAILED to allocate shared memory of size %u\n",
- static_cast<unsigned int> (shmemSize));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < shmemSize ) {
+ printf( "FAILED to allocate shared memory of size %u\n",
+ static_cast<unsigned int>( shmemSize ) );
}
else {
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
- values(team.team_rank ()) = 0;
+ values( team.team_rank() ) = 0;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i )
{
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
- values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
+ values( team.team_rank() ) += i - team.league_rank() + team.league_size() + team.team_size();
});
});
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
Scalar value = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- for (int i = 0; i < team.team_size (); ++i) {
- value += values(i);
+
+ for ( int i = 0; i < team.team_size(); ++i ) {
+ value += values( i );
}
- if (test != value) {
- printf ("FAILED team_vector_parallel_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_reduce {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_team_vector_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
-
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
{
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- },value);
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ }, value );
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- if (test != value) {
- if(team.league_rank() == 0)
- printf ("FAILED team_vector_parallel_reduce %i %i %f %f %lu\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
- flag() = 1;
+
+ if ( test != value ) {
+ if ( team.league_rank() == 0 ) {
+ printf( "FAILED team_vector_parallel_reduce %i %i %f %f %lu\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ), sizeof( Scalar ) );
+ }
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_reduce_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_vector_reduce_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val)
- {
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- , [&] (volatile Scalar& val, const volatile Scalar& src)
- {val+=src;}
- , value
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
+ {
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ },
+ [] ( volatile Scalar & val, const volatile Scalar & src ) { val += src; },
+ value
);
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- if (test != value) {
- printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_single {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_single(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_vec_single( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
-
- // Warning: this test case intentionally violates permissable semantics
+ void operator()( typename policy_type::member_type team ) const {
+    // Warning: this test case intentionally violates permissible semantics.
// It is not valid to get references to members of the enclosing region
// inside a parallel_for and write to it.
Scalar value = 0;
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13),[&] (int i)
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i )
{
- value = i; // This write is violating Kokkos semantics for nested parallelism
+ value = i; // This write is violating Kokkos semantics for nested parallelism.
});
- Kokkos::single(Kokkos::PerThread(team),[&] (Scalar& val)
+ Kokkos::single( Kokkos::PerThread( team ), [&] ( Scalar & val )
{
val = 1;
- },value);
+ }, value );
Scalar value2 = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13), [&] (int i, Scalar& val)
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
{
val += value;
- },value2);
+ }, value2 );
+
+ if ( value2 != ( value * 13 ) ) {
+ printf( "FAILED vector_single broadcast %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) value2, (double) value );
- if(value2!=(value*13)) {
- printf("FAILED vector_single broadcast %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) value2,(double) value);
- flag()=1;
+ flag() = 1;
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
- shared_int values = shared_int(team.team_shmem(),team.team_size()*13);
+ shared_int values = shared_int( team.team_shmem(), team.team_size() * 13 );
- if (values.ptr_on_device () == NULL ||
- values.dimension_0() < (unsigned) team.team_size() * 13) {
- printf ("FAILED to allocate memory of size %i\n",
- static_cast<int> (team.team_size () * 13));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < (unsigned) team.team_size() * 13 ) {
+ printf( "FAILED to allocate memory of size %i\n", static_cast<int>( team.team_size() * 13 ) );
flag() = 1;
}
else {
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13), [&] (int i)
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i )
{
- values(13*team.team_rank() + i) = i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
+ values( 13 * team.team_rank() + i ) =
+ i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
});
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 0;
Scalar value = 0;
- for (int i = 0; i < 13; ++i) {
+
+ for ( int i = 0; i < 13; ++i ) {
test += i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
- value += values(13*team.team_rank() + i);
+ value += values( 13 * team.team_rank() + i );
}
- if (test != value) {
- printf ("FAILED vector_par_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_red {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_red(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_red( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val)
+ // When no reducer is given the default is summation.
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
{
val += i;
- }, value);
+ }, value );
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 0;
- for(int i = 0; i < 13; i++) {
- test+=i;
- }
- if(test!=value) {
- printf("FAILED vector_par_reduce %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
- flag()=1;
+
+ for ( int i = 0; i < 13; i++ ) test += i;
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_reduce %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) value );
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_red_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_red_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_red_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
+ // Must initialize to the identity value for the reduce operation
+ // for this test:
+ // ( identity, operation ) = ( 1 , *= )
Scalar value = 1;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13)
- , [&] (int i, Scalar& val)
- { val *= i; }
- , [&] (Scalar& val, const Scalar& src)
- {val*=src;}
- , value
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
+ {
+ val *= ( i % 5 + 1 );
+ },
+ [&] ( Scalar & val, const Scalar & src ) { val *= src; },
+ value
);
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 1;
- for(int i = 0; i < 13; i++) {
- test*=i;
- }
- if(test!=value) {
- printf("FAILED vector_par_reduce_join %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
- flag()=1;
+
+ for ( int i = 0; i < 13; i++ ) test *= ( i % 5 + 1 );
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) value );
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_scan {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_scan(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_vec_scan( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
- Kokkos::parallel_scan(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val, bool final)
+ void operator()( typename policy_type::member_type team ) const {
+ Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val, bool final )
{
val += i;
- if(final) {
+
+ if ( final ) {
Scalar test = 0;
- for(int k = 0; k <= i; k++) {
- test+=k;
- }
- if(test!=val) {
- printf("FAILED vector_par_scan %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) val);
- flag()=1;
+ for ( int k = 0; k <= i; k++ ) test += k;
+
+ if ( test != val ) {
+ printf( "FAILED vector_par_scan %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) val );
+
+ flag() = 1;
}
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_reduce {
typedef double value_type;
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team, double& sum) const {
+ void operator()( typename policy_type::member_type team, double & sum ) const {
sum += team.league_rank() * 100 + team.thread_rank();
}
};
-template<typename Scalar,class ExecutionSpace>
-bool test_scalar(int nteams, int team_size, int test) {
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> d_flag("flag");
- typename Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace>::HostMirror h_flag("h_flag");
- h_flag() = 0 ;
- Kokkos::deep_copy(d_flag,h_flag);
-
- if(test==0)
- Kokkos::parallel_for( std::string("A") , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_red<Scalar, ExecutionSpace>(d_flag));
- if(test==1)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_red_join<Scalar, ExecutionSpace>(d_flag));
- if(test==2)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_scan<Scalar, ExecutionSpace>(d_flag));
- if(test==3)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_for<Scalar, ExecutionSpace>(d_flag));
- if(test==4)
- Kokkos::parallel_for( "B" , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_single<Scalar, ExecutionSpace>(d_flag));
- if(test==5)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_for<Scalar, ExecutionSpace>(d_flag));
- if(test==6)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_reduce<Scalar, ExecutionSpace>(d_flag));
- if(test==7)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_reduce_join<Scalar, ExecutionSpace>(d_flag));
- if(test==8)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_for<Scalar, ExecutionSpace>(d_flag));
- if(test==9)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_reduce<Scalar, ExecutionSpace>(d_flag));
- if(test==10)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_reduce_join<Scalar, ExecutionSpace>(d_flag));
-
- Kokkos::deep_copy(h_flag,d_flag);
-
- return (h_flag() == 0);
+template< typename Scalar, class ExecutionSpace >
+bool test_scalar( int nteams, int team_size, int test ) {
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > d_flag( "flag" );
+ typename Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace >::HostMirror h_flag( "h_flag" );
+ h_flag() = 0;
+ Kokkos::deep_copy( d_flag, h_flag );
+
+ if ( test == 0 ) {
+ Kokkos::parallel_for( std::string( "A" ), Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_red< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 1 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_red_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 2 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_scan< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 3 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 4 ) {
+ Kokkos::parallel_for( "B", Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_single< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 5 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 6 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_reduce< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 7 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_reduce_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 8 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 9 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_reduce< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 10 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_reduce_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+
+ Kokkos::deep_copy( h_flag, d_flag );
+
+ return ( h_flag() == 0 );
}
-template<class ExecutionSpace>
-bool Test(int test) {
+template< class ExecutionSpace >
+bool Test( int test ) {
bool passed = true;
- passed = passed && test_scalar<int, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<long long int, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<float, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<double, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<my_complex, ExecutionSpace>(317,33,test);
- return passed;
-}
+ passed = passed && test_scalar< int, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< long long int, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< float, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< double, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< my_complex, ExecutionSpace >( 317, 33, test );
+ return passed;
}
+} // namespace TestTeamVector
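
The TestTeamVector.hpp changes above all follow one structure: a nested parallel_for or parallel_reduce over a TeamThreadRange or ThreadVectorRange, followed by a Kokkos::single() block in which one thread or one vector lane verifies the result; non-sum reductions pass an explicit join callable whose identity must match the initial value (1 for a product, as the comment in functor_vec_red_join notes). Below is a minimal sketch of the product-with-join case, not part of the diff; the functor name vec_product is hypothetical, while the Kokkos calls mirror those used in the tests.

// Illustrative sketch only; vec_product is a hypothetical name.
#include <Kokkos_Core.hpp>

using ExecSpace = Kokkos::DefaultExecutionSpace;
using Policy    = Kokkos::TeamPolicy< ExecSpace >;

struct vec_product {
  KOKKOS_INLINE_FUNCTION
  void operator()( const Policy::member_type & team ) const {
    // The initial value must be the identity of the join operation: 1 for *=.
    double value = 1;

    Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ),
      [&] ( int i, double & val ) { val *= ( i % 5 + 1 ); },
      [&] ( double & val, const double & src ) { val *= src; },
      value );

    // One vector lane per thread checks the reduced value.
    Kokkos::single( Kokkos::PerThread( team ), [&] ()
    {
      double test = 1;
      for ( int i = 0; i < 13; i++ ) test *= ( i % 5 + 1 );

      if ( test != value ) {
        printf( "FAILED vec_product %i %i\n", team.league_rank(), team.team_rank() );
      }
    } );
  }
};

// Usage, mirroring the tests: Kokkos::parallel_for( Policy( 317, 33, 8 ), vec_product() );
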
diff --git a/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp b/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
index 203c95267..7bcf3f8a3 100644
--- a/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
+++ b/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
@@ -1,198 +1,208 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#define KOKKOS_PRAGMA_UNROLL(a)
namespace {
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumPlain {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
+
type view;
- SumPlain(type view_):view(view_) {}
+
+ SumPlain( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, Scalar& val) {
+ void operator() ( int i, Scalar & val ) {
val += Scalar();
}
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueType {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumInitJoinFinalValueType(type view_):view(view_) {}
+
+ type view;
+
+ SumInitJoinFinalValueType( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(value_type& val) const {
+ void init( value_type & val ) const {
val = value_type();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, volatile value_type& src) const {
+ void join( volatile value_type & val, volatile value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueType2 {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumInitJoinFinalValueType2(type view_):view(view_) {}
+
+ type view;
+
+ SumInitJoinFinalValueType2( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(volatile value_type& val) const {
+ void init( volatile value_type & val ) const {
val = value_type();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, const volatile value_type& src) const {
+ void join( volatile value_type & val, const volatile value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueTypeArray {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type[];
+
+ type view;
int n;
- SumInitJoinFinalValueTypeArray(type view_, int n_):view(view_),n(n_) {}
+
+ SumInitJoinFinalValueTypeArray( type view_, int n_ ) : view( view_ ), n( n_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(value_type val) const {
- for(int k=0;k<n;k++)
+ void init( value_type val ) const {
+ for ( int k = 0; k < n; k++ ) {
val[k] = 0;
+ }
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type val, const volatile value_type src) const {
- for(int k=0;k<n;k++)
+ void join( volatile value_type val, const volatile value_type src ) const {
+ for ( int k = 0; k < n; k++ ) {
val[k] += src[k];
+ }
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type val) const {
- for(int k=0;k<n;k++)
- val[k] += k*i;
+ void operator()( int i, value_type val ) const {
+ for ( int k = 0; k < n; k++ ) {
+ val[k] += k * i;
+ }
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumWrongInitJoinFinalValueType {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumWrongInitJoinFinalValueType(type view_):view(view_) {}
+
+ type view;
+
+ SumWrongInitJoinFinalValueType( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(double& val) const {
+ void init( double & val ) const {
val = double();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, const value_type& src) const {
+ void join( volatile value_type & val, const value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
void TestTemplateMetaFunctions() {
- typedef typename Kokkos::View<Scalar*,ExecutionSpace> type;
- type a("A",100);
+ typedef typename Kokkos::View< Scalar*, ExecutionSpace > type;
+ type a( "A", 100 );
/*
- int sum_plain_has_init_arg = Kokkos::Impl::FunctorHasInit<SumPlain<Scalar,ExecutionSpace>, Scalar& >::value;
- ASSERT_EQ(sum_plain_has_init_arg,0);
- int sum_initjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_init_arg,1);
- int sum_initjoinfinalvaluetype_has_init_arg2 = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_init_arg2,1);
- int sum_wronginitjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit<SumWrongInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_wronginitjoinfinalvaluetype_has_init_arg,0);
-
- //int sum_initjoinfinalvaluetypearray_has_init_arg = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueTypeArray<Scalar,ExecutionSpace>, Scalar[] >::value;
- //ASSERT_EQ(sum_initjoinfinalvaluetypearray_has_init_arg,1);
-
- //printf("Values Init: %i %i %i\n",sum_plain_has_init_arg,sum_initjoinfinalvaluetype_has_init_arg,sum_wronginitjoinfinalvaluetype_has_init_arg);
-
- int sum_plain_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumPlain<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_plain_has_join_arg,0);
- int sum_initjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_join_arg,1);
- int sum_initjoinfinalvaluetype_has_join_arg2 = Kokkos::Impl::FunctorHasJoin<SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_join_arg2,1);
- int sum_wronginitjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumWrongInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_wronginitjoinfinalvaluetype_has_join_arg,0);
+ int sum_plain_has_init_arg = Kokkos::Impl::FunctorHasInit< SumPlain<Scalar, ExecutionSpace>, Scalar & >::value;
+ ASSERT_EQ( sum_plain_has_init_arg, 0 );
+ int sum_initjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_init_arg, 1 );
+ int sum_initjoinfinalvaluetype_has_init_arg2 = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_init_arg2, 1 );
+ int sum_wronginitjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit< SumWrongInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_wronginitjoinfinalvaluetype_has_init_arg, 0 );
+
+ //int sum_initjoinfinalvaluetypearray_has_init_arg = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueTypeArray<Scalar, ExecutionSpace>, Scalar[] >::value;
+ //ASSERT_EQ( sum_initjoinfinalvaluetypearray_has_init_arg, 1 );
+
+ //printf( "Values Init: %i %i %i\n", sum_plain_has_init_arg, sum_initjoinfinalvaluetype_has_init_arg, sum_wronginitjoinfinalvaluetype_has_init_arg );
+
+ int sum_plain_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumPlain<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_plain_has_join_arg, 0 );
+ int sum_initjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_join_arg, 1 );
+ int sum_initjoinfinalvaluetype_has_join_arg2 = Kokkos::Impl::FunctorHasJoin< SumInitJoinFinalValueType2<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_join_arg2, 1 );
+ int sum_wronginitjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumWrongInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_wronginitjoinfinalvaluetype_has_join_arg, 0 );
+
+ //printf( "Values Join: %i %i %i\n", sum_plain_has_join_arg, sum_initjoinfinalvaluetype_has_join_arg, sum_wronginitjoinfinalvaluetype_has_join_arg );
*/
- //printf("Values Join: %i %i %i\n",sum_plain_has_join_arg,sum_initjoinfinalvaluetype_has_join_arg,sum_wronginitjoinfinalvaluetype_has_join_arg);
}
-}
+} // namespace
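
TestTemplateMetaFunctions.hpp probes whether a reduction functor declares the optional value_type, init() and join() hooks (FunctorHasInit / FunctorHasJoin), and whether a mismatched signature is correctly rejected. A minimal sketch of a functor that carries all three hooks follows; it is not part of the diff, and the name MaxFunctor is hypothetical (a max reduction is used here because, unlike a sum, it genuinely needs a custom init and join).

// Illustrative sketch only; MaxFunctor is a hypothetical name.
#include <Kokkos_Core.hpp>
#include <cfloat>

struct MaxFunctor {
  typedef double value_type;                        // detected by the metafunctions
  Kokkos::View< const double* > x;

  MaxFunctor( Kokkos::View< const double* > x_ ) : x( x_ ) {}

  KOKKOS_INLINE_FUNCTION
  void init( value_type & val ) const { val = -DBL_MAX; }   // FunctorHasInit

  KOKKOS_INLINE_FUNCTION
  void join( volatile value_type & val,
             const volatile value_type & src ) const {      // FunctorHasJoin
    if ( src > val ) val = src;
  }

  KOKKOS_INLINE_FUNCTION
  void operator()( int i, value_type & val ) const {
    if ( x( i ) > val ) val = x( i );
  }
};

// Usage: double m = 0; Kokkos::parallel_reduce( x.dimension_0(), MaxFunctor( x ), m );
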
diff --git a/lib/kokkos/core/unit_test/TestTile.hpp b/lib/kokkos/core/unit_test/TestTile.hpp
index 842131deb..7d096c24c 100644
--- a/lib/kokkos/core/unit_test/TestTile.hpp
+++ b/lib/kokkos/core/unit_test/TestTile.hpp
@@ -1,154 +1,142 @@
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
#ifndef TEST_TILE_HPP
#define TEST_TILE_HPP
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_ViewTile.hpp>
namespace TestTile {
-template < typename Device , typename TileLayout>
+template < typename Device, typename TileLayout >
struct ReduceTileErrors
{
- typedef Device execution_space ;
-
- typedef Kokkos::View< ptrdiff_t**, TileLayout, Device> array_type;
- typedef Kokkos::View< ptrdiff_t[ TileLayout::N0 ][ TileLayout::N1 ], Kokkos::LayoutLeft , Device > tile_type ;
-
- array_type m_array ;
-
+ typedef Device execution_space;
+ typedef Kokkos::View< ptrdiff_t**, TileLayout, Device > array_type;
+ typedef Kokkos::View< ptrdiff_t[ TileLayout::N0 ][ TileLayout::N1 ], Kokkos::LayoutLeft, Device > tile_type;
typedef ptrdiff_t value_type;
- ReduceTileErrors( array_type a )
- : m_array(a)
- {}
+ array_type m_array;
+ ReduceTileErrors( array_type a ) : m_array( a ) {}
KOKKOS_INLINE_FUNCTION
- static void init( value_type & errors )
- {
- errors = 0;
- }
+ static void init( value_type & errors ) { errors = 0; }
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & errors ,
+ static void join( volatile value_type & errors,
const volatile value_type & src_errors )
{
errors += src_errors;
}
- // Initialize
+ // Initialize.
KOKKOS_INLINE_FUNCTION
void operator()( size_t iwork ) const
{
const size_t i = iwork % m_array.dimension_0();
const size_t j = iwork / m_array.dimension_0();
- if ( j < m_array.dimension_1() ) {
- m_array(i,j) = & m_array(i,j) - & m_array(0,0);
-// printf("m_array(%d,%d) = %d\n",int(i),int(j),int(m_array(i,j)));
+ if ( j < m_array.dimension_1() ) {
+ m_array( i, j ) = &m_array( i, j ) - &m_array( 0, 0 );
+ //printf( "m_array(%d, %d) = %d\n", int( i ), int( j ), int( m_array( i, j ) ) );
}
}
// Verify:
KOKKOS_INLINE_FUNCTION
- void operator()( size_t iwork , value_type & errors ) const
+ void operator()( size_t iwork, value_type & errors ) const
{
- const size_t tile_dim0 = ( m_array.dimension_0() + TileLayout::N0 - 1 ) / TileLayout::N0 ;
- const size_t tile_dim1 = ( m_array.dimension_1() + TileLayout::N1 - 1 ) / TileLayout::N1 ;
+ const size_t tile_dim0 = ( m_array.dimension_0() + TileLayout::N0 - 1 ) / TileLayout::N0;
+ const size_t tile_dim1 = ( m_array.dimension_1() + TileLayout::N1 - 1 ) / TileLayout::N1;
- const size_t itile = iwork % tile_dim0 ;
- const size_t jtile = iwork / tile_dim0 ;
+ const size_t itile = iwork % tile_dim0;
+ const size_t jtile = iwork / tile_dim0;
if ( jtile < tile_dim1 ) {
+ tile_type tile = Kokkos::Experimental::tile_subview( m_array, itile, jtile );
- tile_type tile = Kokkos::Experimental::tile_subview( m_array , itile , jtile );
-
- if ( tile(0,0) != ptrdiff_t(( itile + jtile * tile_dim0 ) * TileLayout::N0 * TileLayout::N1 ) ) {
- ++errors ;
+ if ( tile( 0, 0 ) != ptrdiff_t( ( itile + jtile * tile_dim0 ) * TileLayout::N0 * TileLayout::N1 ) ) {
+ ++errors;
}
else {
+ for ( size_t j = 0; j < size_t( TileLayout::N1 ); ++j ) {
+ for ( size_t i = 0; i < size_t( TileLayout::N0 ); ++i ) {
+ const size_t iglobal = i + itile * TileLayout::N0;
+ const size_t jglobal = j + jtile * TileLayout::N1;
- for ( size_t j = 0 ; j < size_t(TileLayout::N1) ; ++j ) {
- for ( size_t i = 0 ; i < size_t(TileLayout::N0) ; ++i ) {
- const size_t iglobal = i + itile * TileLayout::N0 ;
- const size_t jglobal = j + jtile * TileLayout::N1 ;
-
- if ( iglobal < m_array.dimension_0() && jglobal < m_array.dimension_1() ) {
- if ( tile(i,j) != ptrdiff_t( tile(0,0) + i + j * TileLayout::N0 ) ) ++errors ;
-
-// printf("tile(%d,%d)(%d,%d) = %d\n",int(itile),int(jtile),int(i),int(j),int(tile(i,j)));
+ if ( iglobal < m_array.dimension_0() && jglobal < m_array.dimension_1() ) {
+ if ( tile( i, j ) != ptrdiff_t( tile( 0, 0 ) + i + j * TileLayout::N0 ) ) ++errors;
+ //printf( "tile(%d, %d)(%d, %d) = %d\n", int( itile ), int( jtile ), int( i ), int( j ), int( tile( i, j ) ) );
+ }
}
}
- }
}
}
}
};
-template< class Space , unsigned N0 , unsigned N1 >
-void test( const size_t dim0 , const size_t dim1 )
+template< class Space, unsigned N0, unsigned N1 >
+void test( const size_t dim0, const size_t dim1 )
{
- typedef Kokkos::LayoutTileLeft<N0,N1> array_layout ;
- typedef ReduceTileErrors< Space , array_layout > functor_type ;
+ typedef Kokkos::LayoutTileLeft< N0, N1 > array_layout;
+ typedef ReduceTileErrors< Space, array_layout > functor_type;
- const size_t tile_dim0 = ( dim0 + N0 - 1 ) / N0 ;
- const size_t tile_dim1 = ( dim1 + N1 - 1 ) / N1 ;
-
- typename functor_type::array_type array("",dim0,dim1);
+ const size_t tile_dim0 = ( dim0 + N0 - 1 ) / N0;
+ const size_t tile_dim1 = ( dim1 + N1 - 1 ) / N1;
- Kokkos::parallel_for( Kokkos::RangePolicy<Space,size_t>(0,dim0*dim1) , functor_type( array ) );
+ typename functor_type::array_type array( "", dim0, dim1 );
- ptrdiff_t error = 0 ;
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space, size_t >( 0, dim0 * dim1 ), functor_type( array ) );
- Kokkos::parallel_reduce( Kokkos::RangePolicy<Space,size_t>(0,tile_dim0*tile_dim1) , functor_type( array ) , error );
+ ptrdiff_t error = 0;
- EXPECT_EQ( error , ptrdiff_t(0) );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, size_t >( 0, tile_dim0 * tile_dim1 ), functor_type( array ), error );
+
+ EXPECT_EQ( error, ptrdiff_t( 0 ) );
}
-} /* namespace TestTile */
+} // namespace TestTile
#endif //TEST_TILE_HPP
-
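For context, the TestTile.hpp harness above first fills a LayoutTileLeft view with each element's offset from the origin (the parallel_for pass) and then walks the view tile by tile to verify that those offsets follow the expected tile-contiguous ordering (the parallel_reduce pass, which counts mismatches). A minimal sketch of one way TestTile::test<>() could be driven from a gtest body follows; the Serial execution space, the test name, and the tile and array shapes are illustrative assumptions only, and the execution space is assumed to have been initialized by the test's main().

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include "TestTile.hpp"

// Illustrative only: exercises a few tile shapes, including array extents
// that are not exact multiples of the tile dimensions, so the partial-tile
// bounds checks in ReduceTileErrors::operator() are reached.
// Assumes Kokkos::Serial was enabled and initialized by the test main().
TEST( serial, view_tile_layout )
{
  TestTile::test< Kokkos::Serial, 1, 1 >( 9, 10 );
  TestTile::test< Kokkos::Serial, 2, 2 >( 9, 10 );
  TestTile::test< Kokkos::Serial, 4, 4 >( 9, 11 );
}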
diff --git a/lib/kokkos/core/unit_test/TestUtilities.hpp b/lib/kokkos/core/unit_test/TestUtilities.hpp
index 947be03e3..be4a93b89 100644
--- a/lib/kokkos/core/unit_test/TestUtilities.hpp
+++ b/lib/kokkos/core/unit_test/TestUtilities.hpp
@@ -1,306 +1,301 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
inline
void test_utilities()
{
using namespace Kokkos::Impl;
+
{
- using i = integer_sequence<int>;
- using j = make_integer_sequence<int,0>;
+ using i = integer_sequence< int >;
+ using j = make_integer_sequence< int, 0 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 0u, "Error: integer_sequence.size()" );
}
-
{
- using i = integer_sequence<int,0>;
- using j = make_integer_sequence<int,1>;
+ using i = integer_sequence< int, 0 >;
+ using j = make_integer_sequence< int, 1 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 1u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
}
-
{
- using i = integer_sequence<int,0,1>;
- using j = make_integer_sequence<int,2>;
+ using i = integer_sequence< int, 0, 1 >;
+ using j = make_integer_sequence< int, 2 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 2u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2>;
- using j = make_integer_sequence<int,3>;
+ using i = integer_sequence< int, 0, 1, 2 >;
+ using j = make_integer_sequence< int, 3 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 3u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3>;
- using j = make_integer_sequence<int,4>;
+ using i = integer_sequence< int, 0, 1, 2, 3 >;
+ using j = make_integer_sequence< int, 4 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 4u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4>;
- using j = make_integer_sequence<int,5>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4 >;
+ using j = make_integer_sequence< int, 5 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 5u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5>;
- using j = make_integer_sequence<int,6>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5 >;
+ using j = make_integer_sequence< int, 6 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 6u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6>;
- using j = make_integer_sequence<int,7>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6 >;
+ using j = make_integer_sequence< int, 7 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 7u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7>;
- using j = make_integer_sequence<int,8>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7 >;
+ using j = make_integer_sequence< int, 8 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 8u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7,8>;
- using j = make_integer_sequence<int,9>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8 >;
+ using j = make_integer_sequence< int, 9 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 9u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 8, i >::value == 8, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 8, i{} ) == 8, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7,8,9>;
- using j = make_integer_sequence<int,10>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 >;
+ using j = make_integer_sequence< int, 10 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 10u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<9, i>::value == 9, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(9, i{}) == 9, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 8, i >::value == 8, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 9, i >::value == 9, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 8, i{} ) == 8, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 9, i{} ) == 9, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = make_integer_sequence<int, 5>;
- using r = reverse_integer_sequence<i>;
- using gr = integer_sequence<int, 4, 3, 2, 1, 0>;
+ using i = make_integer_sequence< int, 5 >;
+ using r = reverse_integer_sequence< i >;
+ using gr = integer_sequence< int, 4, 3, 2, 1, 0 >;
- static_assert( std::is_same<r,gr>::value, "Error: reverse_integer_sequence" );
+ static_assert( std::is_same< r, gr >::value, "Error: reverse_integer_sequence" );
}
{
- using s = make_integer_sequence<int,10>;
- using e = exclusive_scan_integer_sequence<s>;
- using i = inclusive_scan_integer_sequence<s>;
+ using s = make_integer_sequence< int, 10 >;
+ using e = exclusive_scan_integer_sequence< s >;
+ using i = inclusive_scan_integer_sequence< s >;
- using ge = integer_sequence<int, 0, 0, 1, 3, 6, 10, 15, 21, 28, 36>;
- using gi = integer_sequence<int, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45>;
+ using ge = integer_sequence< int, 0, 0, 1, 3, 6, 10, 15, 21, 28, 36 >;
+ using gi = integer_sequence< int, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45 >;
- static_assert( e::value == 45, "Error: scan value");
- static_assert( i::value == 45, "Error: scan value");
+ static_assert( e::value == 45, "Error: scan value" );
+ static_assert( i::value == 45, "Error: scan value" );
- static_assert( std::is_same< e::type, ge >::value, "Error: exclusive_scan");
- static_assert( std::is_same< i::type, gi >::value, "Error: inclusive_scan");
+ static_assert( std::is_same< e::type, ge >::value, "Error: exclusive_scan" );
+ static_assert( std::is_same< i::type, gi >::value, "Error: inclusive_scan" );
}
-
-
}
} // namespace Test
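The assertions in test_utilities() above pin down the compile-time behavior of the Kokkos::Impl integer-sequence helpers. A minimal compile-time sketch summarizing the same facilities follows; it uses only names exercised in the test, but writes them with explicit Kokkos::Impl qualification (the test instead relies on "using namespace Kokkos::Impl;"), and the length-5 sequence is an arbitrary illustrative choice.

#include <type_traits>
#include <Kokkos_Core.hpp>

// Compile-time summary of the helpers checked above.
using seq  = Kokkos::Impl::make_integer_sequence< int, 5 >;          // integer_sequence< int, 0, 1, 2, 3, 4 >
using rev  = Kokkos::Impl::reverse_integer_sequence< seq >;          // integer_sequence< int, 4, 3, 2, 1, 0 >
using scan = Kokkos::Impl::exclusive_scan_integer_sequence< seq >;   // scan::type = < 0, 0, 1, 3, 6 >, scan::value = 10

static_assert( seq::size() == 5u, "five generated values" );
static_assert( Kokkos::Impl::integer_sequence_at< 3, seq >::value == 3, "element access via a template index" );
static_assert( Kokkos::Impl::at( 3, seq{} ) == 3, "element access via a constexpr function" );
static_assert( std::is_same< rev, Kokkos::Impl::integer_sequence< int, 4, 3, 2, 1, 0 > >::value, "reversed order" );
static_assert( scan::value == 10, "sum of all values in the sequence" );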
diff --git a/lib/kokkos/core/unit_test/TestViewAPI.hpp b/lib/kokkos/core/unit_test/TestViewAPI.hpp
index a96f31cc1..cbf86dc58 100644
--- a/lib/kokkos/core/unit_test/TestViewAPI.hpp
+++ b/lib/kokkos/core/unit_test/TestViewAPI.hpp
@@ -1,1361 +1,1322 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
-
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< class T , class ... P >
-size_t allocation_count( const Kokkos::View<T,P...> & view )
+template< class T, class ... P >
+size_t allocation_count( const Kokkos::View< T, P... > & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
- const int memory_span = Kokkos::View<int*>::required_allocation_size(100);
+ const int memory_span = Kokkos::View< int* >::required_allocation_size( 100 );
- return (card <= alloc && memory_span == 400) ? alloc : 0 ;
+ return ( card <= alloc && memory_span == 400 ) ? alloc : 0;
}
/*--------------------------------------------------------------------------*/
-template< typename T, class DeviceType>
+template< typename T, class DeviceType >
struct TestViewOperator
{
- typedef typename DeviceType::execution_space execution_space ;
+ typedef typename DeviceType::execution_space execution_space;
- static const unsigned N = 100 ;
- static const unsigned D = 3 ;
+ static const unsigned N = 100;
+ static const unsigned D = 3;
- typedef Kokkos::View< T*[D] , execution_space > view_type ;
+ typedef Kokkos::View< T*[D], execution_space > view_type;
- const view_type v1 ;
- const view_type v2 ;
+ const view_type v1;
+ const view_type v2;
TestViewOperator()
- : v1( "v1" , N )
- , v2( "v2" , N )
+ : v1( "v1", N )
+ , v2( "v2", N )
{}
static void testit()
{
- Kokkos::parallel_for( N , TestViewOperator() );
+ Kokkos::parallel_for( N, TestViewOperator() );
}
KOKKOS_INLINE_FUNCTION
void operator()( const unsigned i ) const
{
- const unsigned X = 0 ;
- const unsigned Y = 1 ;
- const unsigned Z = 2 ;
+ const unsigned X = 0;
+ const unsigned Y = 1;
+ const unsigned Z = 2;
- v2(i,X) = v1(i,X);
- v2(i,Y) = v1(i,Y);
- v2(i,Z) = v1(i,Z);
+ v2( i, X ) = v1( i, X );
+ v2( i, Y ) = v1( i, Y );
+ v2( i, Z ) = v1( i, Z );
}
};
/*--------------------------------------------------------------------------*/
-template< class DataType ,
- class DeviceType ,
+template< class DataType,
+ class DeviceType,
unsigned Rank = Kokkos::ViewTraits< DataType >::rank >
-struct TestViewOperator_LeftAndRight ;
+struct TestViewOperator_LeftAndRight;
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 8 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 8 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i7 = 0 ; i7 < unsigned(left.dimension_7()) ; ++i7 )
- for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i7 = 0; i7 < unsigned( left.dimension_7() ); ++i7 )
+ for ( unsigned i6 = 0; i6 < unsigned( left.dimension_6() ); ++i6 )
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& left( 0, 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
- if ( & left(i0,i1,i2,i3,i4,i5,i6,i7) !=
- & left_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
- update |= 4 ;
+ if ( & left( i0, i1, i2, i3, i4, i5, i6, i7 ) !=
+ & left_stride( i0, i1, i2, i3, i4, i5, i6, i7 ) ) {
+ update |= 4;
}
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
- for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
- for ( unsigned i7 = 0 ; i7 < unsigned(right.dimension_7()) ; ++i7 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
+ for ( unsigned i6 = 0; i6 < unsigned( right.dimension_6() ); ++i6 )
+ for ( unsigned i7 = 0; i7 < unsigned( right.dimension_7() ); ++i7 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& right( 0, 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
- if ( & right(i0,i1,i2,i3,i4,i5,i6,i7) !=
- & right_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
- update |= 8 ;
+ if ( & right( i0, i1, i2, i3, i4, i5, i6, i7 ) !=
+ & right_stride( i0, i1, i2, i3, i4, i5, i6, i7 ) ) {
+ update |= 8;
}
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 7 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 7 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i6 = 0; i6 < unsigned( left.dimension_6() ); ++i6 )
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6 ) -
& left( 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
- for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
+ for ( unsigned i6 = 0; i6 < unsigned( right.dimension_6() ); ++i6 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6 ) -
& right( 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 6 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 6 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5 ) -
& left( 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
{
const long j = & right( i0, i1, i2, i3, i4, i5 ) -
& right( 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 5 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 5 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4 ) -
& left( 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
if ( & left( i0, i1, i2, i3, i4 ) !=
- & left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4 ; }
+ & left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4; }
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
{
const long j = & right( i0, i1, i2, i3, i4 ) -
& right( 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
if ( & right( i0, i1, i2, i3, i4 ) !=
- & right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8 ; }
+ & right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 4 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 4 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3 ) -
& left( 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
{
const long j = & right( i0, i1, i2, i3 ) -
& right( 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 3 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 3 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
- : left( std::string("left") )
- , right( std::string("right") )
+ : left( std::string( "left" ) )
+ , right( std::string( "right" ) )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2 ) -
& left( 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
- if ( & left(i0,i1,i2) != & left_stride(i0,i1,i2) ) { update |= 4 ; }
+ if ( & left( i0, i1, i2 ) != & left_stride( i0, i1, i2 ) ) { update |= 4; }
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
{
const long j = & right( i0, i1, i2 ) -
& right( 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
- if ( & right(i0,i1,i2) != & right_stride(i0,i1,i2) ) { update |= 8 ; }
+ if ( & right( i0, i1, i2 ) != & right_stride( i0, i1, i2 ) ) { update |= 8; }
}
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
{
- if ( & left(i0,i1,i2) != & left(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0,i1,i2) != & right(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
+ if ( & left( i0, i1, i2 ) != & left( i0, i1, i2, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0, i1, i2 ) != & right( i0, i1, i2, 0, 0, 0, 0, 0 ) ) { update |= 3; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 2 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 2 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1 ) -
& left( 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
{
const long j = & right( i0, i1 ) -
& right( 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
{
- if ( & left(i0,i1) != & left(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0,i1) != & right(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
+ if ( & left( i0, i1 ) != & left( i0, i1, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0, i1 ) != & right( i0, i1, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 1 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
- if ( & left(i0) != & left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0) != & right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
- if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
+ if ( & left( i0 ) != & left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0 ) != & right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & left( i0 ) != & left_stride( i0 ) ) { update |= 4; }
+ if ( & right( i0 ) != & right_stride( i0 ) ) { update |= 8; }
}
}
};
-template<class Layout, class DeviceType>
-struct TestViewMirror {
-
- template<class MemoryTraits>
+template< class Layout, class DeviceType >
+struct TestViewMirror
+{
+ template< class MemoryTraits >
void static test_mirror() {
- Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
- Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
- auto a_h2 = Kokkos::create_mirror(Kokkos::HostSpace(),a_h);
- auto a_d = Kokkos::create_mirror(DeviceType(),a_h);
-
- int equal_ptr_h_h2 = (a_h.data() ==a_h2.data())?1:0;
- int equal_ptr_h_d = (a_h.data() ==a_d. data())?1:0;
- int equal_ptr_h2_d = (a_h2.data()==a_d. data())?1:0;
-
- ASSERT_EQ(equal_ptr_h_h2,0);
- ASSERT_EQ(equal_ptr_h_d ,0);
- ASSERT_EQ(equal_ptr_h2_d,0);
-
-
- ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
- ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
- }
+ Kokkos::View< double*, Layout, Kokkos::HostSpace > a_org( "A", 1000 );
+ Kokkos::View< double*, Layout, Kokkos::HostSpace, MemoryTraits > a_h = a_org;
+ auto a_h2 = Kokkos::create_mirror( Kokkos::HostSpace(), a_h );
+ auto a_d = Kokkos::create_mirror( DeviceType(), a_h );
+ int equal_ptr_h_h2 = ( a_h.data() == a_h2.data() ) ? 1 : 0;
+ int equal_ptr_h_d = ( a_h.data() == a_d.data() ) ? 1 : 0;
+ int equal_ptr_h2_d = ( a_h2.data() == a_d.data() ) ? 1 : 0;
- template<class MemoryTraits>
- void static test_mirror_view() {
- Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
- Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
- auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
- auto a_d = Kokkos::create_mirror_view(DeviceType(),a_h);
-
- int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
- int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
- int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
-
- int is_same_memspace = std::is_same<Kokkos::HostSpace,typename DeviceType::memory_space>::value?1:0;
- ASSERT_EQ(equal_ptr_h_h2,1);
- ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
- ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
+ ASSERT_EQ( equal_ptr_h_h2, 0 );
+ ASSERT_EQ( equal_ptr_h_d, 0 );
+ ASSERT_EQ( equal_ptr_h2_d, 0 );
+ ASSERT_EQ( a_h.dimension_0(), a_h2.dimension_0() );
+ ASSERT_EQ( a_h.dimension_0(), a_d .dimension_0() );
+ }
- ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
- ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
- }
+ template< class MemoryTraits >
+ void static test_mirror_view() {
+ Kokkos::View< double*, Layout, Kokkos::HostSpace > a_org( "A", 1000 );
+ Kokkos::View< double*, Layout, Kokkos::HostSpace, MemoryTraits > a_h = a_org;
+ auto a_h2 = Kokkos::create_mirror_view( Kokkos::HostSpace(), a_h );
+ auto a_d = Kokkos::create_mirror_view( DeviceType(), a_h );
+
+ int equal_ptr_h_h2 = a_h.data() == a_h2.data() ? 1 : 0;
+ int equal_ptr_h_d = a_h.data() == a_d.data() ? 1 : 0;
+ int equal_ptr_h2_d = a_h2.data() == a_d.data() ? 1 : 0;
+
+ int is_same_memspace = std::is_same< Kokkos::HostSpace, typename DeviceType::memory_space >::value ? 1 : 0;
+ ASSERT_EQ( equal_ptr_h_h2, 1 );
+ ASSERT_EQ( equal_ptr_h_d, is_same_memspace );
+ ASSERT_EQ( equal_ptr_h2_d, is_same_memspace );
+
+ ASSERT_EQ( a_h.dimension_0(), a_h2.dimension_0() );
+ ASSERT_EQ( a_h.dimension_0(), a_d .dimension_0() );
+ }
void static testit() {
- test_mirror<Kokkos::MemoryTraits<0>>();
- test_mirror<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
- test_mirror_view<Kokkos::MemoryTraits<0>>();
- test_mirror_view<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
+ test_mirror< Kokkos::MemoryTraits<0> >();
+ test_mirror< Kokkos::MemoryTraits<Kokkos::Unmanaged> >();
+ test_mirror_view< Kokkos::MemoryTraits<0> >();
+ test_mirror_view< Kokkos::MemoryTraits<Kokkos::Unmanaged> >();
}
};
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType >
class TestViewAPI
{
public:
- typedef DeviceType device ;
+ typedef DeviceType device;
- enum { N0 = 1000 ,
- N1 = 3 ,
- N2 = 5 ,
+ enum { N0 = 1000,
+ N1 = 3,
+ N2 = 5,
N3 = 7 };
- typedef Kokkos::View< T , device > dView0 ;
- typedef Kokkos::View< T* , device > dView1 ;
- typedef Kokkos::View< T*[N1] , device > dView2 ;
- typedef Kokkos::View< T*[N1][N2] , device > dView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device > dView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device > const_dView4 ;
-
- typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged ;
-
- typedef typename dView0::host_mirror_space host ;
+ typedef Kokkos::View< T, device > dView0;
+ typedef Kokkos::View< T*, device > dView1;
+ typedef Kokkos::View< T*[N1], device > dView2;
+ typedef Kokkos::View< T*[N1][N2], device > dView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device > dView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device > const_dView4;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged;
+ typedef typename dView0::host_mirror_space host;
TestViewAPI()
{
run_test_mirror();
run_test();
run_test_scalar();
run_test_const();
run_test_subview();
run_test_subview_strided();
run_test_vector();
- TestViewOperator< T , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2] , device >::testit();
- TestViewMirror<Kokkos::LayoutLeft, device >::testit();
- TestViewMirror<Kokkos::LayoutRight, device >::testit();
-
+ TestViewOperator< T, device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2], device >::testit();
+ TestViewMirror< Kokkos::LayoutLeft, device >::testit();
+ TestViewMirror< Kokkos::LayoutRight, device >::testit();
}
static void run_test_mirror()
{
- typedef Kokkos::View< int , host > view_type ;
- typedef typename view_type::HostMirror mirror_type ;
+ typedef Kokkos::View< int, host > view_type;
+ typedef typename view_type::HostMirror mirror_type;
- static_assert( std::is_same< typename view_type::memory_space
- , typename mirror_type::memory_space
- >::value , "" );
+ static_assert( std::is_same< typename view_type::memory_space, typename mirror_type::memory_space >::value, "" );
- view_type a("a");
- mirror_type am = Kokkos::create_mirror_view(a);
- mirror_type ax = Kokkos::create_mirror(a);
- ASSERT_EQ( & a() , & am() );
+ view_type a( "a" );
+ mirror_type am = Kokkos::create_mirror_view( a );
+ mirror_type ax = Kokkos::create_mirror( a );
+ ASSERT_EQ( & a(), & am() );
}
static void run_test_scalar()
{
- typedef typename dView0::HostMirror hView0 ;
+ typedef typename dView0::HostMirror hView0;
- dView0 dx , dy ;
- hView0 hx , hy ;
+ dView0 dx, dy;
+ hView0 hx, hy;
dx = dView0( "dx" );
dy = dView0( "dy" );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
- hx() = 1 ;
+ hx() = 1;
- Kokkos::deep_copy( dx , hx );
- Kokkos::deep_copy( dy , dx );
- Kokkos::deep_copy( hy , dy );
+ Kokkos::deep_copy( dx, hx );
+ Kokkos::deep_copy( dy, dx );
+ Kokkos::deep_copy( hy, dy );
ASSERT_EQ( hx(), hy() );
}
static void run_test()
{
// mfh 14 Feb 2014: This test doesn't actually create instances of
// these types. In order to avoid "declared but unused typedef"
// warnings, we declare empty instances of these types, with the
// usual "(void)" marker to avoid compiler warnings for unused
// variables.
- typedef typename dView0::HostMirror hView0 ;
- typedef typename dView1::HostMirror hView1 ;
- typedef typename dView2::HostMirror hView2 ;
- typedef typename dView3::HostMirror hView3 ;
- typedef typename dView4::HostMirror hView4 ;
+ typedef typename dView0::HostMirror hView0;
+ typedef typename dView1::HostMirror hView1;
+ typedef typename dView2::HostMirror hView2;
+ typedef typename dView3::HostMirror hView3;
+ typedef typename dView4::HostMirror hView4;
{
hView0 thing;
(void) thing;
}
{
hView1 thing;
(void) thing;
}
{
hView2 thing;
(void) thing;
}
{
hView3 thing;
(void) thing;
}
{
hView4 thing;
(void) thing;
}
- dView4 dx , dy , dz ;
- hView4 hx , hy , hz ;
+ dView4 dx, dy, dz;
+ hView4 hx, hy, hz;
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
ASSERT_TRUE( hx.ptr_on_device() == 0 );
ASSERT_TRUE( hy.ptr_on_device() == 0 );
ASSERT_TRUE( hz.ptr_on_device() == 0 );
- ASSERT_EQ( dx.dimension_0() , 0u );
- ASSERT_EQ( dy.dimension_0() , 0u );
- ASSERT_EQ( dz.dimension_0() , 0u );
- ASSERT_EQ( hx.dimension_0() , 0u );
- ASSERT_EQ( hy.dimension_0() , 0u );
- ASSERT_EQ( hz.dimension_0() , 0u );
- ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dz.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hz.dimension_1() , unsigned(N1) );
-
- dx = dView4( "dx" , N0 );
- dy = dView4( "dy" , N0 );
-
- ASSERT_EQ( dx.use_count() , size_t(1) );
+ ASSERT_EQ( dx.dimension_0(), 0u );
+ ASSERT_EQ( dy.dimension_0(), 0u );
+ ASSERT_EQ( dz.dimension_0(), 0u );
+ ASSERT_EQ( hx.dimension_0(), 0u );
+ ASSERT_EQ( hy.dimension_0(), 0u );
+ ASSERT_EQ( hz.dimension_0(), 0u );
+ ASSERT_EQ( dx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dz.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hz.dimension_1(), unsigned( N1 ) );
+
+ dx = dView4( "dx", N0 );
+ dy = dView4( "dy", N0 );
+
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
dView4_unmanaged unmanaged_dx = dx;
- ASSERT_EQ( dx.use_count() , size_t(1) );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
- dView4_unmanaged unmanaged_from_ptr_dx = dView4_unmanaged(dx.ptr_on_device(),
- dx.dimension_0(),
- dx.dimension_1(),
- dx.dimension_2(),
- dx.dimension_3());
+ dView4_unmanaged unmanaged_from_ptr_dx = dView4_unmanaged( dx.ptr_on_device(),
+ dx.dimension_0(),
+ dx.dimension_1(),
+ dx.dimension_2(),
+ dx.dimension_3() );
{
- // Destruction of this view should be harmless
- const_dView4 unmanaged_from_ptr_const_dx( dx.ptr_on_device() ,
- dx.dimension_0() ,
- dx.dimension_1() ,
- dx.dimension_2() ,
+ // Destruction of this view should be harmless.
+ const_dView4 unmanaged_from_ptr_const_dx( dx.ptr_on_device(),
+ dx.dimension_0(),
+ dx.dimension_1(),
+ dx.dimension_2(),
dx.dimension_3() );
}
- const_dView4 const_dx = dx ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ const_dView4 const_dx = dx;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
{
const_dView4 const_dx2;
const_dx2 = const_dx;
- ASSERT_EQ( dx.use_count() , size_t(3) );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
const_dx2 = dy;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
- const_dView4 const_dx3(dx);
- ASSERT_EQ( dx.use_count() , size_t(3) );
-
- dView4_unmanaged dx4_unmanaged(dx);
- ASSERT_EQ( dx.use_count() , size_t(3) );
- }
+ const_dView4 const_dx3( dx );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ dView4_unmanaged dx4_unmanaged( dx );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ }
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
ASSERT_FALSE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( const_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_from_ptr_dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
- ASSERT_NE( dx , dy );
+ ASSERT_NE( dx, dy );
- ASSERT_EQ( dx.dimension_0() , unsigned(N0) );
- ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dx.dimension_2() , unsigned(N2) );
- ASSERT_EQ( dx.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( dx.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( dx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dx.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( dx.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( dy.dimension_0() , unsigned(N0) );
- ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dy.dimension_2() , unsigned(N2) );
- ASSERT_EQ( dy.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( dy.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( dy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dy.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( dy.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( unmanaged_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
+ ASSERT_EQ( unmanaged_from_ptr_dx.capacity(), unsigned( N0 ) * unsigned( N1 ) * unsigned( N2 ) * unsigned( N3 ) );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
- // T v1 = hx() ; // Generates compile error as intended
- // T v2 = hx(0,0) ; // Generates compile error as intended
- // hx(0,0) = v2 ; // Generates compile error as intended
+ // T v1 = hx(); // Generates compile error as intended.
+ // T v2 = hx( 0, 0 ); // Generates compile error as intended.
+ // hx( 0, 0 ) = v2; // Generates compile error as intended.
 // Testing with asynchronous deep copy with respect to device.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
-
- Kokkos::deep_copy(typename hView4::execution_space(), dx , hx );
- Kokkos::deep_copy(typename hView4::execution_space(), dy , dx );
- Kokkos::deep_copy(typename hView4::execution_space(), hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy(typename hView4::execution_space(), dx , T(0) );
- Kokkos::deep_copy(typename hView4::execution_space(), hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( typename hView4::execution_space(), dx, hx );
+ Kokkos::deep_copy( typename hView4::execution_space(), dy, dx );
+ Kokkos::deep_copy( typename hView4::execution_space(), hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( typename hView4::execution_space(), dx, T( 0 ) );
+ Kokkos::deep_copy( typename hView4::execution_space(), hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- // Testing with asynchronous deep copy with respect to host
+ // Testing with asynchronous deep copy with respect to host.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
- Kokkos::deep_copy(typename dView4::execution_space(), dx , hx );
- Kokkos::deep_copy(typename dView4::execution_space(), dy , dx );
- Kokkos::deep_copy(typename dView4::execution_space(), hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy(typename dView4::execution_space(), dx , T(0) );
- Kokkos::deep_copy(typename dView4::execution_space(), hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( typename dView4::execution_space(), dx, hx );
+ Kokkos::deep_copy( typename dView4::execution_space(), dy, dx );
+ Kokkos::deep_copy( typename dView4::execution_space(), hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( typename dView4::execution_space(), dx, T( 0 ) );
+ Kokkos::deep_copy( typename dView4::execution_space(), hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- // Testing with synchronous deep copy
+ // Testing with synchronous deep copy.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
- Kokkos::deep_copy( dx , hx );
- Kokkos::deep_copy( dy , dx );
- Kokkos::deep_copy( hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy( dx , T(0) );
- Kokkos::deep_copy( hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( dx, hx );
+ Kokkos::deep_copy( dy, dx );
+ Kokkos::deep_copy( hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( dx, T( 0 ) );
+ Kokkos::deep_copy( hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- dz = dx ; ASSERT_EQ( dx, dz); ASSERT_NE( dy, dz);
- dz = dy ; ASSERT_EQ( dy, dz); ASSERT_NE( dx, dz);
+
+ dz = dx;
+ ASSERT_EQ( dx, dz );
+ ASSERT_NE( dy, dz );
+
+ dz = dy;
+ ASSERT_EQ( dy, dz );
+ ASSERT_NE( dx, dz );
dx = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
+
dy = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
+
dz = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
}
- typedef T DataType[2] ;
+ typedef T DataType[2];
static void
check_auto_conversion_to_const(
- const Kokkos::View< const DataType , device > & arg_const ,
- const Kokkos::View< DataType , device > & arg )
+ const Kokkos::View< const DataType, device > & arg_const,
+ const Kokkos::View< DataType, device > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
- typedef Kokkos::View< DataType , device > typeX ;
- typedef Kokkos::View< const DataType , device > const_typeX ;
- typedef Kokkos::View< const DataType , device , Kokkos::MemoryRandomAccess > const_typeR ;
+ typedef Kokkos::View< DataType, device > typeX;
+ typedef Kokkos::View< const DataType, device > const_typeX;
+ typedef Kokkos::View< const DataType, device, Kokkos::MemoryRandomAccess > const_typeR;
+
typeX x( "X" );
- const_typeX xc = x ;
- const_typeR xr = x ;
+ const_typeX xc = x;
+ const_typeR xr = x;
ASSERT_TRUE( xc == x );
ASSERT_TRUE( x == xc );
 // For CUDA, the constant random-access View does not return
 // an lvalue reference because it is retrieved through the texture cache,
 // so querying the underlying pointer is not allowed.
#if defined( KOKKOS_ENABLE_CUDA )
- if ( ! std::is_same< typename device::execution_space , Kokkos::Cuda >::value )
+ if ( !std::is_same< typename device::execution_space, Kokkos::Cuda >::value )
#endif
{
ASSERT_TRUE( x.ptr_on_device() == xr.ptr_on_device() );
}
- // typeX xf = xc ; // setting non-const from const must not compile
+ // typeX xf = xc; // Setting non-const from const must not compile.
- check_auto_conversion_to_const( x , x );
+ check_auto_conversion_to_const( x, x );
}
static void run_test_subview()
{
- typedef Kokkos::View< const T , device > sView ;
+ typedef Kokkos::View< const T, device > sView;
dView0 d0( "d0" );
- dView1 d1( "d1" , N0 );
- dView2 d2( "d2" , N0 );
- dView3 d3( "d3" , N0 );
- dView4 d4( "d4" , N0 );
-
- sView s0 = d0 ;
- sView s1 = Kokkos::subview( d1 , 1 );
- sView s2 = Kokkos::subview( d2 , 1 , 1 );
- sView s3 = Kokkos::subview( d3 , 1 , 1 , 1 );
- sView s4 = Kokkos::subview( d4 , 1 , 1 , 1 , 1 );
+ dView1 d1( "d1", N0 );
+ dView2 d2( "d2", N0 );
+ dView3 d3( "d3", N0 );
+ dView4 d4( "d4", N0 );
+
+ sView s0 = d0;
+ sView s1 = Kokkos::subview( d1, 1 );
+ sView s2 = Kokkos::subview( d2, 1, 1 );
+ sView s3 = Kokkos::subview( d3, 1, 1, 1 );
+ sView s4 = Kokkos::subview( d4, 1, 1, 1, 1 );
}
static void run_test_subview_strided()
{
- typedef Kokkos::View< int **** , Kokkos::LayoutLeft , host > view_left_4 ;
- typedef Kokkos::View< int **** , Kokkos::LayoutRight , host > view_right_4 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutLeft , host > view_left_2 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutRight , host > view_right_2 ;
-
- typedef Kokkos::View< int * , Kokkos::LayoutStride , host > view_stride_1 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutStride , host > view_stride_2 ;
-
- view_left_2 xl2("xl2", 100 , 200 );
- view_right_2 xr2("xr2", 100 , 200 );
- view_stride_1 yl1 = Kokkos::subview( xl2 , 0 , Kokkos::ALL() );
- view_stride_1 yl2 = Kokkos::subview( xl2 , 1 , Kokkos::ALL() );
- view_stride_1 yr1 = Kokkos::subview( xr2 , 0 , Kokkos::ALL() );
- view_stride_1 yr2 = Kokkos::subview( xr2 , 1 , Kokkos::ALL() );
-
- ASSERT_EQ( yl1.dimension_0() , xl2.dimension_1() );
- ASSERT_EQ( yl2.dimension_0() , xl2.dimension_1() );
- ASSERT_EQ( yr1.dimension_0() , xr2.dimension_1() );
- ASSERT_EQ( yr2.dimension_0() , xr2.dimension_1() );
-
- ASSERT_EQ( & yl1(0) - & xl2(0,0) , 0 );
- ASSERT_EQ( & yl2(0) - & xl2(1,0) , 0 );
- ASSERT_EQ( & yr1(0) - & xr2(0,0) , 0 );
- ASSERT_EQ( & yr2(0) - & xr2(1,0) , 0 );
-
- view_left_4 xl4( "xl4" , 10 , 20 , 30 , 40 );
- view_right_4 xr4( "xr4" , 10 , 20 , 30 , 40 );
-
- view_stride_2 yl4 = Kokkos::subview( xl4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
- view_stride_2 yr4 = Kokkos::subview( xr4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
-
- ASSERT_EQ( yl4.dimension_0() , xl4.dimension_1() );
- ASSERT_EQ( yl4.dimension_1() , xl4.dimension_3() );
- ASSERT_EQ( yr4.dimension_0() , xr4.dimension_1() );
- ASSERT_EQ( yr4.dimension_1() , xr4.dimension_3() );
-
- ASSERT_EQ( & yl4(4,4) - & xl4(1,4,2,4) , 0 );
- ASSERT_EQ( & yr4(4,4) - & xr4(1,4,2,4) , 0 );
+ typedef Kokkos::View< int ****, Kokkos::LayoutLeft , host > view_left_4;
+ typedef Kokkos::View< int ****, Kokkos::LayoutRight, host > view_right_4;
+ typedef Kokkos::View< int ** , Kokkos::LayoutLeft , host > view_left_2;
+ typedef Kokkos::View< int ** , Kokkos::LayoutRight, host > view_right_2;
+
+ typedef Kokkos::View< int * , Kokkos::LayoutStride, host > view_stride_1;
+ typedef Kokkos::View< int **, Kokkos::LayoutStride, host > view_stride_2;
+
+ view_left_2 xl2( "xl2", 100, 200 );
+ view_right_2 xr2( "xr2", 100, 200 );
+ view_stride_1 yl1 = Kokkos::subview( xl2, 0, Kokkos::ALL() );
+ view_stride_1 yl2 = Kokkos::subview( xl2, 1, Kokkos::ALL() );
+ view_stride_1 yr1 = Kokkos::subview( xr2, 0, Kokkos::ALL() );
+ view_stride_1 yr2 = Kokkos::subview( xr2, 1, Kokkos::ALL() );
+
+ ASSERT_EQ( yl1.dimension_0(), xl2.dimension_1() );
+ ASSERT_EQ( yl2.dimension_0(), xl2.dimension_1() );
+ ASSERT_EQ( yr1.dimension_0(), xr2.dimension_1() );
+ ASSERT_EQ( yr2.dimension_0(), xr2.dimension_1() );
+
+ ASSERT_EQ( & yl1( 0 ) - & xl2( 0, 0 ), 0 );
+ ASSERT_EQ( & yl2( 0 ) - & xl2( 1, 0 ), 0 );
+ ASSERT_EQ( & yr1( 0 ) - & xr2( 0, 0 ), 0 );
+ ASSERT_EQ( & yr2( 0 ) - & xr2( 1, 0 ), 0 );
+
+ view_left_4 xl4( "xl4", 10, 20, 30, 40 );
+ view_right_4 xr4( "xr4", 10, 20, 30, 40 );
+
+ view_stride_2 yl4 = Kokkos::subview( xl4, 1, Kokkos::ALL(), 2, Kokkos::ALL() );
+ view_stride_2 yr4 = Kokkos::subview( xr4, 1, Kokkos::ALL(), 2, Kokkos::ALL() );
+
+ ASSERT_EQ( yl4.dimension_0(), xl4.dimension_1() );
+ ASSERT_EQ( yl4.dimension_1(), xl4.dimension_3() );
+ ASSERT_EQ( yr4.dimension_0(), xr4.dimension_1() );
+ ASSERT_EQ( yr4.dimension_1(), xr4.dimension_3() );
+
+ ASSERT_EQ( & yl4( 4, 4 ) - & xl4( 1, 4, 2, 4 ), 0 );
+ ASSERT_EQ( & yr4( 4, 4 ) - & xr4( 1, 4, 2, 4 ), 0 );
}
static void run_test_vector()
{
- static const unsigned Length = 1000 , Count = 8 ;
+ static const unsigned Length = 1000, Count = 8;
- typedef Kokkos::View< T* , Kokkos::LayoutLeft , host > vector_type ;
- typedef Kokkos::View< T** , Kokkos::LayoutLeft , host > multivector_type ;
+ typedef Kokkos::View< T*, Kokkos::LayoutLeft, host > vector_type;
+ typedef Kokkos::View< T**, Kokkos::LayoutLeft, host > multivector_type;
- typedef Kokkos::View< T* , Kokkos::LayoutRight , host > vector_right_type ;
- typedef Kokkos::View< T** , Kokkos::LayoutRight , host > multivector_right_type ;
+ typedef Kokkos::View< T*, Kokkos::LayoutRight, host > vector_right_type;
+ typedef Kokkos::View< T**, Kokkos::LayoutRight, host > multivector_right_type;
- typedef Kokkos::View< const T* , Kokkos::LayoutRight, host > const_vector_right_type ;
- typedef Kokkos::View< const T* , Kokkos::LayoutLeft , host > const_vector_type ;
- typedef Kokkos::View< const T** , Kokkos::LayoutLeft , host > const_multivector_type ;
+ typedef Kokkos::View< const T*, Kokkos::LayoutRight, host > const_vector_right_type;
+ typedef Kokkos::View< const T*, Kokkos::LayoutLeft, host > const_vector_type;
+ typedef Kokkos::View< const T**, Kokkos::LayoutLeft, host > const_multivector_type;
- multivector_type mv = multivector_type( "mv" , Length , Count );
- multivector_right_type mv_right = multivector_right_type( "mv" , Length , Count );
+ multivector_type mv = multivector_type( "mv", Length, Count );
+ multivector_right_type mv_right = multivector_right_type( "mv", Length, Count );
- vector_type v1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- vector_type v2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- vector_type v3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ vector_type v1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ vector_type v2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ vector_type v3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- vector_type rv1 = Kokkos::subview( mv_right , 0 , Kokkos::ALL() );
- vector_type rv2 = Kokkos::subview( mv_right , 1 , Kokkos::ALL() );
- vector_type rv3 = Kokkos::subview( mv_right , 2 , Kokkos::ALL() );
+ vector_type rv1 = Kokkos::subview( mv_right, 0, Kokkos::ALL() );
+ vector_type rv2 = Kokkos::subview( mv_right, 1, Kokkos::ALL() );
+ vector_type rv3 = Kokkos::subview( mv_right, 2, Kokkos::ALL() );
- multivector_type mv1 = Kokkos::subview( mv , std::make_pair( 1 , 998 ) ,
- std::make_pair( 2 , 5 ) );
+ multivector_type mv1 = Kokkos::subview( mv, std::make_pair( 1, 998 ),
+ std::make_pair( 2, 5 ) );
- multivector_right_type mvr1 =
- Kokkos::subview( mv_right ,
- std::make_pair( 1 , 998 ) ,
- std::make_pair( 2 , 5 ) );
+ multivector_right_type mvr1 = Kokkos::subview( mv_right, std::make_pair( 1, 998 ),
+ std::make_pair( 2, 5 ) );
- const_vector_type cv1 = Kokkos::subview( mv , Kokkos::ALL(), 0 );
- const_vector_type cv2 = Kokkos::subview( mv , Kokkos::ALL(), 1 );
- const_vector_type cv3 = Kokkos::subview( mv , Kokkos::ALL(), 2 );
+ const_vector_type cv1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ const_vector_type cv2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ const_vector_type cv3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- vector_right_type vr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- vector_right_type vr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- vector_right_type vr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ vector_right_type vr1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ vector_right_type vr2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ vector_right_type vr3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- const_vector_right_type cvr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- const_vector_right_type cvr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- const_vector_right_type cvr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ const_vector_right_type cvr1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ const_vector_right_type cvr2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ const_vector_right_type cvr3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- ASSERT_TRUE( & v1[0] == & v1(0) );
- ASSERT_TRUE( & v1[0] == & mv(0,0) );
- ASSERT_TRUE( & v2[0] == & mv(0,1) );
- ASSERT_TRUE( & v3[0] == & mv(0,2) );
+ ASSERT_TRUE( & v1[0] == & v1( 0 ) );
+ ASSERT_TRUE( & v1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & v2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & v3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & cv1[0] == & mv(0,0) );
- ASSERT_TRUE( & cv2[0] == & mv(0,1) );
- ASSERT_TRUE( & cv3[0] == & mv(0,2) );
+ ASSERT_TRUE( & cv1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & cv2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & cv3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & vr1[0] == & mv(0,0) );
- ASSERT_TRUE( & vr2[0] == & mv(0,1) );
- ASSERT_TRUE( & vr3[0] == & mv(0,2) );
+ ASSERT_TRUE( & vr1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & vr2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & vr3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & cvr1[0] == & mv(0,0) );
- ASSERT_TRUE( & cvr2[0] == & mv(0,1) );
- ASSERT_TRUE( & cvr3[0] == & mv(0,2) );
+ ASSERT_TRUE( & cvr1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & cvr2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & cvr3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & mv1(0,0) == & mv( 1 , 2 ) );
- ASSERT_TRUE( & mv1(1,1) == & mv( 2 , 3 ) );
- ASSERT_TRUE( & mv1(3,2) == & mv( 4 , 4 ) );
- ASSERT_TRUE( & mvr1(0,0) == & mv_right( 1 , 2 ) );
- ASSERT_TRUE( & mvr1(1,1) == & mv_right( 2 , 3 ) );
- ASSERT_TRUE( & mvr1(3,2) == & mv_right( 4 , 4 ) );
+ ASSERT_TRUE( & mv1( 0, 0 ) == & mv( 1, 2 ) );
+ ASSERT_TRUE( & mv1( 1, 1 ) == & mv( 2, 3 ) );
+ ASSERT_TRUE( & mv1( 3, 2 ) == & mv( 4, 4 ) );
+ ASSERT_TRUE( & mvr1( 0, 0 ) == & mv_right( 1, 2 ) );
+ ASSERT_TRUE( & mvr1( 1, 1 ) == & mv_right( 2, 3 ) );
+ ASSERT_TRUE( & mvr1( 3, 2 ) == & mv_right( 4, 4 ) );
const_vector_type c_cv1( v1 );
typename vector_type::const_type c_cv2( v2 );
typename const_vector_type::const_type c_ccv2( v2 );
const_multivector_type cmv( mv );
typename multivector_type::const_type cmvX( cmv );
typename const_multivector_type::const_type ccmvX( cmv );
}
};
} // namespace Test
-
-/*--------------------------------------------------------------------------*/
-
diff --git a/lib/kokkos/core/unit_test/TestViewMapping.hpp b/lib/kokkos/core/unit_test/TestViewMapping.hpp
index 324f02e94..71604bed5 100644
--- a/lib/kokkos/core/unit_test/TestViewMapping.hpp
+++ b/lib/kokkos/core/unit_test/TestViewMapping.hpp
@@ -1,1437 +1,1463 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
template< class Space >
void test_view_mapping()
{
- typedef typename Space::execution_space ExecSpace ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<> dim_0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2> dim_s2 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2,3> dim_s2_s3 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2,3,4> dim_s2_s3_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0> dim_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,3> dim_s0_s3 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,3,4> dim_s0_s3_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0> dim_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,4> dim_s0_s0_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dim_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0> dim_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0> dim_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0_s0 ;
-
- // Fully static dimensions should not be larger than an int
- ASSERT_LE( sizeof(dim_0) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2_s3) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2_s3_s4) , sizeof(int) );
-
- // Rank 1 is size_t
- ASSERT_EQ( sizeof(dim_s0) , sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s3) , sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s3_s4) , sizeof(size_t) );
-
- // Allow for padding
- ASSERT_LE( sizeof(dim_s0_s0) , 2 * sizeof(size_t) );
- ASSERT_LE( sizeof(dim_s0_s0_s4) , 2 * sizeof(size_t) );
-
- ASSERT_LE( sizeof(dim_s0_s0_s0) , 4 * sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0) , 4 * sizeof(unsigned) );
- ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
- ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
-
- static_assert( int(dim_0::rank) == int(0) , "" );
- static_assert( int(dim_0::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_0::ArgN0) == 1 , "" );
- static_assert( int(dim_0::ArgN1) == 1 , "" );
- static_assert( int(dim_0::ArgN2) == 1 , "" );
-
- static_assert( int(dim_s2::rank) == int(1) , "" );
- static_assert( int(dim_s2::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2::ArgN0) == 2 , "" );
- static_assert( int(dim_s2::ArgN1) == 1 , "" );
-
- static_assert( int(dim_s2_s3::rank) == int(2) , "" );
- static_assert( int(dim_s2_s3::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2_s3::ArgN0) == 2 , "" );
- static_assert( int(dim_s2_s3::ArgN1) == 3 , "" );
- static_assert( int(dim_s2_s3::ArgN2) == 1 , "" );
-
- static_assert( int(dim_s2_s3_s4::rank) == int(3) , "" );
- static_assert( int(dim_s2_s3_s4::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2_s3_s4::ArgN0) == 2 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN1) == 3 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN2) == 4 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN3) == 1 , "" );
-
- static_assert( int(dim_s0::rank) == int(1) , "" );
- static_assert( int(dim_s0::rank_dynamic) == int(1) , "" );
-
- static_assert( int(dim_s0_s3::rank) == int(2) , "" );
- static_assert( int(dim_s0_s3::rank_dynamic) == int(1) , "" );
- static_assert( int(dim_s0_s3::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s3::ArgN1) == 3 , "" );
-
- static_assert( int(dim_s0_s3_s4::rank) == int(3) , "" );
- static_assert( int(dim_s0_s3_s4::rank_dynamic) == int(1) , "" );
- static_assert( int(dim_s0_s3_s4::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s3_s4::ArgN1) == 3 , "" );
- static_assert( int(dim_s0_s3_s4::ArgN2) == 4 , "" );
-
- static_assert( int(dim_s0_s0_s4::rank) == int(3) , "" );
- static_assert( int(dim_s0_s0_s4::rank_dynamic) == int(2) , "" );
- static_assert( int(dim_s0_s0_s4::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s0_s4::ArgN1) == 0 , "" );
- static_assert( int(dim_s0_s0_s4::ArgN2) == 4 , "" );
-
- static_assert( int(dim_s0_s0_s0::rank) == int(3) , "" );
- static_assert( int(dim_s0_s0_s0::rank_dynamic) == int(3) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0::rank) == int(4) , "" );
- static_assert( int(dim_s0_s0_s0_s0::rank_dynamic) == int(4) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0::rank) == int(5) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0::rank_dynamic) == int(5) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank) == int(6) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(6) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank) == int(7) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(7) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank) == int(8) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(8) , "" );
-
- dim_s0 d1( 2, 3, 4, 5, 6, 7, 8, 9 );
+ typedef typename Space::execution_space ExecSpace;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension<> dim_0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2 > dim_s2;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2, 3 > dim_s2_s3;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2, 3, 4 > dim_s2_s3_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0 > dim_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 3 > dim_s0_s3;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 3, 4 > dim_s0_s3_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0 > dim_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 4 > dim_s0_s0_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0 > dim_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0 > dim_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0_s0_s0;
+
+ // Fully static dimensions should not be larger than an int.
+ ASSERT_LE( sizeof( dim_0 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2_s3 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2_s3_s4 ), sizeof( int ) );
+
+ // Rank 1 is size_t.
+ ASSERT_EQ( sizeof( dim_s0 ), sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s3 ), sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s3_s4 ), sizeof( size_t ) );
+
+ // Allow for padding.
+ ASSERT_LE( sizeof( dim_s0_s0 ), 2 * sizeof( size_t ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s4 ), 2 * sizeof( size_t ) );
+
+ ASSERT_LE( sizeof( dim_s0_s0_s0 ), 4 * sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0 ), 4 * sizeof( unsigned ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s0_s0_s0 ), 6 * sizeof( unsigned ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0_s0_s0 ), 6 * sizeof( unsigned ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s0_s0_s0_s0_s0 ), 8 * sizeof( unsigned ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0_s0_s0_s0_s0 ), 8 * sizeof( unsigned ) );
+
+ static_assert( int( dim_0::rank ) == int( 0 ), "" );
+ static_assert( int( dim_0::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_0::ArgN0 ) == 1, "" );
+ static_assert( int( dim_0::ArgN1 ) == 1, "" );
+ static_assert( int( dim_0::ArgN2 ) == 1, "" );
+
+ static_assert( int( dim_s2::rank ) == int( 1 ), "" );
+ static_assert( int( dim_s2::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2::ArgN1 ) == 1, "" );
+
+ static_assert( int( dim_s2_s3::rank ) == int( 2 ), "" );
+ static_assert( int( dim_s2_s3::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2_s3::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2_s3::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s2_s3::ArgN2 ) == 1, "" );
+
+ static_assert( int( dim_s2_s3_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s2_s3_s4::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2_s3_s4::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN2 ) == 4, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN3 ) == 1, "" );
+
+ static_assert( int( dim_s0::rank ) == int( 1 ), "" );
+ static_assert( int( dim_s0::rank_dynamic ) == int( 1 ), "" );
+
+ static_assert( int( dim_s0_s3::rank ) == int( 2 ), "" );
+ static_assert( int( dim_s0_s3::rank_dynamic ) == int( 1 ), "" );
+ static_assert( int( dim_s0_s3::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s3::ArgN1 ) == 3, "" );
+
+ static_assert( int( dim_s0_s3_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s3_s4::rank_dynamic ) == int( 1 ), "" );
+ static_assert( int( dim_s0_s3_s4::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s3_s4::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s0_s3_s4::ArgN2 ) == 4, "" );
+
+ static_assert( int( dim_s0_s0_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s0_s4::rank_dynamic ) == int( 2 ), "" );
+ static_assert( int( dim_s0_s0_s4::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s0_s4::ArgN1 ) == 0, "" );
+ static_assert( int( dim_s0_s0_s4::ArgN2 ) == 4, "" );
+
+ static_assert( int( dim_s0_s0_s0::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s0_s0::rank_dynamic ) == int( 3 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0::rank ) == int( 4 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0::rank_dynamic ) == int( 4 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0::rank ) == int( 5 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0::rank_dynamic ) == int( 5 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0::rank ) == int( 6 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 6 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0::rank ) == int( 7 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 7 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0_s0::rank ) == int( 8 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 8 ), "" );
+
+ dim_s0 d1( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0 d2( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0 d3( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0_s0 d4( 2, 3, 4, 5, 6, 7, 8, 9 );
- ASSERT_EQ( d1.N0 , 2 );
- ASSERT_EQ( d2.N0 , 2 );
- ASSERT_EQ( d3.N0 , 2 );
- ASSERT_EQ( d4.N0 , 2 );
+ ASSERT_EQ( d1.N0, 2 );
+ ASSERT_EQ( d2.N0, 2 );
+ ASSERT_EQ( d3.N0, 2 );
+ ASSERT_EQ( d4.N0, 2 );
- ASSERT_EQ( d1.N1 , 1 );
- ASSERT_EQ( d2.N1 , 3 );
- ASSERT_EQ( d3.N1 , 3 );
- ASSERT_EQ( d4.N1 , 3 );
+ ASSERT_EQ( d1.N1, 1 );
+ ASSERT_EQ( d2.N1, 3 );
+ ASSERT_EQ( d3.N1, 3 );
+ ASSERT_EQ( d4.N1, 3 );
- ASSERT_EQ( d1.N2 , 1 );
- ASSERT_EQ( d2.N2 , 1 );
- ASSERT_EQ( d3.N2 , 4 );
- ASSERT_EQ( d4.N2 , 4 );
+ ASSERT_EQ( d1.N2, 1 );
+ ASSERT_EQ( d2.N2, 1 );
+ ASSERT_EQ( d3.N2, 4 );
+ ASSERT_EQ( d4.N2, 4 );
- ASSERT_EQ( d1.N3 , 1 );
- ASSERT_EQ( d2.N3 , 1 );
- ASSERT_EQ( d3.N3 , 1 );
- ASSERT_EQ( d4.N3 , 5 );
+ ASSERT_EQ( d1.N3, 1 );
+ ASSERT_EQ( d2.N3, 1 );
+ ASSERT_EQ( d3.N3, 1 );
+ ASSERT_EQ( d4.N3, 5 );
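
// [Editorial sketch, not part of the patch] The sizeof checks near the top of
// this test rest on a simple storage rule: a compile-time extent needs no
// per-object storage, while a dynamic extent ( '0' in the ViewDimension
// parameter list ) must hold a runtime value. A minimal, hypothetical
// illustration of that rule -- not the Kokkos implementation -- is:

#include <cstddef>

template< std::size_t N >
struct StaticExtentSketch { static constexpr std::size_t value = N; }; // empty type

struct DynamicExtentSketch { std::size_t value; };                      // one size_t

static_assert( sizeof( StaticExtentSketch< 2 > ) == 1, "an empty class occupies one byte" );
static_assert( sizeof( DynamicExtentSketch ) == sizeof( std::size_t ), "" );

// This is why dim_s2_s3_s4 above fits within an int while dim_s0 stores
// exactly one size_t.
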
//----------------------------------------
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s0 , Kokkos::LayoutStride > stride_s0_s0_s0 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s0, Kokkos::LayoutStride > stride_s0_s0_s0;
//----------------------------------------
- // Static dimension
+ // Static dimension.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutLeft > left_s2_s3_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4, Kokkos::LayoutLeft > left_s2_s3_s4;
- ASSERT_EQ( sizeof(left_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
+ ASSERT_EQ( sizeof( left_s2_s3_s4 ), sizeof( dim_s2_s3_s4 ) );
- left_s2_s3_s4 off3 ;
+ left_s2_s3_s4 off3;
- stride_s0_s0_s0 stride3( off3 );
+ stride_s0_s0_s0 stride3( off3 );
- ASSERT_EQ( off3.stride_0() , 1 );
- ASSERT_EQ( off3.stride_1() , 2 );
- ASSERT_EQ( off3.stride_2() , 6 );
- ASSERT_EQ( off3.span() , 24 );
+ ASSERT_EQ( off3.stride_0(), 1 );
+ ASSERT_EQ( off3.stride_1(), 2 );
+ ASSERT_EQ( off3.stride_2(), 6 );
+ ASSERT_EQ( off3.span(), 24 );
- ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( off3.span() , stride3.span() );
+ ASSERT_EQ( off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( off3.span(), stride3.span() );
- int offset = 0 ;
+ int offset = 0;
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int i = 0 ; i < 2 ; ++i , ++offset ){
- ASSERT_EQ( off3(i,j,k) , offset );
- ASSERT_EQ( stride3(i,j,k) , off3(i,j,k) );
- }}}
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < 3; ++j )
+ for ( int i = 0; i < 2; ++i, ++offset )
+ {
+ ASSERT_EQ( off3( i, j, k ), offset );
+ ASSERT_EQ( stride3( i, j, k ), off3( i, j, k ) );
+ }
}
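
// [Editorial note, not part of the patch] The loop above verifies the usual
// column-major ( LayoutLeft ) offset formula for the static 2 x 3 x 4 extents:
//
//   offset( i, j, k ) = i + 2 * ( j + 3 * k )
//
// which gives stride_0 == 1, stride_1 == 2, stride_2 == 6 and
// span == 2 * 3 * 4 == 24, exactly the values asserted before the loop.
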
//----------------------------------------
- // Small dimension is unpadded
+ // Small dimension is unpadded.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, 2 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), 2 * 3 * 4 );
const Kokkos::LayoutLeft layout = dyn_off3.layout();
- ASSERT_EQ( layout.dimension[0] , 2 );
- ASSERT_EQ( layout.dimension[1] , 3 );
- ASSERT_EQ( layout.dimension[2] , 4 );
- ASSERT_EQ( layout.dimension[3] , 1 );
- ASSERT_EQ( layout.dimension[4] , 1 );
- ASSERT_EQ( layout.dimension[5] , 1 );
- ASSERT_EQ( layout.dimension[6] , 1 );
- ASSERT_EQ( layout.dimension[7] , 1 );
-
- ASSERT_EQ( stride3.m_dim.rank , 3 );
- ASSERT_EQ( stride3.m_dim.N0 , 2 );
- ASSERT_EQ( stride3.m_dim.N1 , 3 );
- ASSERT_EQ( stride3.m_dim.N2 , 4 );
- ASSERT_EQ( stride3.m_dim.N3 , 1 );
- ASSERT_EQ( stride3.size() , 2 * 3 * 4 );
-
- int offset = 0 ;
-
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int i = 0 ; i < 2 ; ++i , ++offset ){
- ASSERT_EQ( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
-
- ASSERT_EQ( dyn_off3.span() , offset );
- ASSERT_EQ( stride3.span() , dyn_off3.span() );
+ ASSERT_EQ( layout.dimension[0], 2 );
+ ASSERT_EQ( layout.dimension[1], 3 );
+ ASSERT_EQ( layout.dimension[2], 4 );
+ ASSERT_EQ( layout.dimension[3], 1 );
+ ASSERT_EQ( layout.dimension[4], 1 );
+ ASSERT_EQ( layout.dimension[5], 1 );
+ ASSERT_EQ( layout.dimension[6], 1 );
+ ASSERT_EQ( layout.dimension[7], 1 );
+
+ ASSERT_EQ( stride3.m_dim.rank, 3 );
+ ASSERT_EQ( stride3.m_dim.N0, 2 );
+ ASSERT_EQ( stride3.m_dim.N1, 3 );
+ ASSERT_EQ( stride3.m_dim.N2, 4 );
+ ASSERT_EQ( stride3.m_dim.N3, 1 );
+ ASSERT_EQ( stride3.size(), 2 * 3 * 4 );
+
+ int offset = 0;
+
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < 3; ++j )
+ for ( int i = 0; i < 2; ++i, ++offset )
+ {
+ ASSERT_EQ( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
+
+ ASSERT_EQ( dyn_off3.span(), offset );
+ ASSERT_EQ( stride3.span(), dyn_off3.span() );
}
- // Large dimension is likely padded
+ //----------------------------------------
+ // Large dimension is likely padded.
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
-
- ASSERT_EQ( stride3.m_dim.rank , 3 );
- ASSERT_EQ( stride3.m_dim.N0 , N0 );
- ASSERT_EQ( stride3.m_dim.N1 , N1 );
- ASSERT_EQ( stride3.m_dim.N2 , 4 );
- ASSERT_EQ( stride3.m_dim.N3 , 1 );
- ASSERT_EQ( stride3.size() , N0 * N1 * 4 );
- ASSERT_EQ( stride3.span() , dyn_off3.span() );
-
- int offset = 0 ;
-
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < N1 ; ++j ){
- for ( int i = 0 ; i < N0 ; ++i ){
- ASSERT_LE( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- offset = dyn_off3(i,j,k) + 1 ;
- }}}
-
- ASSERT_LE( offset , dyn_off3.span() );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, N0 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, N1 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), N0 * N1 * 4 );
+
+ ASSERT_EQ( stride3.m_dim.rank, 3 );
+ ASSERT_EQ( stride3.m_dim.N0, N0 );
+ ASSERT_EQ( stride3.m_dim.N1, N1 );
+ ASSERT_EQ( stride3.m_dim.N2, 4 );
+ ASSERT_EQ( stride3.m_dim.N3, 1 );
+ ASSERT_EQ( stride3.size(), N0 * N1 * 4 );
+ ASSERT_EQ( stride3.span(), dyn_off3.span() );
+
+ int offset = 0;
+
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < N1; ++j )
+ for ( int i = 0; i < N0; ++i )
+ {
+ ASSERT_LE( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ offset = dyn_off3( i, j, k ) + 1;
+ }
+
+ ASSERT_LE( offset, dyn_off3.span() );
}
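
// [Editorial sketch, not part of the patch] For a large leading extent the
// allocation may pad the first stride up to an alignment boundary, which is
// why the loop above only checks monotonicity ( ASSERT_LE ) and why the span
// can exceed N0 * N1 * 4. A hypothetical helper -- illustrative only, not the
// Kokkos padding policy -- that rounds an extent up to a whole number of
// alignment units:

#include <cstddef>

constexpr std::size_t pad_extent_sketch( std::size_t n, std::size_t align_elems )
{ return ( ( n + align_elems - 1 ) / align_elems ) * align_elems; }

static_assert( pad_extent_sketch( 2000, 16 ) == 2000, "already a multiple of 16" );
static_assert( pad_extent_sketch( 2001, 16 ) == 2016, "rounded up to the next multiple" );
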
//----------------------------------------
- // Static dimension
+ // Static dimension.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutRight > right_s2_s3_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4, Kokkos::LayoutRight > right_s2_s3_s4;
- ASSERT_EQ( sizeof(right_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
+ ASSERT_EQ( sizeof( right_s2_s3_s4 ), sizeof( dim_s2_s3_s4 ) );
- right_s2_s3_s4 off3 ;
+ right_s2_s3_s4 off3;
stride_s0_s0_s0 stride3( off3 );
- ASSERT_EQ( off3.stride_0() , 12 );
- ASSERT_EQ( off3.stride_1() , 4 );
- ASSERT_EQ( off3.stride_2() , 1 );
+ ASSERT_EQ( off3.stride_0(), 12 );
+ ASSERT_EQ( off3.stride_1(), 4 );
+ ASSERT_EQ( off3.stride_2(), 1 );
- ASSERT_EQ( off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( off3.span() , stride3.span() );
+ ASSERT_EQ( off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( off3.span(), stride3.span() );
- int offset = 0 ;
+ int offset = 0;
- for ( int i = 0 ; i < 2 ; ++i ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k , ++offset ){
- ASSERT_EQ( off3(i,j,k) , offset );
- ASSERT_EQ( off3(i,j,k) , stride3(i,j,k) );
- }}}
+ for ( int i = 0; i < 2; ++i )
+ for ( int j = 0; j < 3; ++j )
+ for ( int k = 0; k < 4; ++k, ++offset )
+ {
+ ASSERT_EQ( off3( i, j, k ), offset );
+ ASSERT_EQ( off3( i, j, k ), stride3( i, j, k ) );
+ }
- ASSERT_EQ( off3.span() , offset );
+ ASSERT_EQ( off3.span(), offset );
}
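
// [Editorial note, not part of the patch] This is the row-major ( LayoutRight )
// counterpart of the earlier LayoutLeft check. For the static 2 x 3 x 4 extents
// the offset formula is
//
//   offset( i, j, k ) = k + 4 * ( j + 3 * i ) = 12 * i + 4 * j + k
//
// so stride_0 == 12, stride_1 == 4, stride_2 == 1 and span == 24, matching the
// assertions above.
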
//----------------------------------------
- // Small dimension is unpadded
+ // Small dimension is unpadded.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
-
- ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( dyn_off3.span() , stride3.span() );
-
- int offset = 0 ;
-
- for ( int i = 0 ; i < 2 ; ++i ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k , ++offset ){
- ASSERT_EQ( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
- }}}
-
- ASSERT_EQ( dyn_off3.span() , offset );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, 2 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), 2 * 3 * 4 );
+
+ ASSERT_EQ( dyn_off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( dyn_off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( dyn_off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( dyn_off3.span(), stride3.span() );
+
+ int offset = 0;
+
+ for ( int i = 0; i < 2; ++i )
+ for ( int j = 0; j < 3; ++j )
+ for ( int k = 0; k < 4; ++k, ++offset )
+ {
+ ASSERT_EQ( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( dyn_off3( i, j, k ), stride3( i, j, k ) );
+ }
+
+ ASSERT_EQ( dyn_off3.span(), offset );
}
- // Large dimension is likely padded
+ //----------------------------------------
+ // Large dimension is likely padded.
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
-
- ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( dyn_off3.span() , stride3.span() );
-
- int offset = 0 ;
-
- for ( int i = 0 ; i < N0 ; ++i ){
- for ( int j = 0 ; j < N1 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k ){
- ASSERT_LE( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
- offset = dyn_off3(i,j,k) + 1 ;
- }}}
-
- ASSERT_LE( offset , dyn_off3.span() );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, N0 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, N1 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), N0 * N1 * 4 );
+
+ ASSERT_EQ( dyn_off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( dyn_off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( dyn_off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( dyn_off3.span(), stride3.span() );
+
+ int offset = 0;
+
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < 4; ++k )
+ {
+ ASSERT_LE( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( dyn_off3( i, j, k ), stride3( i, j, k ) );
+ offset = dyn_off3( i, j, k ) + 1;
+ }
+
+ ASSERT_LE( offset, dyn_off3.span() );
}
//----------------------------------------
- // Subview
+ // Subview.
{
// Mapping rank 4 to rank 3
- typedef Kokkos::Experimental::Impl::SubviewExtents<4,3> SubviewExtents ;
+ typedef Kokkos::Experimental::Impl::SubviewExtents< 4, 3 > SubviewExtents;
- constexpr int N0 = 1000 ;
- constexpr int N1 = 2000 ;
- constexpr int N2 = 3000 ;
- constexpr int N3 = 4000 ;
+ constexpr int N0 = 1000;
+ constexpr int N1 = 2000;
+ constexpr int N2 = 3000;
+ constexpr int N3 = 4000;
- Kokkos::Experimental::Impl::ViewDimension<N0,N1,N2,N3> dim ;
+ Kokkos::Experimental::Impl::ViewDimension< N0, N1, N2, N3 > dim;
SubviewExtents tmp( dim
, N0 / 2
, Kokkos::Experimental::ALL
- , std::pair<int,int>( N2 / 4 , 10 + N2 / 4 )
- , Kokkos::pair<int,int>( N3 / 4 , 20 + N3 / 4 )
+ , std::pair< int, int >( N2 / 4, 10 + N2 / 4 )
+ , Kokkos::pair< int, int >( N3 / 4, 20 + N3 / 4 )
);
- ASSERT_EQ( tmp.domain_offset(0) , N0 / 2 );
- ASSERT_EQ( tmp.domain_offset(1) , 0 );
- ASSERT_EQ( tmp.domain_offset(2) , N2 / 4 );
- ASSERT_EQ( tmp.domain_offset(3) , N3 / 4 );
+ ASSERT_EQ( tmp.domain_offset( 0 ), N0 / 2 );
+ ASSERT_EQ( tmp.domain_offset( 1 ), 0 );
+ ASSERT_EQ( tmp.domain_offset( 2 ), N2 / 4 );
+ ASSERT_EQ( tmp.domain_offset( 3 ), N3 / 4 );
- ASSERT_EQ( tmp.range_index(0) , 1 );
- ASSERT_EQ( tmp.range_index(1) , 2 );
- ASSERT_EQ( tmp.range_index(2) , 3 );
+ ASSERT_EQ( tmp.range_index( 0 ), 1 );
+ ASSERT_EQ( tmp.range_index( 1 ), 2 );
+ ASSERT_EQ( tmp.range_index( 2 ), 3 );
- ASSERT_EQ( tmp.range_extent(0) , N1 );
- ASSERT_EQ( tmp.range_extent(1) , 10 );
- ASSERT_EQ( tmp.range_extent(2) , 20 );
+ ASSERT_EQ( tmp.range_extent( 0 ), N1 );
+ ASSERT_EQ( tmp.range_extent( 1 ), 10 );
+ ASSERT_EQ( tmp.range_extent( 2 ), 20 );
}
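
// [Editorial sketch, not part of the patch] SubviewExtents is the internal
// bookkeeping behind Kokkos::Experimental::subview: an integer argument
// collapses a rank, while ALL and pair ranges survive as dimensions of the
// result. A hedged usage sketch with small, made-up extents ( subview_sketch
// is a hypothetical helper; assumes <Kokkos_Core.hpp> and an initialized
// Kokkos runtime ):

void subview_sketch()
{
  Kokkos::View< double****, Kokkos::HostSpace > a( "a", 10, 20, 30, 40 );

  auto s = Kokkos::Experimental::subview( a, 5, Kokkos::Experimental::ALL
                                        , std::pair< int, int >( 7, 17 )
                                        , Kokkos::pair< int, int >( 10, 30 ) );

  // 's' has rank 3 with extents 20, 10 and 20; s( j, k, l ) aliases
  // a( 5, j, 7 + k, 10 + l ).
  (void) s;
}
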
- //----------------------------------------
+
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- constexpr int sub_N0 = 1000 ;
- constexpr int sub_N1 = 200 ;
- constexpr int sub_N2 = 4 ;
+ constexpr int sub_N0 = 1000;
+ constexpr int sub_N1 = 200;
+ constexpr int sub_N2 = 4;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
+ Kokkos::Experimental::Impl::SubviewExtents< 3, 3 >
sub( dyn_off3.m_dim
- , Kokkos::pair<int,int>(0,sub_N0)
- , Kokkos::pair<int,int>(0,sub_N1)
- , Kokkos::pair<int,int>(0,sub_N2)
+ , Kokkos::pair< int, int >( 0, sub_N0 )
+ , Kokkos::pair< int, int >( 0, sub_N1 )
+ , Kokkos::pair< int, int >( 0, sub_N2 )
);
- stride_s0_s0_s0 stride3( dyn_off3 , sub );
+ stride_s0_s0_s0 stride3( dyn_off3, sub );
- ASSERT_EQ( stride3.dimension_0() , sub_N0 );
- ASSERT_EQ( stride3.dimension_1() , sub_N1 );
- ASSERT_EQ( stride3.dimension_2() , sub_N2 );
- ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
+ ASSERT_EQ( stride3.dimension_0(), sub_N0 );
+ ASSERT_EQ( stride3.dimension_1(), sub_N1 );
+ ASSERT_EQ( stride3.dimension_2(), sub_N2 );
+ ASSERT_EQ( stride3.size(), sub_N0 * sub_N1 * sub_N2 );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_GE( dyn_off3.span() , stride3.span() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+    ASSERT_GE( dyn_off3.span(), stride3.span() );
- for ( int k = 0 ; k < sub_N2 ; ++k ){
- for ( int j = 0 ; j < sub_N1 ; ++j ){
- for ( int i = 0 ; i < sub_N0 ; ++i ){
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
+ for ( int k = 0; k < sub_N2; ++k )
+ for ( int j = 0; j < sub_N1; ++j )
+ for ( int i = 0; i < sub_N0; ++i )
+ {
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
}
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- constexpr int sub_N0 = 1000 ;
- constexpr int sub_N1 = 200 ;
- constexpr int sub_N2 = 4 ;
+ constexpr int sub_N0 = 1000;
+ constexpr int sub_N1 = 200;
+ constexpr int sub_N2 = 4;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
+ Kokkos::Experimental::Impl::SubviewExtents< 3, 3 >
sub( dyn_off3.m_dim
- , Kokkos::pair<int,int>(0,sub_N0)
- , Kokkos::pair<int,int>(0,sub_N1)
- , Kokkos::pair<int,int>(0,sub_N2)
+ , Kokkos::pair< int, int >( 0, sub_N0 )
+ , Kokkos::pair< int, int >( 0, sub_N1 )
+ , Kokkos::pair< int, int >( 0, sub_N2 )
);
- stride_s0_s0_s0 stride3( dyn_off3 , sub );
+ stride_s0_s0_s0 stride3( dyn_off3, sub );
- ASSERT_EQ( stride3.dimension_0() , sub_N0 );
- ASSERT_EQ( stride3.dimension_1() , sub_N1 );
- ASSERT_EQ( stride3.dimension_2() , sub_N2 );
- ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
+ ASSERT_EQ( stride3.dimension_0(), sub_N0 );
+ ASSERT_EQ( stride3.dimension_1(), sub_N1 );
+ ASSERT_EQ( stride3.dimension_2(), sub_N2 );
+ ASSERT_EQ( stride3.size(), sub_N0 * sub_N1 * sub_N2 );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_GE( dyn_off3.span() , stride3.span() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+    ASSERT_GE( dyn_off3.span(), stride3.span() );
- for ( int i = 0 ; i < sub_N0 ; ++i ){
- for ( int j = 0 ; j < sub_N1 ; ++j ){
- for ( int k = 0 ; k < sub_N2 ; ++k ){
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
+ for ( int i = 0; i < sub_N0; ++i )
+ for ( int j = 0; j < sub_N1; ++j )
+ for ( int k = 0; k < sub_N2; ++k )
+ {
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
}
//----------------------------------------
- // view data analysis
+ // View data analysis.
{
- using namespace Kokkos::Experimental::Impl ;
- static_assert( rank_dynamic<>::value == 0 , "" );
- static_assert( rank_dynamic<1>::value == 0 , "" );
- static_assert( rank_dynamic<0>::value == 1 , "" );
- static_assert( rank_dynamic<0,1>::value == 1 , "" );
- static_assert( rank_dynamic<0,0,1>::value == 2 , "" );
+ using namespace Kokkos::Experimental::Impl;
+
+ static_assert( rank_dynamic<>::value == 0, "" );
+ static_assert( rank_dynamic< 1 >::value == 0, "" );
+ static_assert( rank_dynamic< 0 >::value == 1, "" );
+ static_assert( rank_dynamic< 0, 1 >::value == 1, "" );
+ static_assert( rank_dynamic< 0, 0, 1 >::value == 2, "" );
}
{
- using namespace Kokkos::Experimental::Impl ;
-
- typedef ViewArrayAnalysis< int[] > a_int_r1 ;
- typedef ViewArrayAnalysis< int**[4][5][6] > a_int_r5 ;
- typedef ViewArrayAnalysis< const int[] > a_const_int_r1 ;
- typedef ViewArrayAnalysis< const int**[4][5][6] > a_const_int_r5 ;
-
- static_assert( a_int_r1::dimension::rank == 1 , "" );
- static_assert( a_int_r1::dimension::rank_dynamic == 1 , "" );
- static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN2 == 4 , "" );
- static_assert( a_int_r5::dimension::ArgN3 == 5 , "" );
- static_assert( a_int_r5::dimension::ArgN4 == 6 , "" );
- static_assert( a_int_r5::dimension::ArgN5 == 1 , "" );
-
- static_assert( std::is_same< typename a_int_r1::dimension , ViewDimension<0> >::value , "" );
- static_assert( std::is_same< typename a_int_r1::non_const_value_type , int >::value , "" );
-
- static_assert( a_const_int_r1::dimension::rank == 1 , "" );
- static_assert( a_const_int_r1::dimension::rank_dynamic == 1 , "" );
- static_assert( std::is_same< typename a_const_int_r1::dimension , ViewDimension<0> >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
-
- static_assert( a_const_int_r5::dimension::rank == 5 , "" );
- static_assert( a_const_int_r5::dimension::rank_dynamic == 2 , "" );
-
- static_assert( a_const_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_const_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_const_int_r5::dimension::ArgN2 == 4 , "" );
- static_assert( a_const_int_r5::dimension::ArgN3 == 5 , "" );
- static_assert( a_const_int_r5::dimension::ArgN4 == 6 , "" );
- static_assert( a_const_int_r5::dimension::ArgN5 == 1 , "" );
-
- static_assert( std::is_same< typename a_const_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
- static_assert( std::is_same< typename a_const_int_r5::non_const_value_type , int >::value , "" );
-
- static_assert( a_int_r5::dimension::rank == 5 , "" );
- static_assert( a_int_r5::dimension::rank_dynamic == 2 , "" );
- static_assert( std::is_same< typename a_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
- static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
+ using namespace Kokkos::Experimental::Impl;
+
+ typedef ViewArrayAnalysis< int[] > a_int_r1;
+ typedef ViewArrayAnalysis< int**[4][5][6] > a_int_r5;
+ typedef ViewArrayAnalysis< const int[] > a_const_int_r1;
+ typedef ViewArrayAnalysis< const int**[4][5][6] > a_const_int_r5;
+
+ static_assert( a_int_r1::dimension::rank == 1, "" );
+ static_assert( a_int_r1::dimension::rank_dynamic == 1, "" );
+ static_assert( a_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN2 == 4, "" );
+ static_assert( a_int_r5::dimension::ArgN3 == 5, "" );
+ static_assert( a_int_r5::dimension::ArgN4 == 6, "" );
+ static_assert( a_int_r5::dimension::ArgN5 == 1, "" );
+
+ static_assert( std::is_same< typename a_int_r1::dimension, ViewDimension<0> >::value, "" );
+ static_assert( std::is_same< typename a_int_r1::non_const_value_type, int >::value, "" );
+
+ static_assert( a_const_int_r1::dimension::rank == 1, "" );
+ static_assert( a_const_int_r1::dimension::rank_dynamic == 1, "" );
+ static_assert( std::is_same< typename a_const_int_r1::dimension, ViewDimension<0> >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_value_type, int >::value, "" );
+
+ static_assert( a_const_int_r5::dimension::rank == 5, "" );
+ static_assert( a_const_int_r5::dimension::rank_dynamic == 2, "" );
+
+ static_assert( a_const_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_const_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_const_int_r5::dimension::ArgN2 == 4, "" );
+ static_assert( a_const_int_r5::dimension::ArgN3 == 5, "" );
+ static_assert( a_const_int_r5::dimension::ArgN4 == 6, "" );
+ static_assert( a_const_int_r5::dimension::ArgN5 == 1, "" );
+
+ static_assert( std::is_same< typename a_const_int_r5::dimension, ViewDimension<0, 0, 4, 5, 6> >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r5::non_const_value_type, int >::value, "" );
+
+ static_assert( a_int_r5::dimension::rank == 5, "" );
+ static_assert( a_int_r5::dimension::rank_dynamic == 2, "" );
+ static_assert( std::is_same< typename a_int_r5::dimension, ViewDimension<0, 0, 4, 5, 6> >::value, "" );
+ static_assert( std::is_same< typename a_int_r5::non_const_value_type, int >::value, "" );
}
{
- using namespace Kokkos::Experimental::Impl ;
+ using namespace Kokkos::Experimental::Impl;
- typedef int t_i4[4] ;
+ typedef int t_i4[4];
     // Dimensions of t_i4 are appended to the multidimensional array.
- typedef ViewArrayAnalysis< t_i4 ***[3] > a_int_r5 ;
-
- static_assert( a_int_r5::dimension::rank == 5 , "" );
- static_assert( a_int_r5::dimension::rank_dynamic == 3 , "" );
- static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN2 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN3 == 3 , "" );
- static_assert( a_int_r5::dimension::ArgN4 == 4 , "" );
- static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
+ typedef ViewArrayAnalysis< t_i4 ***[3] > a_int_r5;
+
+ static_assert( a_int_r5::dimension::rank == 5, "" );
+ static_assert( a_int_r5::dimension::rank_dynamic == 3, "" );
+ static_assert( a_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN2 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN3 == 3, "" );
+ static_assert( a_int_r5::dimension::ArgN4 == 4, "" );
+ static_assert( std::is_same< typename a_int_r5::non_const_value_type, int >::value, "" );
}
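
// [Editorial sketch, not part of the patch] The ViewArrayAnalysis cases above
// exercise the View data-type convention: each '*' is a runtime extent
// ( leftmost positions only ), each [N] is a compile-time extent, and the
// extents of a typedef'd array element type are appended on the right. At the
// user level the same convention reads, for example ( data_type_sketch is a
// hypothetical helper, illustrative only ):

void data_type_sketch()
{
  // Rank 5: two runtime extents ( 7 and 8 ) followed by static extents 4, 5, 6.
  Kokkos::View< int**[4][5][6], Kokkos::HostSpace > a( "a", 7, 8 );

  static_assert( decltype( a )::Rank == 5, "" );
  (void) a;
}
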
{
- using namespace Kokkos::Experimental::Impl ;
+ using namespace Kokkos::Experimental::Impl;
- typedef ViewDataAnalysis< const int[] , void > a_const_int_r1 ;
+ typedef ViewDataAnalysis< const int[], void > a_const_int_r1;
- static_assert( std::is_same< typename a_const_int_r1::specialize , void >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::dimension , Kokkos::Experimental::Impl::ViewDimension<0> >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::specialize, void >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::dimension, Kokkos::Experimental::Impl::ViewDimension<0> >::value, "" );
- static_assert( std::is_same< typename a_const_int_r1::type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::value_type , const int >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::value_type, const int >::value, "" );
- static_assert( std::is_same< typename a_const_int_r1::scalar_array_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_scalar_array_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_type , int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::scalar_array_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_scalar_array_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_type, int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_value_type, int >::value, "" );
- typedef ViewDataAnalysis< const int**[4] , void > a_const_int_r3 ;
+ typedef ViewDataAnalysis< const int**[4], void > a_const_int_r3;
- static_assert( std::is_same< typename a_const_int_r3::specialize , void >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::specialize, void >::value, "" );
- static_assert( std::is_same< typename a_const_int_r3::dimension , Kokkos::Experimental::Impl::ViewDimension<0,0,4> >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::dimension, Kokkos::Experimental::Impl::ViewDimension<0, 0, 4> >::value, "" );
- static_assert( std::is_same< typename a_const_int_r3::type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::scalar_array_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_scalar_array_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_type , int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_value_type , int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_scalar_array_type , int**[4] >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::scalar_array_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_scalar_array_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_type, int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_value_type, int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_scalar_array_type, int**[4] >::value, "" );
-
- // std::cout << "typeid(const int**[4]).name() = " << typeid(const int**[4]).name() << std::endl ;
+ // std::cout << "typeid( const int**[4] ).name() = " << typeid( const int**[4] ).name() << std::endl;
}
//----------------------------------------
{
- constexpr int N = 10 ;
+ constexpr int N = 10;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
- int data[N] ;
+ int data[N];
- T vr1(data,N); // view of non-const
- C cr1(vr1); // view of const from view of non-const
- C cr2( (const int *) data , N );
+ T vr1( data, N ); // View of non-const.
+ C cr1( vr1 ); // View of const from view of non-const.
+ C cr2( (const int *) data, N );
// Generate static_assert error:
// T tmp( cr1 );
- ASSERT_EQ( vr1.span() , N );
- ASSERT_EQ( cr1.span() , N );
- ASSERT_EQ( vr1.data() , & data[0] );
- ASSERT_EQ( cr1.data() , & data[0] );
+ ASSERT_EQ( vr1.span(), N );
+ ASSERT_EQ( cr1.span(), N );
+ ASSERT_EQ( vr1.data(), & data[0] );
+ ASSERT_EQ( cr1.data(), & data[0] );
- ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::reference_type, int & >::value ) );
- ASSERT_EQ( T::Rank , 1 );
+ ASSERT_EQ( T::Rank, 1 );
- ASSERT_TRUE( ( std::is_same< typename C::data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::reference_type , const int & >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::reference_type, const int & >::value ) );
- ASSERT_EQ( C::Rank , 1 );
+ ASSERT_EQ( C::Rank, 1 );
- ASSERT_EQ( vr1.dimension_0() , N );
+ ASSERT_EQ( vr1.dimension_0(), N );
- if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
- for ( int i = 0 ; i < N ; ++i ) data[i] = i + 1 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ for ( int i = 0; i < N; ++i ) data[i] = i + 1;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( cr1[i], i + 1 );
{
T tmp( vr1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
+
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 2;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 2 );
}
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 2 );
}
}
-
{
- constexpr int N = 10 ;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ constexpr int N = 10;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
+
+ T vr1( "vr1", N );
+ C cr1( vr1 );
- T vr1("vr1",N);
- C cr1(vr1);
+ ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::reference_type, int & >::value ) );
+ ASSERT_EQ( T::Rank, 1 );
- ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
- ASSERT_EQ( T::Rank , 1 );
-
- ASSERT_EQ( vr1.dimension_0() , N );
+ ASSERT_EQ( vr1.dimension_0(), N );
- if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 1 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 1;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( cr1[i], i + 1 );
{
T tmp( vr1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 2;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 2 );
}
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 2 );
}
}
- // Testing proper handling of zero-length allocations
+ // Testing proper handling of zero-length allocations.
{
- constexpr int N = 0 ;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ constexpr int N = 0;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
- T vr1("vr1",N);
- C cr1(vr1);
+ T vr1( "vr1", N );
+ C cr1( vr1 );
- ASSERT_EQ( vr1.dimension_0() , 0 );
- ASSERT_EQ( cr1.dimension_0() , 0 );
+ ASSERT_EQ( vr1.dimension_0(), 0 );
+ ASSERT_EQ( cr1.dimension_0(), 0 );
}
-
// Testing using space instance for allocation.
- // The execution space of the memory space must be available for view data initialization
-
- if ( std::is_same< ExecSpace , typename ExecSpace::memory_space::execution_space >::value ) {
-
- using namespace Kokkos::Experimental ;
-
- typedef typename ExecSpace::memory_space memory_space ;
- typedef View<int*,memory_space> V ;
-
- constexpr int N = 10 ;
-
- memory_space mem_space ;
-
- V v( "v" , N );
- V va( view_alloc() , N );
- V vb( view_alloc( "vb" ) , N );
- V vc( view_alloc( "vc" , AllowPadding ) , N );
- V vd( view_alloc( "vd" , WithoutInitializing ) , N );
- V ve( view_alloc( "ve" , WithoutInitializing , AllowPadding ) , N );
- V vf( view_alloc( "vf" , mem_space , WithoutInitializing , AllowPadding ) , N );
- V vg( view_alloc( mem_space , "vg" , WithoutInitializing , AllowPadding ) , N );
- V vh( view_alloc( WithoutInitializing , AllowPadding ) , N );
- V vi( view_alloc( WithoutInitializing ) , N );
- V vj( view_alloc( std::string("vj") , AllowPadding ) , N );
- V vk( view_alloc( mem_space , std::string("vk") , AllowPadding ) , N );
+ // The execution space of the memory space must be available for view data initialization.
+ if ( std::is_same< ExecSpace, typename ExecSpace::memory_space::execution_space >::value ) {
+
+ using namespace Kokkos::Experimental;
+
+ typedef typename ExecSpace::memory_space memory_space;
+ typedef View< int*, memory_space > V;
+
+ constexpr int N = 10;
+
+ memory_space mem_space;
+
+ V v( "v", N );
+ V va( view_alloc(), N );
+ V vb( view_alloc( "vb" ), N );
+ V vc( view_alloc( "vc", AllowPadding ), N );
+ V vd( view_alloc( "vd", WithoutInitializing ), N );
+ V ve( view_alloc( "ve", WithoutInitializing, AllowPadding ), N );
+ V vf( view_alloc( "vf", mem_space, WithoutInitializing, AllowPadding ), N );
+ V vg( view_alloc( mem_space, "vg", WithoutInitializing, AllowPadding ), N );
+ V vh( view_alloc( WithoutInitializing, AllowPadding ), N );
+ V vi( view_alloc( WithoutInitializing ), N );
+ V vj( view_alloc( std::string( "vj" ), AllowPadding ), N );
+ V vk( view_alloc( mem_space, std::string( "vk" ), AllowPadding ), N );
}
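
// [Editorial sketch, not part of the patch] The block above enumerates the
// accepted view_alloc() property combinations; a label, a memory-space
// instance, WithoutInitializing and AllowPadding may be passed in any order.
// A typical hedged use, assuming an initialized Kokkos runtime, is to skip the
// default zero initialization when every entry is written immediately
// afterwards ( view_alloc_sketch is a hypothetical helper ):

void view_alloc_sketch( int n )
{
  using namespace Kokkos::Experimental;

  View< double*, Kokkos::HostSpace > x( view_alloc( "x", WithoutInitializing ), n );

  for ( int i = 0; i < n; ++i ) x( i ) = 1.0 * i;  // first write of every entry
}
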
{
- typedef Kokkos::ViewTraits<int***,Kokkos::LayoutStride,ExecSpace> traits_t ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dims_t ;
- typedef Kokkos::Experimental::Impl::ViewOffset< dims_t , Kokkos::LayoutStride > offset_t ;
+ typedef Kokkos::ViewTraits< int***, Kokkos::LayoutStride, ExecSpace > traits_t;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0 > dims_t;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dims_t, Kokkos::LayoutStride > offset_t;
- Kokkos::LayoutStride stride ;
+ Kokkos::LayoutStride stride;
- stride.dimension[0] = 3 ;
- stride.dimension[1] = 4 ;
- stride.dimension[2] = 5 ;
- stride.stride[0] = 4 ;
- stride.stride[1] = 1 ;
- stride.stride[2] = 12 ;
+ stride.dimension[0] = 3;
+ stride.dimension[1] = 4;
+ stride.dimension[2] = 5;
+ stride.stride[0] = 4;
+ stride.stride[1] = 1;
+ stride.stride[2] = 12;
- const offset_t offset( std::integral_constant<unsigned,0>() , stride );
+ const offset_t offset( std::integral_constant< unsigned, 0 >(), stride );
- ASSERT_EQ( offset.dimension_0() , 3 );
- ASSERT_EQ( offset.dimension_1() , 4 );
- ASSERT_EQ( offset.dimension_2() , 5 );
+ ASSERT_EQ( offset.dimension_0(), 3 );
+ ASSERT_EQ( offset.dimension_1(), 4 );
+ ASSERT_EQ( offset.dimension_2(), 5 );
- ASSERT_EQ( offset.stride_0() , 4 );
- ASSERT_EQ( offset.stride_1() , 1 );
- ASSERT_EQ( offset.stride_2() , 12 );
+ ASSERT_EQ( offset.stride_0(), 4 );
+ ASSERT_EQ( offset.stride_1(), 1 );
+ ASSERT_EQ( offset.stride_2(), 12 );
- ASSERT_EQ( offset.span() , 60 );
+ ASSERT_EQ( offset.span(), 60 );
ASSERT_TRUE( offset.span_is_contiguous() );
- Kokkos::Experimental::Impl::ViewMapping< traits_t , void >
- v( Kokkos::Experimental::Impl::ViewCtorProp<int*>((int*)0), stride );
+ Kokkos::Experimental::Impl::ViewMapping< traits_t, void >
+ v( Kokkos::Experimental::Impl::ViewCtorProp< int* >( (int*) 0 ), stride );
}
{
- typedef Kokkos::View<int**,Space> V ;
- typedef typename V::HostMirror M ;
- typedef typename Kokkos::View<int**,Space>::array_layout layout_type;
+ typedef Kokkos::View< int**, Space > V;
+ typedef typename V::HostMirror M;
+ typedef typename Kokkos::View< int**, Space >::array_layout layout_type;
- constexpr int N0 = 10 ;
- constexpr int N1 = 11 ;
+ constexpr int N0 = 10;
+ constexpr int N1 = 11;
- V a("a",N0,N1);
- M b = Kokkos::Experimental::create_mirror(a);
- M c = Kokkos::Experimental::create_mirror_view(a);
- M d ;
+ V a( "a", N0, N1 );
+ M b = Kokkos::Experimental::create_mirror( a );
+ M c = Kokkos::Experimental::create_mirror_view( a );
+ M d;
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- Kokkos::Experimental::deep_copy( a , b );
- Kokkos::Experimental::deep_copy( c , a );
+ Kokkos::Experimental::deep_copy( a, b );
+ Kokkos::Experimental::deep_copy( c, a );
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ }
- Kokkos::Experimental::resize( b , 5 , 6 );
+ Kokkos::Experimental::resize( b, 5, 6 );
- for ( int i0 = 0 ; i0 < 5 ; ++i0 )
- for ( int i1 = 0 ; i1 < 6 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 5; ++i0 )
+ for ( int i1 = 0; i1 < 6; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , 5 , 6 );
- Kokkos::Experimental::realloc( d , 5 , 6 );
+ Kokkos::Experimental::realloc( c, 5, 6 );
+ Kokkos::Experimental::realloc( d, 5, 6 );
- ASSERT_EQ( b.dimension_0() , 5 );
- ASSERT_EQ( b.dimension_1() , 6 );
- ASSERT_EQ( c.dimension_0() , 5 );
- ASSERT_EQ( c.dimension_1() , 6 );
- ASSERT_EQ( d.dimension_0() , 5 );
- ASSERT_EQ( d.dimension_1() , 6 );
+ ASSERT_EQ( b.dimension_0(), 5 );
+ ASSERT_EQ( b.dimension_1(), 6 );
+ ASSERT_EQ( c.dimension_0(), 5 );
+ ASSERT_EQ( c.dimension_1(), 6 );
+ ASSERT_EQ( d.dimension_0(), 5 );
+ ASSERT_EQ( d.dimension_1(), 6 );
- layout_type layout(7,8);
- Kokkos::Experimental::resize( b , layout );
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 6 ; i1 < 8 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ layout_type layout( 7, 8 );
+ Kokkos::Experimental::resize( b, layout );
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 6; i1 < 8; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- for ( int i0 = 5 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 5; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , layout );
- Kokkos::Experimental::realloc( d , layout );
-
- ASSERT_EQ( b.dimension_0() , 7 );
- ASSERT_EQ( b.dimension_1() , 8 );
- ASSERT_EQ( c.dimension_0() , 7 );
- ASSERT_EQ( c.dimension_1() , 8 );
- ASSERT_EQ( d.dimension_0() , 7 );
- ASSERT_EQ( d.dimension_1() , 8 );
+ Kokkos::Experimental::realloc( c, layout );
+ Kokkos::Experimental::realloc( d, layout );
+ ASSERT_EQ( b.dimension_0(), 7 );
+ ASSERT_EQ( b.dimension_1(), 8 );
+ ASSERT_EQ( c.dimension_0(), 7 );
+ ASSERT_EQ( c.dimension_1(), 8 );
+ ASSERT_EQ( d.dimension_0(), 7 );
+ ASSERT_EQ( d.dimension_1(), 8 );
}
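
// [Editorial sketch, not part of the patch] The block above walks through the
// mirror workflow: create_mirror always allocates a host copy,
// create_mirror_view may alias the original when it is already host-accessible,
// resize keeps existing entries in the overlapping index range, and realloc
// discards the old contents. A condensed, hedged restatement of that flow
// ( mirror_sketch is a hypothetical helper; assumes an initialized Kokkos
// runtime and a valid Space ):

template< class Space >
void mirror_sketch()
{
  Kokkos::View< int**, Space > a( "a", 10, 11 );

  auto h = Kokkos::Experimental::create_mirror_view( a );  // host-accessible view of 'a'

  for ( int i0 = 0; i0 < 10; ++i0 )
  for ( int i1 = 0; i1 < 11; ++i1 )
  {
    h( i0, i1 ) = 1 + i0 + i1 * 10;
  }

  Kokkos::Experimental::deep_copy( a, h );    // host -> device
  Kokkos::Experimental::resize( h, 5, 6 );    // entries in the 5 x 6 corner survive
  Kokkos::Experimental::realloc( h, 7, 8 );   // fresh allocation, old contents dropped
}
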
{
- typedef Kokkos::View<int**,Kokkos::LayoutStride,Space> V ;
- typedef typename V::HostMirror M ;
- typedef typename Kokkos::View<int**,Kokkos::LayoutStride,Space>::array_layout layout_type;
+ typedef Kokkos::View< int**, Kokkos::LayoutStride, Space > V;
+ typedef typename V::HostMirror M;
+ typedef typename Kokkos::View< int**, Kokkos::LayoutStride, Space >::array_layout layout_type;
- constexpr int N0 = 10 ;
- constexpr int N1 = 11 ;
+ constexpr int N0 = 10;
+ constexpr int N1 = 11;
- const int dimensions[] = {N0,N1};
- const int order[] = {1,0};
+ const int dimensions[] = { N0, N1 };
+ const int order[] = { 1, 0 };
- V a("a",Kokkos::LayoutStride::order_dimensions(2,order,dimensions));
- M b = Kokkos::Experimental::create_mirror(a);
- M c = Kokkos::Experimental::create_mirror_view(a);
- M d ;
+ V a( "a", Kokkos::LayoutStride::order_dimensions( 2, order, dimensions ) );
+ M b = Kokkos::Experimental::create_mirror( a );
+ M c = Kokkos::Experimental::create_mirror_view( a );
+ M d;
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- Kokkos::Experimental::deep_copy( a , b );
- Kokkos::Experimental::deep_copy( c , a );
+ Kokkos::Experimental::deep_copy( a, b );
+ Kokkos::Experimental::deep_copy( c, a );
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ }
- const int dimensions2[] = {7,8};
- const int order2[] = {1,0};
- layout_type layout = layout_type::order_dimensions(2,order2,dimensions2);
- Kokkos::Experimental::resize( b , layout );
+ const int dimensions2[] = { 7, 8 };
+ const int order2[] = { 1, 0 };
+ layout_type layout = layout_type::order_dimensions( 2, order2, dimensions2 );
+ Kokkos::Experimental::resize( b, layout );
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , layout );
- Kokkos::Experimental::realloc( d , layout );
+ Kokkos::Experimental::realloc( c, layout );
+ Kokkos::Experimental::realloc( d, layout );
- ASSERT_EQ( b.dimension_0() , 7 );
- ASSERT_EQ( b.dimension_1() , 8 );
- ASSERT_EQ( c.dimension_0() , 7 );
- ASSERT_EQ( c.dimension_1() , 8 );
- ASSERT_EQ( d.dimension_0() , 7 );
- ASSERT_EQ( d.dimension_1() , 8 );
+ ASSERT_EQ( b.dimension_0(), 7 );
+ ASSERT_EQ( b.dimension_1(), 8 );
+ ASSERT_EQ( c.dimension_0(), 7 );
+ ASSERT_EQ( c.dimension_1(), 8 );
+ ASSERT_EQ( d.dimension_0(), 7 );
+ ASSERT_EQ( d.dimension_1(), 8 );
}
{
- typedef Kokkos::View<int*,Space> V ;
- typedef Kokkos::View<int*,Space,Kokkos::MemoryUnmanaged> U ;
+ typedef Kokkos::View< int*, Space > V;
+ typedef Kokkos::View< int*, Space, Kokkos::MemoryUnmanaged > U;
+ V a( "a", 10 );
- V a("a",10);
+ ASSERT_EQ( a.use_count(), 1 );
- ASSERT_EQ( a.use_count() , 1 );
+ V b = a;
- V b = a ;
-
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
{
- U c = b ; // 'c' is compile-time unmanaged
+ U c = b; // 'c' is compile-time unmanaged.
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
- ASSERT_EQ( c.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
+ ASSERT_EQ( c.use_count(), 2 );
- V d = c ; // 'd' is run-time unmanaged
+ V d = c; // 'd' is run-time unmanaged.
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
- ASSERT_EQ( c.use_count() , 2 );
- ASSERT_EQ( d.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
+ ASSERT_EQ( c.use_count(), 2 );
+ ASSERT_EQ( d.use_count(), 2 );
}
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
b = V();
- ASSERT_EQ( a.use_count() , 1 );
- ASSERT_EQ( b.use_count() , 0 );
-
-#if ! defined ( KOKKOS_ENABLE_CUDA_LAMBDA )
- /* Cannot launch host lambda when CUDA lambda is enabled */
-
- typedef typename Kokkos::Impl::HostMirror< Space >::Space::execution_space
- host_exec_space ;
-
- Kokkos::parallel_for(
- Kokkos::RangePolicy< host_exec_space >(0,10) ,
- KOKKOS_LAMBDA( int i ){
- // 'a' is captured by copy and the capture mechanism
- // converts 'a' to an unmanaged copy.
- // When the parallel dispatch accepts a move for the lambda
- // this count should become 1
- ASSERT_EQ( a.use_count() , 2 );
- V x = a ;
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( x.use_count() , 2 );
- });
-#endif /* #if ! defined ( KOKKOS_ENABLE_CUDA_LAMBDA ) */
+ ASSERT_EQ( a.use_count(), 1 );
+ ASSERT_EQ( b.use_count(), 0 );
+
+#if !defined( KOKKOS_ENABLE_CUDA_LAMBDA )
+ // Cannot launch host lambda when CUDA lambda is enabled.
+
+ typedef typename Kokkos::Impl::HostMirror< Space >::Space::execution_space host_exec_space;
+
+ Kokkos::parallel_for( Kokkos::RangePolicy< host_exec_space >( 0, 10 ), KOKKOS_LAMBDA ( int i ) {
+ // 'a' is captured by copy, and the capture mechanism converts 'a' to an
+ // unmanaged copy. When the parallel dispatch accepts a move for the
+ // lambda, this count should become 1.
+ ASSERT_EQ( a.use_count(), 2 );
+ V x = a;
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( x.use_count(), 2 );
+ });
+#endif // #if !defined( KOKKOS_ENABLE_CUDA_LAMBDA )
}
}
template< class Space >
struct TestViewMappingSubview
{
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
- typedef Kokkos::pair<int,int> range ;
+ typedef Kokkos::pair< int, int > range;
enum { AN = 10 };
- typedef Kokkos::View<int*,ExecSpace> AT ;
- typedef Kokkos::View<const int*,ExecSpace> ACT ;
- typedef Kokkos::Subview< AT , range > AS ;
+ typedef Kokkos::View< int*, ExecSpace > AT;
+ typedef Kokkos::View< const int*, ExecSpace > ACT;
+ typedef Kokkos::Subview< AT, range > AS;
- enum { BN0 = 10 , BN1 = 11 , BN2 = 12 };
- typedef Kokkos::View<int***,ExecSpace> BT ;
- typedef Kokkos::Subview< BT , range , range , range > BS ;
+ enum { BN0 = 10, BN1 = 11, BN2 = 12 };
+ typedef Kokkos::View< int***, ExecSpace > BT;
+ typedef Kokkos::Subview< BT, range, range, range > BS;
- enum { CN0 = 10 , CN1 = 11 , CN2 = 12 };
- typedef Kokkos::View<int***[13][14],ExecSpace> CT ;
- typedef Kokkos::Subview< CT , range , range , range , int , int > CS ;
+ enum { CN0 = 10, CN1 = 11, CN2 = 12 };
+ typedef Kokkos::View< int***[13][14], ExecSpace > CT;
+ typedef Kokkos::Subview< CT, range, range, range, int, int > CS;
- enum { DN0 = 10 , DN1 = 11 , DN2 = 12 , DN3 = 13 , DN4 = 14 };
- typedef Kokkos::View<int***[DN3][DN4],ExecSpace> DT ;
- typedef Kokkos::Subview< DT , int , range , range , range , int > DS ;
+ enum { DN0 = 10, DN1 = 11, DN2 = 12, DN3 = 13, DN4 = 14 };
+ typedef Kokkos::View< int***[DN3][DN4], ExecSpace > DT;
+ typedef Kokkos::Subview< DT, int, range, range, range, int > DS;
+ typedef Kokkos::View< int***[13][14], Kokkos::LayoutLeft, ExecSpace > DLT;
+ typedef Kokkos::Subview< DLT, range, int, int, int, int > DLS1;
- typedef Kokkos::View<int***[13][14],Kokkos::LayoutLeft,ExecSpace> DLT ;
- typedef Kokkos::Subview< DLT , range , int , int , int , int > DLS1 ;
-
- static_assert( DLS1::rank == 1 && std::is_same< typename DLS1::array_layout , Kokkos::LayoutLeft >::value
+ static_assert( DLS1::rank == 1 && std::is_same< typename DLS1::array_layout, Kokkos::LayoutLeft >::value
, "Subview layout error for rank 1 subview of left-most range of LayoutLeft" );
- typedef Kokkos::View<int***[13][14],Kokkos::LayoutRight,ExecSpace> DRT ;
- typedef Kokkos::Subview< DRT , int , int , int , int , range > DRS1 ;
+ typedef Kokkos::View< int***[13][14], Kokkos::LayoutRight, ExecSpace > DRT;
+ typedef Kokkos::Subview< DRT, int, int, int, int, range > DRS1;
- static_assert( DRS1::rank == 1 && std::is_same< typename DRS1::array_layout , Kokkos::LayoutRight >::value
+ static_assert( DRS1::rank == 1 && std::is_same< typename DRS1::array_layout, Kokkos::LayoutRight >::value
, "Subview layout error for rank 1 subview of right-most range of LayoutRight" );
- AT Aa ;
- AS Ab ;
- ACT Ac ;
- BT Ba ;
- BS Bb ;
- CT Ca ;
- CS Cb ;
- DT Da ;
- DS Db ;
+ AT Aa;
+ AS Ab;
+ ACT Ac;
+ BT Ba;
+ BS Bb;
+ CT Ca;
+ CS Cb;
+ DT Da;
+ DS Db;
TestViewMappingSubview()
- : Aa("Aa",AN)
- , Ab( Kokkos::Experimental::subview( Aa , std::pair<int,int>(1,AN-1) ) )
- , Ac( Aa , std::pair<int,int>(1,AN-1) )
- , Ba("Ba",BN0,BN1,BN2)
+ : Aa( "Aa", AN )
+ , Ab( Kokkos::Experimental::subview( Aa, std::pair< int, int >( 1, AN - 1 ) ) )
+ , Ac( Aa, std::pair< int, int >( 1, AN - 1 ) )
+ , Ba( "Ba", BN0, BN1, BN2 )
, Bb( Kokkos::Experimental::subview( Ba
- , std::pair<int,int>(1,BN0-1)
- , std::pair<int,int>(1,BN1-1)
- , std::pair<int,int>(1,BN2-1)
+ , std::pair< int, int >( 1, BN0 - 1 )
+ , std::pair< int, int >( 1, BN1 - 1 )
+ , std::pair< int, int >( 1, BN2 - 1 )
) )
- , Ca("Ca",CN0,CN1,CN2)
+ , Ca( "Ca", CN0, CN1, CN2 )
, Cb( Kokkos::Experimental::subview( Ca
- , std::pair<int,int>(1,CN0-1)
- , std::pair<int,int>(1,CN1-1)
- , std::pair<int,int>(1,CN2-1)
+ , std::pair< int, int >( 1, CN0 - 1 )
+ , std::pair< int, int >( 1, CN1 - 1 )
+ , std::pair< int, int >( 1, CN2 - 1 )
, 1
, 2
) )
- , Da("Da",DN0,DN1,DN2)
+ , Da( "Da", DN0, DN1, DN2 )
, Db( Kokkos::Experimental::subview( Da
, 1
- , std::pair<int,int>(1,DN1-1)
- , std::pair<int,int>(1,DN2-1)
- , std::pair<int,int>(1,DN3-1)
+ , std::pair< int, int >( 1, DN1 - 1 )
+ , std::pair< int, int >( 1, DN2 - 1 )
+ , std::pair< int, int >( 1, DN3 - 1 )
, 2
) )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int, long & error_count ) const
+ {
+ auto Ad = Kokkos::Experimental::subview< Kokkos::MemoryUnmanaged >( Aa, Kokkos::pair< int, int >( 1, AN - 1 ) );
+
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ab[i - 1] ) ++error_count;
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ac[i - 1] ) ++error_count;
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ad[i - 1] ) ++error_count;
+
+ for ( int i2 = 1; i2 < BN2 - 1; ++i2 )
+ for ( int i1 = 1; i1 < BN1 - 1; ++i1 )
+ for ( int i0 = 1; i0 < BN0 - 1; ++i0 )
{
+ if ( & Ba( i0, i1, i2 ) != & Bb( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
}
+ for ( int i2 = 1; i2 < CN2 - 1; ++i2 )
+ for ( int i1 = 1; i1 < CN1 - 1; ++i1 )
+ for ( int i0 = 1; i0 < CN0 - 1; ++i0 )
+ {
+ if ( & Ca( i0, i1, i2, 1, 2 ) != & Cb( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
+ }
- KOKKOS_INLINE_FUNCTION
- void operator()( const int , long & error_count ) const
+ for ( int i2 = 1; i2 < DN3 - 1; ++i2 )
+ for ( int i1 = 1; i1 < DN2 - 1; ++i1 )
+ for ( int i0 = 1; i0 < DN1 - 1; ++i0 )
{
- auto Ad = Kokkos::Experimental::subview< Kokkos::MemoryUnmanaged >( Aa , Kokkos::pair<int,int>(1,AN-1) );
-
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ab[i-1] ) ++error_count ;
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ac[i-1] ) ++error_count ;
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ad[i-1] ) ++error_count ;
-
- for ( int i2 = 1 ; i2 < BN2-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < BN1-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < BN0-1 ; ++i0 ) {
- if ( & Ba(i0,i1,i2) != & Bb(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
-
- for ( int i2 = 1 ; i2 < CN2-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < CN1-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < CN0-1 ; ++i0 ) {
- if ( & Ca(i0,i1,i2,1,2) != & Cb(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
-
- for ( int i2 = 1 ; i2 < DN3-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < DN2-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < DN1-1 ; ++i0 ) {
- if ( & Da(1,i0,i1,i2,2) != & Db(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
+ if ( & Da( 1, i0, i1, i2, 2 ) != & Db( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
}
+ }
static void run()
{
- TestViewMappingSubview self ;
-
- ASSERT_EQ( self.Aa.dimension_0() , AN );
- ASSERT_EQ( self.Ab.dimension_0() , AN - 2 );
- ASSERT_EQ( self.Ac.dimension_0() , AN - 2 );
- ASSERT_EQ( self.Ba.dimension_0() , BN0 );
- ASSERT_EQ( self.Ba.dimension_1() , BN1 );
- ASSERT_EQ( self.Ba.dimension_2() , BN2 );
- ASSERT_EQ( self.Bb.dimension_0() , BN0 - 2 );
- ASSERT_EQ( self.Bb.dimension_1() , BN1 - 2 );
- ASSERT_EQ( self.Bb.dimension_2() , BN2 - 2 );
-
- ASSERT_EQ( self.Ca.dimension_0() , CN0 );
- ASSERT_EQ( self.Ca.dimension_1() , CN1 );
- ASSERT_EQ( self.Ca.dimension_2() , CN2 );
- ASSERT_EQ( self.Ca.dimension_3() , 13 );
- ASSERT_EQ( self.Ca.dimension_4() , 14 );
- ASSERT_EQ( self.Cb.dimension_0() , CN0 - 2 );
- ASSERT_EQ( self.Cb.dimension_1() , CN1 - 2 );
- ASSERT_EQ( self.Cb.dimension_2() , CN2 - 2 );
-
- ASSERT_EQ( self.Da.dimension_0() , DN0 );
- ASSERT_EQ( self.Da.dimension_1() , DN1 );
- ASSERT_EQ( self.Da.dimension_2() , DN2 );
- ASSERT_EQ( self.Da.dimension_3() , DN3 );
- ASSERT_EQ( self.Da.dimension_4() , DN4 );
-
- ASSERT_EQ( self.Db.dimension_0() , DN1 - 2 );
- ASSERT_EQ( self.Db.dimension_1() , DN2 - 2 );
- ASSERT_EQ( self.Db.dimension_2() , DN3 - 2 );
-
- ASSERT_EQ( self.Da.stride_1() , self.Db.stride_0() );
- ASSERT_EQ( self.Da.stride_2() , self.Db.stride_1() );
- ASSERT_EQ( self.Da.stride_3() , self.Db.stride_2() );
-
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >(0,1) , self , error_count );
- ASSERT_EQ( error_count , 0 );
+ TestViewMappingSubview self;
+
+ ASSERT_EQ( self.Aa.dimension_0(), AN );
+ ASSERT_EQ( self.Ab.dimension_0(), AN - 2 );
+ ASSERT_EQ( self.Ac.dimension_0(), AN - 2 );
+ ASSERT_EQ( self.Ba.dimension_0(), BN0 );
+ ASSERT_EQ( self.Ba.dimension_1(), BN1 );
+ ASSERT_EQ( self.Ba.dimension_2(), BN2 );
+ ASSERT_EQ( self.Bb.dimension_0(), BN0 - 2 );
+ ASSERT_EQ( self.Bb.dimension_1(), BN1 - 2 );
+ ASSERT_EQ( self.Bb.dimension_2(), BN2 - 2 );
+
+ ASSERT_EQ( self.Ca.dimension_0(), CN0 );
+ ASSERT_EQ( self.Ca.dimension_1(), CN1 );
+ ASSERT_EQ( self.Ca.dimension_2(), CN2 );
+ ASSERT_EQ( self.Ca.dimension_3(), 13 );
+ ASSERT_EQ( self.Ca.dimension_4(), 14 );
+ ASSERT_EQ( self.Cb.dimension_0(), CN0 - 2 );
+ ASSERT_EQ( self.Cb.dimension_1(), CN1 - 2 );
+ ASSERT_EQ( self.Cb.dimension_2(), CN2 - 2 );
+
+ ASSERT_EQ( self.Da.dimension_0(), DN0 );
+ ASSERT_EQ( self.Da.dimension_1(), DN1 );
+ ASSERT_EQ( self.Da.dimension_2(), DN2 );
+ ASSERT_EQ( self.Da.dimension_3(), DN3 );
+ ASSERT_EQ( self.Da.dimension_4(), DN4 );
+
+ ASSERT_EQ( self.Db.dimension_0(), DN1 - 2 );
+ ASSERT_EQ( self.Db.dimension_1(), DN2 - 2 );
+ ASSERT_EQ( self.Db.dimension_2(), DN3 - 2 );
+
+ ASSERT_EQ( self.Da.stride_1(), self.Db.stride_0() );
+ ASSERT_EQ( self.Da.stride_2(), self.Db.stride_1() );
+ ASSERT_EQ( self.Da.stride_3(), self.Db.stride_2() );
+
+ long error_count = -1;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, 1 ), self, error_count );
+ ASSERT_EQ( error_count, 0 );
}
-
};
template< class Space >
void test_view_mapping_subview()
{
- typedef typename Space::execution_space ExecSpace ;
+ typedef typename Space::execution_space ExecSpace;
TestViewMappingSubview< ExecSpace >::run();
}
/*--------------------------------------------------------------------------*/
template< class ViewType >
struct TestViewMapOperator {
static_assert( ViewType::reference_type_is_lvalue_reference
, "Test only valid for lvalue reference type" );
- const ViewType v ;
+ const ViewType v;
KOKKOS_INLINE_FUNCTION
- void test_left( size_t i0 , long & error_count ) const
+ void test_left( size_t i0, long & error_count ) const
+ {
+ typename ViewType::value_type * const base_ptr = & v( 0, 0, 0, 0, 0, 0, 0, 0 );
+ const size_t n1 = v.dimension_1();
+ const size_t n2 = v.dimension_2();
+ const size_t n3 = v.dimension_3();
+ const size_t n4 = v.dimension_4();
+ const size_t n5 = v.dimension_5();
+ const size_t n6 = v.dimension_6();
+ const size_t n7 = v.dimension_7();
+
+ long offset = 0;
+
+ for ( size_t i7 = 0; i7 < n7; ++i7 )
+ for ( size_t i6 = 0; i6 < n6; ++i6 )
+ for ( size_t i5 = 0; i5 < n5; ++i5 )
+ for ( size_t i4 = 0; i4 < n4; ++i4 )
+ for ( size_t i3 = 0; i3 < n3; ++i3 )
+ for ( size_t i2 = 0; i2 < n2; ++i2 )
+ for ( size_t i1 = 0; i1 < n1; ++i1 )
{
- typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
- const size_t n1 = v.dimension_1();
- const size_t n2 = v.dimension_2();
- const size_t n3 = v.dimension_3();
- const size_t n4 = v.dimension_4();
- const size_t n5 = v.dimension_5();
- const size_t n6 = v.dimension_6();
- const size_t n7 = v.dimension_7();
-
- long offset = 0 ;
-
- for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
- for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
- for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
- for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
- for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
- for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
- for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
- {
- const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
- if ( d < offset ) ++error_count ;
- offset = d ;
- }
-
- if ( v.span() <= size_t(offset) ) ++error_count ;
+ const long d = & v( i0, i1, i2, i3, i4, i5, i6, i7 ) - base_ptr;
+ if ( d < offset ) ++error_count;
+ offset = d;
}
+ if ( v.span() <= size_t( offset ) ) ++error_count;
+ }
+
KOKKOS_INLINE_FUNCTION
- void test_right( size_t i0 , long & error_count ) const
+ void test_right( size_t i0, long & error_count ) const
+ {
+ typename ViewType::value_type * const base_ptr = & v( 0, 0, 0, 0, 0, 0, 0, 0 );
+ const size_t n1 = v.dimension_1();
+ const size_t n2 = v.dimension_2();
+ const size_t n3 = v.dimension_3();
+ const size_t n4 = v.dimension_4();
+ const size_t n5 = v.dimension_5();
+ const size_t n6 = v.dimension_6();
+ const size_t n7 = v.dimension_7();
+
+ long offset = 0;
+
+ for ( size_t i1 = 0; i1 < n1; ++i1 )
+ for ( size_t i2 = 0; i2 < n2; ++i2 )
+ for ( size_t i3 = 0; i3 < n3; ++i3 )
+ for ( size_t i4 = 0; i4 < n4; ++i4 )
+ for ( size_t i5 = 0; i5 < n5; ++i5 )
+ for ( size_t i6 = 0; i6 < n6; ++i6 )
+ for ( size_t i7 = 0; i7 < n7; ++i7 )
{
- typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
- const size_t n1 = v.dimension_1();
- const size_t n2 = v.dimension_2();
- const size_t n3 = v.dimension_3();
- const size_t n4 = v.dimension_4();
- const size_t n5 = v.dimension_5();
- const size_t n6 = v.dimension_6();
- const size_t n7 = v.dimension_7();
-
- long offset = 0 ;
-
- for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
- for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
- for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
- for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
- for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
- for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
- for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
- {
- const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
- if ( d < offset ) ++error_count ;
- offset = d ;
- }
-
- if ( v.span() <= size_t(offset) ) ++error_count ;
+ const long d = & v( i0, i1, i2, i3, i4, i5, i6, i7 ) - base_ptr;
+ if ( d < offset ) ++error_count;
+ offset = d;
}
+ if ( v.span() <= size_t( offset ) ) ++error_count;
+ }
+
KOKKOS_INLINE_FUNCTION
- void operator()( size_t i , long & error_count ) const
- {
- if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutLeft >::value )
- test_left(i,error_count);
- else if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutRight >::value )
- test_right(i,error_count);
+ void operator()( size_t i, long & error_count ) const
+ {
+ if ( std::is_same< typename ViewType::array_layout, Kokkos::LayoutLeft >::value ) {
+ test_left( i, error_count );
}
+ else if ( std::is_same< typename ViewType::array_layout, Kokkos::LayoutRight >::value ) {
+ test_right( i, error_count );
+ }
+ }
- constexpr static size_t N0 = 10 ;
- constexpr static size_t N1 = 9 ;
- constexpr static size_t N2 = 8 ;
- constexpr static size_t N3 = 7 ;
- constexpr static size_t N4 = 6 ;
- constexpr static size_t N5 = 5 ;
- constexpr static size_t N6 = 4 ;
- constexpr static size_t N7 = 3 ;
+ constexpr static size_t N0 = 10;
+ constexpr static size_t N1 = 9;
+ constexpr static size_t N2 = 8;
+ constexpr static size_t N3 = 7;
+ constexpr static size_t N4 = 6;
+ constexpr static size_t N5 = 5;
+ constexpr static size_t N6 = 4;
+ constexpr static size_t N7 = 3;
- TestViewMapOperator() : v( "Test" , N0, N1, N2, N3, N4, N5, N6, N7 ) {}
+ TestViewMapOperator() : v( "Test", N0, N1, N2, N3, N4, N5, N6, N7 ) {}
static void run()
- {
- TestViewMapOperator self ;
-
- ASSERT_EQ( self.v.dimension_0() , ( 0 < ViewType::rank ? N0 : 1 ) );
- ASSERT_EQ( self.v.dimension_1() , ( 1 < ViewType::rank ? N1 : 1 ) );
- ASSERT_EQ( self.v.dimension_2() , ( 2 < ViewType::rank ? N2 : 1 ) );
- ASSERT_EQ( self.v.dimension_3() , ( 3 < ViewType::rank ? N3 : 1 ) );
- ASSERT_EQ( self.v.dimension_4() , ( 4 < ViewType::rank ? N4 : 1 ) );
- ASSERT_EQ( self.v.dimension_5() , ( 5 < ViewType::rank ? N5 : 1 ) );
- ASSERT_EQ( self.v.dimension_6() , ( 6 < ViewType::rank ? N6 : 1 ) );
- ASSERT_EQ( self.v.dimension_7() , ( 7 < ViewType::rank ? N7 : 1 ) );
-
- ASSERT_LE( self.v.dimension_0()*
- self.v.dimension_1()*
- self.v.dimension_2()*
- self.v.dimension_3()*
- self.v.dimension_4()*
- self.v.dimension_5()*
- self.v.dimension_6()*
- self.v.dimension_7()
- , self.v.span() );
-
- long error_count ;
- Kokkos::RangePolicy< typename ViewType::execution_space > range(0,self.v.dimension_0());
- Kokkos::parallel_reduce( range , self , error_count );
- ASSERT_EQ( 0 , error_count );
- }
+ {
+ TestViewMapOperator self;
+
+ ASSERT_EQ( self.v.dimension_0(), ( 0 < ViewType::rank ? N0 : 1 ) );
+ ASSERT_EQ( self.v.dimension_1(), ( 1 < ViewType::rank ? N1 : 1 ) );
+ ASSERT_EQ( self.v.dimension_2(), ( 2 < ViewType::rank ? N2 : 1 ) );
+ ASSERT_EQ( self.v.dimension_3(), ( 3 < ViewType::rank ? N3 : 1 ) );
+ ASSERT_EQ( self.v.dimension_4(), ( 4 < ViewType::rank ? N4 : 1 ) );
+ ASSERT_EQ( self.v.dimension_5(), ( 5 < ViewType::rank ? N5 : 1 ) );
+ ASSERT_EQ( self.v.dimension_6(), ( 6 < ViewType::rank ? N6 : 1 ) );
+ ASSERT_EQ( self.v.dimension_7(), ( 7 < ViewType::rank ? N7 : 1 ) );
+
+ ASSERT_LE( self.v.dimension_0() *
+ self.v.dimension_1() *
+ self.v.dimension_2() *
+ self.v.dimension_3() *
+ self.v.dimension_4() *
+ self.v.dimension_5() *
+ self.v.dimension_6() *
+ self.v.dimension_7()
+ , self.v.span() );
+
+ long error_count;
+ Kokkos::RangePolicy< typename ViewType::execution_space > range( 0, self.v.dimension_0() );
+ Kokkos::parallel_reduce( range, self, error_count );
+ ASSERT_EQ( 0, error_count );
+ }
};
-
template< class Space >
void test_view_mapping_operator()
{
- typedef typename Space::execution_space ExecSpace ;
-
- TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutLeft,ExecSpace> >::run();
-
- TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutRight,ExecSpace> >::run();
+ typedef typename Space::execution_space ExecSpace;
+
+ TestViewMapOperator< Kokkos::View<int, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******, Kokkos::LayoutLeft, ExecSpace> >::run();
+
+ TestViewMapOperator< Kokkos::View<int, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******, Kokkos::LayoutRight, ExecSpace> >::run();
}
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingAtomic {
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
- typedef Kokkos::MemoryTraits< Kokkos::Atomic > mem_trait ;
+ typedef Kokkos::MemoryTraits< Kokkos::Atomic > mem_trait;
- typedef Kokkos::View< int * , ExecSpace > T ;
- typedef Kokkos::View< int * , ExecSpace , mem_trait > T_atom ;
+ typedef Kokkos::View< int *, ExecSpace > T;
+ typedef Kokkos::View< int *, ExecSpace, mem_trait > T_atom;
- T x ;
- T_atom x_atom ;
+ T x;
+ T_atom x_atom;
- constexpr static size_t N = 100000 ;
+ constexpr static size_t N = 100000;
struct TagInit {};
struct TagUpdate {};
struct TagVerify {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const
- { x(i) = i ; }
+ void operator()( const TagInit &, const int i ) const
+ { x( i ) = i; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagUpdate & , const int i ) const
- { x_atom(i%2) += 1 ; }
+ void operator()( const TagUpdate &, const int i ) const
+ { x_atom( i % 2 ) += 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagVerify & , const int i , long & error_count ) const
- {
- if ( i < 2 ) { if ( x(i) != int(i + N / 2) ) ++error_count ; }
- else { if ( x(i) != int(i) ) ++error_count ; }
- }
+ void operator()( const TagVerify &, const int i, long & error_count ) const
+ {
+ if ( i < 2 ) { if ( x( i ) != int( i + N / 2 ) ) ++error_count; }
+ else { if ( x( i ) != int( i ) ) ++error_count; }
+ }
TestViewMappingAtomic()
- : x("x",N)
+ : x( "x", N )
, x_atom( x )
{}
static void run()
+ {
+ ASSERT_TRUE( T::reference_type_is_lvalue_reference );
+ ASSERT_FALSE( T_atom::reference_type_is_lvalue_reference );
+
+ TestViewMappingAtomic self;
+
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, TagInit >( 0, N ), self );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, TagUpdate >( 0, N ), self );
+
+ long error_count = -1;
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TagVerify >( 0, N ), self, error_count );
+
+ ASSERT_EQ( 0, error_count );
+
+ typename TestViewMappingAtomic::T_atom::HostMirror x_host = Kokkos::create_mirror_view( self.x );
+ Kokkos::deep_copy( x_host, self.x );
+
+ error_count = -1;
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultHostExecutionSpace, TagVerify >( 0, N ),
+ [=] ( const TagVerify &, const int i, long & tmp_error_count )
{
- ASSERT_TRUE( T::reference_type_is_lvalue_reference );
- ASSERT_FALSE( T_atom::reference_type_is_lvalue_reference );
-
- TestViewMappingAtomic self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagInit >(0,N) , self );
- Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagUpdate >(0,N) , self );
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagVerify >(0,N) , self , error_count );
- ASSERT_EQ( 0 , error_count );
- typename TestViewMappingAtomic::T_atom::HostMirror x_host = Kokkos::create_mirror_view(self.x);
- Kokkos::deep_copy(x_host,self.x);
- error_count = -1;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultHostExecutionSpace, TagVerify>(0,N),
- [=] ( const TagVerify & , const int i , long & tmp_error_count ) {
- if ( i < 2 ) { if ( x_host(i) != int(i + N / 2) ) ++tmp_error_count ; }
- else { if ( x_host(i) != int(i) ) ++tmp_error_count ; }
- }, error_count);
- ASSERT_EQ( 0 , error_count );
- Kokkos::deep_copy(self.x,x_host);
- }
+ if ( i < 2 ) {
+        if ( x_host( i ) != int( i + N / 2 ) ) ++tmp_error_count;
+ }
+ else {
+        if ( x_host( i ) != int( i ) ) ++tmp_error_count;
+ }
+ }, error_count);
+
+    ASSERT_EQ( 0, error_count );
+ Kokkos::deep_copy( self.x, x_host );
+ }
};
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingClassValue {
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
struct ValueType {
KOKKOS_INLINE_FUNCTION
ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
- printf("TestViewMappingClassValue construct on Cuda\n");
+ printf( "TestViewMappingClassValue construct on Cuda\n" );
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- printf("TestViewMappingClassValue construct on Host\n");
+ printf( "TestViewMappingClassValue construct on Host\n" );
#else
- printf("TestViewMappingClassValue construct unknown\n");
+ printf( "TestViewMappingClassValue construct unknown\n" );
#endif
#endif
}
KOKKOS_INLINE_FUNCTION
~ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
- printf("TestViewMappingClassValue destruct on Cuda\n");
+ printf( "TestViewMappingClassValue destruct on Cuda\n" );
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- printf("TestViewMappingClassValue destruct on Host\n");
+ printf( "TestViewMappingClassValue destruct on Host\n" );
#else
- printf("TestViewMappingClassValue destruct unknown\n");
+ printf( "TestViewMappingClassValue destruct unknown\n" );
#endif
#endif
}
};
static void run()
{
- using namespace Kokkos::Experimental ;
+ using namespace Kokkos::Experimental;
+
ExecSpace::fence();
{
- View< ValueType , ExecSpace > a("a");
+ View< ValueType, ExecSpace > a( "a" );
ExecSpace::fence();
}
ExecSpace::fence();
}
};
-} /* namespace Test */
-
-/*--------------------------------------------------------------------------*/
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestViewOfClass.hpp b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
index 381b8786b..d624c5dda 100644
--- a/lib/kokkos/core/unit_test/TestViewOfClass.hpp
+++ b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
@@ -1,131 +1,121 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
template< class Space >
struct NestedView {
-
- Kokkos::View<int*,Space> member ;
+ Kokkos::View< int*, Space > member;
public:
-
KOKKOS_INLINE_FUNCTION
- NestedView() : member()
- {}
+ NestedView() : member() {}
KOKKOS_INLINE_FUNCTION
- NestedView & operator = ( const Kokkos::View<int*,Space> & lhs )
- {
- member = lhs ;
- if ( member.dimension_0() ) Kokkos::atomic_add( & member(0) , 1 );
- return *this ;
- }
+ NestedView & operator=( const Kokkos::View< int*, Space > & lhs )
+ {
+ member = lhs;
+ if ( member.dimension_0() ) Kokkos::atomic_add( & member( 0 ), 1 );
+ return *this;
+ }
KOKKOS_INLINE_FUNCTION
~NestedView()
- {
+ {
if ( member.dimension_0() ) {
- Kokkos::atomic_add( & member(0) , -1 );
+ Kokkos::atomic_add( & member( 0 ), -1 );
}
}
};
template< class Space >
struct NestedViewFunctor {
- Kokkos::View< NestedView<Space> * , Space > nested ;
- Kokkos::View<int*,Space> array ;
+ Kokkos::View< NestedView<Space> *, Space > nested;
+ Kokkos::View< int*, Space > array;
- NestedViewFunctor(
- const Kokkos::View< NestedView<Space> * , Space > & arg_nested ,
- const Kokkos::View<int*,Space> & arg_array )
+ NestedViewFunctor(
+ const Kokkos::View< NestedView<Space> *, Space > & arg_nested,
+ const Kokkos::View< int*, Space > & arg_array )
: nested( arg_nested )
, array( arg_array )
{}
KOKKOS_INLINE_FUNCTION
- void operator()( int i ) const
- { nested[i] = array ; }
+ void operator()( int i ) const { nested[i] = array; }
};
-
template< class Space >
void view_nested_view()
{
- Kokkos::View<int*,Space> tracking("tracking",1);
+ Kokkos::View< int*, Space > tracking( "tracking", 1 );
- typename Kokkos::View<int*,Space>::HostMirror
- host_tracking = Kokkos::create_mirror( tracking );
+ typename Kokkos::View< int*, Space >::HostMirror host_tracking = Kokkos::create_mirror( tracking );
{
- Kokkos::View< NestedView<Space> * , Space > a("a_nested_view",2);
+ Kokkos::View< NestedView<Space> *, Space > a( "a_nested_view", 2 );
- Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( a , tracking ) );
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 2 , host_tracking(0) );
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, 2 ), NestedViewFunctor< Space >( a, tracking ) );
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 2, host_tracking( 0 ) );
- Kokkos::View< NestedView<Space> * , Space > b("b_nested_view",2);
- Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( b , tracking ) );
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 4 , host_tracking(0) );
+ Kokkos::View< NestedView<Space> *, Space > b( "b_nested_view", 2 );
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, 2 ), NestedViewFunctor< Space >( b, tracking ) );
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 4, host_tracking( 0 ) );
}
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 0 , host_tracking(0) );
-}
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 0, host_tracking( 0 ) );
}
-/*--------------------------------------------------------------------------*/
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp b/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
index 09141e582..21ae92e93 100644
--- a/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
+++ b/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
@@ -1,82 +1,76 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< typename SpaceDst , typename SpaceSrc >
+template< typename SpaceDst, typename SpaceSrc >
void view_space_assign()
{
- Kokkos::View<double*,SpaceDst> a =
- Kokkos::View<double*,SpaceSrc>("a",1);
+ Kokkos::View< double*, SpaceDst > a =
+ Kokkos::View< double*, SpaceSrc >( "a", 1 );
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceDst> b =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("b",1);
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceDst > b =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "b", 1 );
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceDst> c =
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceSrc>("c",1);
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceDst > c =
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceSrc >( "c", 1 );
- Kokkos::View<double*,SpaceDst,Kokkos::MemoryRandomAccess> d =
- Kokkos::View<double*,SpaceSrc>("d",1);
+ Kokkos::View< double*, SpaceDst, Kokkos::MemoryRandomAccess > d =
+ Kokkos::View< double*, SpaceSrc >( "d", 1 );
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceDst,Kokkos::MemoryRandomAccess> e =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("e",1);
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceDst, Kokkos::MemoryRandomAccess > e =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "e", 1 );
// Rank-one layout can assign:
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceDst> f =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("f",1);
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceDst > f =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "f", 1 );
}
-
} // namespace Test
-
-/*--------------------------------------------------------------------------*/
-
diff --git a/lib/kokkos/core/unit_test/TestViewSubview.hpp b/lib/kokkos/core/unit_test/TestViewSubview.hpp
index 1c2575b6f..386301b45 100644
--- a/lib/kokkos/core/unit_test/TestViewSubview.hpp
+++ b/lib/kokkos/core/unit_test/TestViewSubview.hpp
@@ -1,1239 +1,1291 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace TestViewSubview {
-template<class Layout, class Space>
+template< class Layout, class Space >
struct getView {
static
- Kokkos::View<double**,Layout,Space> get(int n, int m) {
- return Kokkos::View<double**,Layout,Space>("G",n,m);
+ Kokkos::View< double**, Layout, Space > get( int n, int m ) {
+ return Kokkos::View< double**, Layout, Space >( "G", n, m );
}
};
-template<class Space>
-struct getView<Kokkos::LayoutStride,Space> {
+template< class Space >
+struct getView< Kokkos::LayoutStride, Space > {
static
- Kokkos::View<double**,Kokkos::LayoutStride,Space> get(int n, int m) {
- const int rank = 2 ;
+ Kokkos::View< double**, Kokkos::LayoutStride, Space > get( int n, int m ) {
+ const int rank = 2;
const int order[] = { 0, 1 };
- const unsigned dim[] = { unsigned(n), unsigned(m) };
- Kokkos::LayoutStride stride = Kokkos::LayoutStride::order_dimensions( rank , order , dim );
- return Kokkos::View<double**,Kokkos::LayoutStride,Space>("G",stride);
+ const unsigned dim[] = { unsigned( n ), unsigned( m ) };
+ Kokkos::LayoutStride stride = Kokkos::LayoutStride::order_dimensions( rank, order, dim );
+
+ return Kokkos::View< double**, Kokkos::LayoutStride, Space >( "G", stride );
}
};
-template<class ViewType, class Space>
+template< class ViewType, class Space >
struct fill_1D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
+
ViewType a;
double val;
- fill_1D(ViewType a_, double val_):a(a_),val(val_) {
- }
+
+ fill_1D( ViewType a_, double val_ ) : a( a_ ), val( val_ ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int i) const {
- a(i) = val;
- }
+ void operator()( const int i ) const { a( i ) = val; }
};
-template<class ViewType, class Space>
+template< class ViewType, class Space >
struct fill_2D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
+
ViewType a;
double val;
- fill_2D(ViewType a_, double val_):a(a_),val(val_) {
- }
+
+ fill_2D( ViewType a_, double val_ ) : a( a_ ), val( val_ ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int i) const{
- for(int j = 0; j < static_cast<int>(a.dimension_1()); j++)
- a(i,j) = val;
+ void operator()( const int i ) const
+ {
+ for ( int j = 0; j < static_cast< int >( a.dimension_1() ); j++ ) {
+ a( i, j ) = val;
+ }
}
};
-template<class Layout, class Space>
+template< class Layout, class Space >
void test_auto_1d ()
{
- typedef Kokkos::View<double**, Layout, Space> mv_type;
+ typedef Kokkos::View< double**, Layout, Space > mv_type;
typedef typename mv_type::size_type size_type;
+
const double ZERO = 0.0;
const double ONE = 1.0;
const double TWO = 2.0;
const size_type numRows = 10;
const size_type numCols = 3;
- mv_type X = getView<Layout,Space>::get(numRows, numCols);
- typename mv_type::HostMirror X_h = Kokkos::create_mirror_view (X);
+ mv_type X = getView< Layout, Space >::get( numRows, numCols );
+ typename mv_type::HostMirror X_h = Kokkos::create_mirror_view( X );
- fill_2D<mv_type,Space> f1(X, ONE);
- Kokkos::parallel_for(X.dimension_0(),f1);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ONE);
+ fill_2D< mv_type, Space > f1( X, ONE );
+ Kokkos::parallel_for( X.dimension_0(), f1 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ONE );
}
}
- fill_2D<mv_type,Space> f2(X, 0.0);
- Kokkos::parallel_for(X.dimension_0(),f2);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ZERO);
+ fill_2D< mv_type, Space > f2( X, 0.0 );
+ Kokkos::parallel_for( X.dimension_0(), f2 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ZERO );
}
}
- fill_2D<mv_type,Space> f3(X, TWO);
- Kokkos::parallel_for(X.dimension_0(),f3);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == TWO);
+ fill_2D< mv_type, Space > f3( X, TWO );
+ Kokkos::parallel_for( X.dimension_0(), f3 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == TWO );
}
}
- for (size_type j = 0; j < numCols; ++j) {
- auto X_j = Kokkos::subview (X, Kokkos::ALL, j);
+ for ( size_type j = 0; j < numCols; ++j ) {
+ auto X_j = Kokkos::subview( X, Kokkos::ALL, j );
- fill_1D<decltype(X_j),Space> f4(X_j, ZERO);
- Kokkos::parallel_for(X_j.dimension_0(),f4);
- Kokkos::deep_copy (X_h, X);
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ZERO);
+ fill_1D< decltype( X_j ), Space > f4( X_j, ZERO );
+ Kokkos::parallel_for( X_j.dimension_0(), f4 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ZERO );
}
- for (size_type jj = 0; jj < numCols; ++jj) {
- auto X_jj = Kokkos::subview (X, Kokkos::ALL, jj);
- fill_1D<decltype(X_jj),Space> f5(X_jj, ONE);
- Kokkos::parallel_for(X_jj.dimension_0(),f5);
- Kokkos::deep_copy (X_h, X);
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,jj) == ONE);
+ for ( size_type jj = 0; jj < numCols; ++jj ) {
+      auto X_jj = Kokkos::subview( X, Kokkos::ALL, jj );
+ fill_1D< decltype( X_jj ), Space > f5( X_jj, ONE );
+ Kokkos::parallel_for( X_jj.dimension_0(), f5 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, jj ) == ONE );
}
}
}
}
-template<class LD, class LS, class Space>
-void test_1d_strided_assignment_impl(bool a, bool b, bool c, bool d, int n, int m) {
- Kokkos::View<double**,LS,Space> l2d("l2d",n,m);
+template< class LD, class LS, class Space >
+void test_1d_strided_assignment_impl( bool a, bool b, bool c, bool d, int n, int m ) {
+ Kokkos::View< double**, LS, Space > l2d( "l2d", n, m );
- int col = n>2?2:0;
- int row = m>2?2:0;
+ int col = n > 2 ? 2 : 0;
+ int row = m > 2 ? 2 : 0;
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
- if(a) {
- Kokkos::View<double*,LD,Space> l1da = Kokkos::subview(l2d,Kokkos::ALL,row);
- ASSERT_TRUE( & l1da(0) == & l2d(0,row) );
- if(n>1)
- ASSERT_TRUE( & l1da(1) == & l2d(1,row) );
- }
- if(b && n>13) {
- Kokkos::View<double*,LD,Space> l1db = Kokkos::subview(l2d,std::pair<unsigned,unsigned>(2,13),row);
- ASSERT_TRUE( & l1db(0) == & l2d(2,row) );
- ASSERT_TRUE( & l1db(1) == & l2d(3,row) );
- }
- if(c) {
- Kokkos::View<double*,LD,Space> l1dc = Kokkos::subview(l2d,col,Kokkos::ALL);
- ASSERT_TRUE( & l1dc(0) == & l2d(col,0) );
- if(m>1)
- ASSERT_TRUE( & l1dc(1) == & l2d(col,1) );
- }
- if(d && m>13) {
- Kokkos::View<double*,LD,Space> l1dd = Kokkos::subview(l2d,col,std::pair<unsigned,unsigned>(2,13));
- ASSERT_TRUE( & l1dd(0) == & l2d(col,2) );
- ASSERT_TRUE( & l1dd(1) == & l2d(col,3) );
- }
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ if ( a ) {
+ Kokkos::View< double*, LD, Space > l1da = Kokkos::subview( l2d, Kokkos::ALL, row );
+ ASSERT_TRUE( & l1da( 0 ) == & l2d( 0, row ) );
+ if ( n > 1 ) {
+ ASSERT_TRUE( & l1da( 1 ) == & l2d( 1, row ) );
+ }
+ }
+
+ if ( b && n > 13 ) {
+ Kokkos::View< double*, LD, Space > l1db = Kokkos::subview( l2d, std::pair< unsigned, unsigned >( 2, 13 ), row );
+ ASSERT_TRUE( & l1db( 0 ) == & l2d( 2, row ) );
+ ASSERT_TRUE( & l1db( 1 ) == & l2d( 3, row ) );
+ }
+
+ if ( c ) {
+ Kokkos::View< double*, LD, Space > l1dc = Kokkos::subview( l2d, col, Kokkos::ALL );
+ ASSERT_TRUE( & l1dc( 0 ) == & l2d( col, 0 ) );
+      if ( m > 1 ) {
+ ASSERT_TRUE( & l1dc( 1 ) == & l2d( col, 1 ) );
+ }
+ }
+
+ if ( d && m > 13 ) {
+ Kokkos::View< double*, LD, Space > l1dd = Kokkos::subview( l2d, col, std::pair< unsigned, unsigned >( 2, 13 ) );
+ ASSERT_TRUE( & l1dd( 0 ) == & l2d( col, 2 ) );
+ ASSERT_TRUE( & l1dd( 1 ) == & l2d( col, 3 ) );
+ }
}
}
-template<class Space >
+template< class Space >
void test_1d_strided_assignment() {
- test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutLeft,Space>(true,true,true,true,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutRight,Space>(true,true,true,true,17,3);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
+ test_1d_strided_assignment_impl< Kokkos::LayoutStride, Kokkos::LayoutLeft, Space >( true, true, true, true, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutStride, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 3 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( false, false, true, true, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( false, false, true, true, 17, 3 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 1 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 1 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 1 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( false, false, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( false, false, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 1 );
}
template< class Space >
void test_left_0()
{
- typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutLeft , Space >
- view_static_8_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int [2][3][4][5][2][3][4][5], Kokkos::LayoutLeft, Space > view_static_8_type;
- view_static_8_type x_static_8("x_static_left_8");
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_static_8_type x_static_8( "x_static_left_8" );
- ASSERT_TRUE( x_static_8.is_contiguous() );
+ ASSERT_TRUE( x_static_8.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x_static_8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x_static_8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x_static_8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3, 0, 1, 2, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x_static_8(1,1,2,3,0,1,2,3) );
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x_static_8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,3,1,1,2,3) );
- ASSERT_TRUE( & x2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x_static_8( 0, 1, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x_static_8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x_static_8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x_static_8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x_static_8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x_static_8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_1()
{
- typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutLeft , Space >
- view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int ****[2][3][4][5], Kokkos::LayoutLeft, Space > view_type;
- view_type x8("x_left_8",2,3,4,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type x8( "x_left_8", 2, 3, 4, 5 );
- ASSERT_TRUE( x8.is_contiguous() );
+ ASSERT_TRUE( x8.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3, 0, 1, 2, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x8(1,1,2,3,0,1,2,3) );
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,3,1,1,2,3) );
- ASSERT_TRUE( & x2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x8( 0, 1, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_2()
{
- typedef Kokkos::View< int **** , Kokkos::LayoutLeft , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
-
- view_type x4("x4",2,3,4,5);
-
- ASSERT_TRUE( x4.is_contiguous() );
-
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x4 , 0, 0, 0, 0 );
-
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x4(0,0,0,0) );
-
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
-
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x4(0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x4(1,1,2,3) );
-
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, Kokkos::pair<int,int>(1,3), 2 );
-
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x4(0,1,1,2) );
- ASSERT_TRUE( & x2(1,0) == & x4(1,1,1,2) );
- ASSERT_TRUE( & x2(0,1) == & x4(0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x4(1,1,2,2) );
-
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x4, 1, Kokkos::pair<int,int>(0,2)
- , 2, Kokkos::pair<int,int>(1,4) );
-
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x4(1,0,2,1) );
- ASSERT_TRUE( & sx2(1,0) == & x4(1,1,2,1) );
- ASSERT_TRUE( & sx2(0,1) == & x4(1,0,2,2) );
- ASSERT_TRUE( & sx2(1,1) == & x4(1,1,2,2) );
- ASSERT_TRUE( & sx2(0,2) == & x4(1,0,2,3) );
- ASSERT_TRUE( & sx2(1,2) == & x4(1,1,2,3) );
-
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(1,2) /* of [2] */
- , Kokkos::pair<int,int>(1,3) /* of [3] */
- , Kokkos::pair<int,int>(0,4) /* of [4] */
- , Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x4( 1+i0, 1+i1, 0+i2, 2+i3 ) );
- }
-
+ typedef Kokkos::View< int ****, Kokkos::LayoutLeft, Space > view_type;
+
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_type x4( "x4", 2, 3, 4, 5 );
+
+ ASSERT_TRUE( x4.is_contiguous() );
+
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x4, 0, 0, 0, 0 );
+
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x4( 0, 0, 0, 0 ) );
+
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
+
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x4( 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x4( 1, 1, 2, 3 ) );
+
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 0, 2 ), 1
+ , Kokkos::pair< int, int >( 1, 3 ), 2 );
+
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x4( 0, 1, 1, 2 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x4( 1, 1, 1, 2 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x4( 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x4( 1, 1, 2, 2 ) );
+
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x4, 1, Kokkos::pair< int, int >( 0, 2 )
+ , 2, Kokkos::pair< int, int >( 1, 4 ) );
+
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x4( 1, 0, 2, 1 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x4( 1, 1, 2, 1 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x4( 1, 0, 2, 2 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x4( 1, 1, 2, 2 ) );
+ ASSERT_TRUE( & sx2( 0, 2 ) == & x4( 1, 0, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 2 ) == & x4( 1, 1, 2, 3 ) );
+
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 1, 2 ) /* of [2] */
+ , Kokkos::pair< int, int >( 1, 3 ) /* of [3] */
+ , Kokkos::pair< int, int >( 0, 4 ) /* of [4] */
+ , Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x4( 1 + i0, 1 + i1, 0 + i2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_3()
{
- typedef Kokkos::View< int ** , Kokkos::LayoutLeft , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int **, Kokkos::LayoutLeft, Space > view_type;
- view_type xm("x4",10,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type xm( "x4", 10, 5 );
- ASSERT_TRUE( xm.is_contiguous() );
+ ASSERT_TRUE( xm.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( xm , 5, 3 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( xm, 5, 3 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & xm(5,3) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & xm( 5, 3 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( xm, Kokkos::ALL, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 = Kokkos::subview( xm, Kokkos::ALL, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- for ( int i = 0 ; i < int(xm.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x1(i) == & xm(i,3) );
- }
-
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
+ ASSERT_TRUE( x1.is_contiguous() );
+ for ( int i = 0; i < int( xm.dimension_0() ); ++i ) {
+ ASSERT_TRUE( & x1( i ) == & xm( i, 3 ) );
+ }
- ASSERT_TRUE( ! x2.is_contiguous() );
- for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2(i,j) == & xm(1+i,j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( xm, Kokkos::pair< int, int >( 1, 9 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2c =
- Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ for ( int j = 0; j < int( x2.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2( i, j ) == & xm( 1 + i, j ) );
+ }
- ASSERT_TRUE( x2c.is_contiguous() );
- for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2c(i,j) == & xm(i,2+j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2c =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 2, 4 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
+ ASSERT_TRUE( x2c.is_contiguous() );
+ for ( int j = 0; j < int( x2c.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2c.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2c( i, j ) == & xm( i, 2 + j ) );
+ }
- ASSERT_TRUE( x2_n1.dimension_0() == 0 );
- ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2_n1 =
+ Kokkos::subview( xm, std::pair< int, int >( 1, 1 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
+ ASSERT_TRUE( x2_n1.dimension_0() == 0 );
+ ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
- ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
- ASSERT_TRUE( x2_n2.dimension_1() == 0 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2_n2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 1, 1 ) );
+ ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
+ ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
//----------------------------------------------------------------------------
template< class Space >
void test_right_0()
{
- typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutRight , Space >
- view_static_8_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
-
- view_static_8_type x_static_8("x_static_right_8");
-
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
-
- ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
-
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( x_static_8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
-
- ASSERT_TRUE( x1.dimension_0() == 2 );
- ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,1) );
- ASSERT_TRUE( & x1(1) == & x_static_8(0,1,2,3,0,1,2,2) );
-
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( x_static_8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
- , 0, 1, 2, Kokkos::pair<int,int>(1,3) );
-
- ASSERT_TRUE( x2.dimension_0() == 2 );
- ASSERT_TRUE( x2.dimension_1() == 2 );
- ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,1,0,1,2,1) );
- ASSERT_TRUE( & x2(1,0) == & x_static_8(0,1,2,2,0,1,2,1) );
- ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,1,0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x_static_8(0,1,2,2,0,1,2,2) );
-
- // Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
-
- ASSERT_TRUE( sx2.dimension_0() == 2 );
- ASSERT_TRUE( sx2.dimension_1() == 2 );
- ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
-
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- ASSERT_TRUE( sx4.dimension_0() == 2 );
- ASSERT_TRUE( sx4.dimension_1() == 2 );
- ASSERT_TRUE( sx4.dimension_2() == 2 );
- ASSERT_TRUE( sx4.dimension_3() == 2 );
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0, 0+i0, 1, 1+i1, 1, 0+i2, 2, 2+i3) );
- }
-
+ typedef Kokkos::View< int [2][3][4][5][2][3][4][5], Kokkos::LayoutRight, Space > view_static_8_type;
+
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_static_8_type x_static_8( "x_static_right_8" );
+
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( x_static_8, 0, 0, 0, 0, 0, 0, 0, 0 );
+
+ ASSERT_TRUE( & x0() == & x_static_8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
+
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 =
+ Kokkos::subview( x_static_8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
+
+ ASSERT_TRUE( x1.dimension_0() == 2 );
+ ASSERT_TRUE( & x1( 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 2 ) );
+
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( x_static_8, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 )
+ , 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
+
+ ASSERT_TRUE( x2.dimension_0() == 2 );
+ ASSERT_TRUE( x2.dimension_1() == 2 );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x_static_8( 0, 1, 2, 1, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x_static_8( 0, 1, 2, 2, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x_static_8( 0, 1, 2, 1, 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x_static_8( 0, 1, 2, 2, 0, 1, 2, 2 ) );
+
+ // Kokkos::View< int**, Kokkos::LayoutRight, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x_static_8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
+
+ ASSERT_TRUE( sx2.dimension_0() == 2 );
+ ASSERT_TRUE( sx2.dimension_1() == 2 );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x_static_8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x_static_8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
+
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x_static_8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+
+ ASSERT_TRUE( sx4.dimension_0() == 2 );
+ ASSERT_TRUE( sx4.dimension_1() == 2 );
+ ASSERT_TRUE( sx4.dimension_2() == 2 );
+ ASSERT_TRUE( sx4.dimension_3() == 2 );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x_static_8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_right_1()
{
- typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutRight , Space >
- view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int ****[2][3][4][5], Kokkos::LayoutRight, Space > view_type;
- view_type x8("x_right_8",2,3,4,5);
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_type x8( "x_right_8", 2, 3, 4, 5 );
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( x8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( & x0() == & x8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( x8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 =
+ Kokkos::subview( x8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
- ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,1) );
- ASSERT_TRUE( & x1(1) == & x8(0,1,2,3,0,1,2,2) );
+ ASSERT_TRUE( & x1( 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 2 ) );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( x8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
- , 0, 1, 2, Kokkos::pair<int,int>(1,3) );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( x8, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 )
+ , 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
- ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,1,0,1,2,1) );
- ASSERT_TRUE( & x2(1,0) == & x8(0,1,2,2,0,1,2,1) );
- ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,1,0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x8(0,1,2,2,0,1,2,2) );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x8( 0, 1, 2, 1, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x8( 0, 1, 2, 2, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x8( 0, 1, 2, 1, 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x8( 0, 1, 2, 2, 0, 1, 2, 2 ) );
- // Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutRight, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_right_3()
{
- typedef Kokkos::View< int ** , Kokkos::LayoutRight , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int **, Kokkos::LayoutRight, Space > view_type;
- view_type xm("x4",10,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type xm( "x4", 10, 5 );
- ASSERT_TRUE( xm.is_contiguous() );
+ ASSERT_TRUE( xm.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( xm , 5, 3 );
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( xm, 5, 3 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & xm(5,3) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & xm( 5, 3 ) );
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( xm, 3, Kokkos::ALL );
-
- ASSERT_TRUE( x1.is_contiguous() );
- for ( int i = 0 ; i < int(xm.dimension_1()) ; ++i ) {
- ASSERT_TRUE( & x1(i) == & xm(3,i) );
- }
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 = Kokkos::subview( xm, 3, Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2c =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
+ ASSERT_TRUE( x1.is_contiguous() );
+ for ( int i = 0; i < int( xm.dimension_1() ); ++i ) {
+ ASSERT_TRUE( & x1( i ) == & xm( 3, i ) );
+ }
- ASSERT_TRUE( x2c.is_contiguous() );
- for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2c(i,j) == & xm(1+i,j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2c =
+ Kokkos::subview( xm, Kokkos::pair< int, int >( 1, 9 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
+ ASSERT_TRUE( x2c.is_contiguous() );
+ for ( int j = 0; j < int( x2c.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2c.dimension_0() ); ++i ) {
+ ASSERT_TRUE( & x2c( i, j ) == & xm( 1 + i, j ) );
+ }
- ASSERT_TRUE( ! x2.is_contiguous() );
- for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2(i,j) == & xm(i,2+j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 2, 4 ) );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ for ( int j = 0; j < int( x2.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2( i, j ) == & xm( i, 2 + j ) );
+ }
- ASSERT_TRUE( x2_n1.dimension_0() == 0 );
- ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2_n1 =
+ Kokkos::subview( xm, std::pair< int, int >( 1, 1 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
+ ASSERT_TRUE( x2_n1.dimension_0() == 0 );
+ ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
- ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
- ASSERT_TRUE( x2_n2.dimension_1() == 0 );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2_n2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 1, 1 ) );
+ ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
+ ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
namespace Impl {
-constexpr int N0=113;
-constexpr int N1=11;
-constexpr int N2=17;
-constexpr int N3=5;
-constexpr int N4=7;
+constexpr int N0 = 113;
+constexpr int N1 = 11;
+constexpr int N2 = 17;
+constexpr int N3 = 5;
+constexpr int N4 = 7;
-template<class SubView,class View>
-void test_Check1D(SubView a, View b, std::pair<int,int> range) {
+template< class SubView, class View >
+void test_Check1D( SubView a, View b, std::pair< int, int > range ) {
int errors = 0;
- for(int i=0;i<range.second-range.first;i++) {
- if(a(i)!=b(i+range.first))
- errors++;
+
+ for ( int i = 0; i < range.second - range.first; i++ ) {
+ if ( a( i ) != b( i + range.first ) ) errors++;
+ }
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check1D: " << errors << std::endl;
}
- if(errors>0)
- std::cout << "Error Subviews test_Check1D: " << errors <<std::endl;
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check1D2D(SubView a, View b, int i0, std::pair<int,int> range) {
+template< class SubView, class View >
+void test_Check1D2D( SubView a, View b, int i0, std::pair< int, int > range ) {
int errors = 0;
- for(int i1=0;i1<range.second-range.first;i1++) {
- if(a(i1)!=b(i0,i1+range.first))
- errors++;
+
+ for ( int i1 = 0; i1 < range.second - range.first; i1++ ) {
+ if ( a( i1 ) != b( i0, i1 + range.first ) ) errors++;
}
- if(errors>0)
- std::cout << "Error Subviews test_Check1D2D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check1D2D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check2D3D(SubView a, View b, int i0, std::pair<int,int> range1, std::pair<int,int> range2) {
+template< class SubView, class View >
+void test_Check2D3D( SubView a, View b, int i0, std::pair< int, int > range1
+ , std::pair< int, int > range2 )
+{
int errors = 0;
- for(int i1=0;i1<range1.second-range1.first;i1++) {
- for(int i2=0;i2<range2.second-range2.first;i2++) {
- if(a(i1,i2)!=b(i0,i1+range1.first,i2+range2.first))
- errors++;
+
+ for ( int i1 = 0; i1 < range1.second - range1.first; i1++ ) {
+ for ( int i2 = 0; i2 < range2.second - range2.first; i2++ ) {
+ if ( a( i1, i2 ) != b( i0, i1 + range1.first, i2 + range2.first ) ) errors++;
}
}
- if(errors>0)
- std::cout << "Error Subviews test_Check2D3D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check2D3D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check3D5D(SubView a, View b, int i0, int i1, std::pair<int,int> range2, std::pair<int,int> range3, std::pair<int,int> range4) {
+template< class SubView, class View >
+void test_Check3D5D( SubView a, View b, int i0, int i1, std::pair< int, int > range2
+ , std::pair< int, int > range3, std::pair< int, int > range4 )
+{
int errors = 0;
- for(int i2=0;i2<range2.second-range2.first;i2++) {
- for(int i3=0;i3<range3.second-range3.first;i3++) {
- for(int i4=0;i4<range4.second-range4.first;i4++) {
- if(a(i2,i3,i4)!=b(i0,i1,i2+range2.first,i3+range3.first,i4+range4.first))
+
+ for ( int i2 = 0; i2 < range2.second - range2.first; i2++ ) {
+ for ( int i3 = 0; i3 < range3.second - range3.first; i3++ ) {
+ for ( int i4 = 0; i4 < range4.second - range4.first; i4++ ) {
+ if ( a( i2, i3, i4 ) != b( i0, i1, i2 + range2.first, i3 + range3.first, i4 + range4.first ) ) {
errors++;
+ }
}
}
}
- if(errors>0)
- std::cout << "Error Subviews test_Check3D5D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check3D5D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_1d_assign_impl() {
-
- { //Breaks
- Kokkos::View<int*,LayoutOrg,Space> a_org("A",N0);
- Kokkos::View<int*,LayoutOrg,Space,MemTraits> a(a_org);
+ { // Breaks.
+ Kokkos::View< int*, LayoutOrg, Space > a_org( "A", N0 );
+ Kokkos::View< int*, LayoutOrg, Space, MemTraits > a( a_org );
Kokkos::fence();
- for(int i=0; i<N0; i++)
- a_org(i) = i;
+ for ( int i = 0; i < N0; i++ ) a_org( i ) = i;
- Kokkos::View<int[N0],Layout,Space,MemTraits> a1(a);
+ Kokkos::View< int[N0], Layout, Space, MemTraits > a1( a );
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
- Kokkos::View<int[N0],LayoutSub,Space,MemTraits> a2(a1);
+ Kokkos::View< int[N0], LayoutSub, Space, MemTraits > a2( a1 );
Kokkos::fence();
- test_Check1D(a2,a,std::pair<int,int>(0,N0));
+ test_Check1D( a2, a, std::pair< int, int >( 0, N0 ) );
a1 = a;
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
- //Runtime Fail expected
- //Kokkos::View<int[N1]> afail1(a);
+ // Runtime Fail expected.
+ //Kokkos::View< int[N1] > afail1( a );
- //Compile Time Fail expected
- //Kokkos::View<int[N1]> afail2(a1);
+ // Compile Time Fail expected.
+ //Kokkos::View< int[N1] > afail2( a1 );
}
- { // Works
- Kokkos::View<int[N0],LayoutOrg,Space,MemTraits> a("A");
- Kokkos::View<int*,Layout,Space,MemTraits> a1(a);
+ { // Works.
+ Kokkos::View< int[N0], LayoutOrg, Space, MemTraits > a( "A" );
+ Kokkos::View< int*, Layout, Space, MemTraits > a1( a );
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
a1 = a;
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
}
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg,class MemTraits>
+template< class Space, class Type, class TypeSub, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_2d_subview_3d_impl_type() {
- Kokkos::View<int***,LayoutOrg,Space> a_org("A",N0,N1,N2);
- Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
- for(int i0=0; i0<N0; i0++)
- for(int i1=0; i1<N1; i1++)
- for(int i2=0; i2<N2; i2++)
- a_org(i0,i1,i2) = i0*1000000+i1*1000+i2;
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
- a1 = Kokkos::subview(a,3,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, LayoutOrg, Space > a_org( "A", N0, N1, N2 );
+ Kokkos::View< Type, Layout, Space, MemTraits > a( a_org );
+
+ for ( int i0 = 0; i0 < N0; i0++ )
+ for ( int i1 = 0; i1 < N1; i1++ )
+ for ( int i2 = 0; i2 < N2; i2++ )
+ {
+ a_org( i0, i1, i2 ) = i0 * 1000000 + i1 * 1000 + i2;
+ }
+
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a1;
+ a1 = Kokkos::subview( a, 3, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check2D3D(a1,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
+ test_Check2D3D( a1, a, 3, std::pair< int, int >( 0, N1 ), std::pair< int, int >( 0, N2 ) );
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a2( a, 3, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check2D3D(a2,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
+ test_Check2D3D( a2, a, 3, std::pair< int, int >( 0, N1 ), std::pair< int, int >( 0, N2 ) );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_2d_subview_3d_impl_layout() {
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int** [N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int*** ,int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int*** ,int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int*** ,int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int*** , int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int*** , int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int*** , int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class Type, class TypeSub, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_3d_subview_5d_impl_type() {
- Kokkos::View<int*****,LayoutOrg,Space> a_org("A",N0,N1,N2,N3,N4);
- Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
- for(int i0=0; i0<N0; i0++)
- for(int i1=0; i1<N1; i1++)
- for(int i2=0; i2<N2; i2++)
- for(int i3=0; i3<N3; i3++)
- for(int i4=0; i4<N4; i4++)
- a_org(i0,i1,i2,i3,i4) = i0*1000000+i1*10000+i2*100+i3*10+i4;
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
- a1 = Kokkos::subview(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int*****, LayoutOrg, Space > a_org( "A", N0, N1, N2, N3, N4 );
+ Kokkos::View< Type, Layout, Space, MemTraits > a( a_org );
+
+ for ( int i0 = 0; i0 < N0; i0++ )
+ for ( int i1 = 0; i1 < N1; i1++ )
+ for ( int i2 = 0; i2 < N2; i2++ )
+ for ( int i3 = 0; i3 < N3; i3++ )
+ for ( int i4 = 0; i4 < N4; i4++ )
+ {
+ a_org( i0, i1, i2, i3, i4 ) = i0 * 1000000 + i1 * 10000 + i2 * 100 + i3 * 10 + i4;
+ }
+
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a1;
+ a1 = Kokkos::subview( a, 3, 5, Kokkos::ALL, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check3D5D(a1,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
+ test_Check3D5D( a1, a, 3, 5, std::pair< int, int >( 0, N2 ), std::pair< int, int >( 0, N3 ), std::pair< int, int >( 0, N4 ) );
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a2( a, 3, 5, Kokkos::ALL, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check3D5D(a2,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
+ test_Check3D5D( a2, a, 3, 5, std::pair< int, int >( 0, N2 ), std::pair< int, int >( 0, N3 ), std::pair< int, int >( 0, N4 ) );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_3d_subview_5d_impl_layout() {
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int**** [N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int***** ,int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int***** ,const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int***** , int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int***** , const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
}
inline
void test_subview_legal_args_right() {
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
}
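The assertions above pin down the compile-time rule for LayoutRight-to-LayoutRight subviews: only leading dimensions may be dropped with integer indices, the first kept dimension may be a half-open range, and every remaining kept dimension must be Kokkos::ALL. As a minimal sketch, the same trait can be queried outside the test harness; the static_assert below simply mirrors one of the legal cases asserted above and assumes a translation unit that includes the Kokkos headers used by this test.

  static_assert( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<
                   Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0,
                   int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value,
                 "legal: drop the two leading dimensions, sub-range the first kept one, keep the rest whole" );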
inline
void test_subview_legal_args_left() {
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
}
-}
+} // namespace Impl
-template< class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_1d_assign() {
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutRight ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutLeft, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutRight, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutStride, Kokkos::LayoutLeft >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutStride, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_2d_subview_3d() {
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutRight, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_3d_subview_5d_right() {
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutRight, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_3d_subview_5d_left() {
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
+namespace Impl {
+template< class Layout, class Space >
+struct FillView_3D {
+ Kokkos::View< int***, Layout, Space > a;
-namespace Impl {
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const
+ {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % a.dimension_0()
+ : ii / ( a.dimension_1() * a.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / a.dimension_0() ) % a.dimension_1()
+ : ( ii / a.dimension_2() ) % a.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( a.dimension_0() * a.dimension_1() )
+ : ii % a.dimension_2();
- template<class Layout, class Space>
- struct FillView_3D {
- Kokkos::View<int***,Layout,Space> a;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / a.dimension_0()) % a.dimension_1() : (ii / a.dimension_2()) % a.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (a.dimension_0() * a.dimension_1()) : ii % a.dimension_2();
- a(i,j,k) = 1000000 * i + 1000 * j + k;
+ a( i, j, k ) = 1000000 * i + 1000 * j + k;
+ }
+};
+
+template< class Layout, class Space >
+struct FillView_4D {
+ Kokkos::View< int****, Layout, Space > a;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % a.dimension_0()
+ : ii / ( a.dimension_1() * a.dimension_2() * a.dimension_3() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / a.dimension_0() ) % a.dimension_1()
+ : ( ii / ( a.dimension_2() * a.dimension_3() ) % a.dimension_1() );
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ( ii / ( a.dimension_0() * a.dimension_1() ) ) % a.dimension_2()
+ : ( ii / a.dimension_3() ) % a.dimension_2();
+
+ const int l = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( a.dimension_0() * a.dimension_1() * a.dimension_2() )
+ : ii % a.dimension_3();
+
+ a( i, j, k, l ) = 1000000 * i + 10000 * j + 100 * k + l;
+ }
+};
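The fill functors above recover a multi-index from the flat RangePolicy index ii using modulo and integer division by the view extents, with the role of each index chosen per layout. For orientation, a minimal reference sketch of the canonical decomposition for a rank-3 view of extents d0 x d1 x d2 follows; d0, d1, d2 and ii are placeholder names, not identifiers from the patch.

  // LayoutLeft (first index varies fastest):
  //   i = ii % d0;   j = ( ii / d0 ) % d1;   k = ii / ( d0 * d1 );
  // LayoutRight (last index varies fastest):
  //   k = ii % d2;   j = ( ii / d2 ) % d1;   i = ii / ( d1 * d2 );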
+
+template< class Layout, class Space, class MemTraits >
+struct CheckSubviewCorrectness_3D_3D {
+ Kokkos::View< const int***, Layout, Space, MemTraits > a;
+ Kokkos::View< const int***, Layout, Space, MemTraits > b;
+ int offset_0, offset_2;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const
+ {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % b.dimension_0()
+ : ii / ( b.dimension_1() * b.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / b.dimension_0() ) % b.dimension_1()
+ : ( ii / b.dimension_2() ) % b.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( b.dimension_0() * b.dimension_1() )
+ : ii % b.dimension_2();
+
+ if ( a( i + offset_0, j, k + offset_2 ) != b( i, j, k ) ) {
+ Kokkos::abort( "Error: check_subview_correctness 3D-3D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)" );
}
- };
-
- template<class Layout, class Space>
- struct FillView_4D {
- Kokkos::View<int****,Layout,Space> a;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2()*a.dimension_3());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / a.dimension_0()) % a.dimension_1() : (ii / (a.dimension_2()*a.dimension_3()) % a.dimension_1());
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- (ii / (a.dimension_0() * a.dimension_1())) % a.dimension_2() : (ii / a.dimension_3()) % a.dimension_2();
- const int l = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (a.dimension_0() * a.dimension_1() * a.dimension_2()) : ii % a.dimension_3();
- a(i,j,k,l) = 1000000 * i + 10000 * j + 100 * k + l;
+ }
+};
+
+template< class Layout, class Space, class MemTraits >
+struct CheckSubviewCorrectness_3D_4D {
+ Kokkos::View< const int****, Layout, Space, MemTraits > a;
+ Kokkos::View< const int***, Layout, Space, MemTraits > b;
+ int offset_0, offset_2, index;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % b.dimension_0()
+ : ii / ( b.dimension_1() * b.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / b.dimension_0() ) % b.dimension_1()
+ : ( ii / b.dimension_2() ) % b.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( b.dimension_0() * b.dimension_1() )
+ : ii % b.dimension_2();
+
+ int i0, i1, i2, i3;
+
+ if ( std::is_same< Layout, Kokkos::LayoutLeft >::value ) {
+ i0 = i + offset_0;
+ i1 = j;
+ i2 = k + offset_2;
+ i3 = index;
}
- };
-
- template<class Layout, class Space, class MemTraits>
- struct CheckSubviewCorrectness_3D_3D {
- Kokkos::View<const int***,Layout,Space,MemTraits> a;
- Kokkos::View<const int***,Layout,Space,MemTraits> b;
- int offset_0,offset_2;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
- if( a(i+offset_0,j,k+offset_2) != b(i,j,k))
- Kokkos::abort("Error: check_subview_correctness 3D-3D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+ else {
+ i0 = index;
+ i1 = i + offset_0;
+ i2 = j;
+ i3 = k + offset_2;
}
- };
-
- template<class Layout, class Space, class MemTraits>
- struct CheckSubviewCorrectness_3D_4D {
- Kokkos::View<const int****,Layout,Space,MemTraits> a;
- Kokkos::View<const int***,Layout,Space,MemTraits> b;
- int offset_0,offset_2,index;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
-
- int i0,i1,i2,i3;
- if(std::is_same<Layout,Kokkos::LayoutLeft>::value) {
- i0 = i + offset_0;
- i1 = j;
- i2 = k + offset_2;
- i3 = index;
- } else {
- i0 = index;
- i1 = i + offset_0;
- i2 = j;
- i3 = k + offset_2;
- }
- if( a(i0,i1,i2,i3) != b(i,j,k))
- Kokkos::abort("Error: check_subview_correctness 3D-4D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+
+ if ( a( i0, i1, i2, i3 ) != b( i, j, k ) ) {
+ Kokkos::abort( "Error: check_subview_correctness 3D-4D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)" );
}
- };
-}
+ }
+};
-template<class Space, class MemTraits = void>
+} // namespace Impl
+
+template< class Space, class MemTraits = void >
void test_layoutleft_to_layoutleft() {
Impl::test_subview_legal_args_left();
{
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,3);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 3 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 0;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
+
{
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,5);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3));
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 5 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ) );
- Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
+
{
- Kokkos::View<int****,Kokkos::LayoutLeft,Space> a("A",100,4,5,3);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3),1);
+ Kokkos::View< int****, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 5, 3 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ), 1 );
- Impl::FillView_4D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_4D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) * a.extent( 3 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_4D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 1;
check.index = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
}
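In each block above, offset_0 and offset_2 record where the subview begins inside the parent view, so the check functor compares b( i, j, k ) against a( i + offset_0, j, k + offset_2 ) (with one extra coordinate pinned by index in the 3D-from-4D case). A minimal host-side sketch of the aliasing being verified, assuming Kokkos::HostSpace and reusing the shapes from the second block above:

  Kokkos::View< int***, Kokkos::LayoutLeft, Kokkos::HostSpace > a( "A", 100, 4, 5 );
  Kokkos::View< int***, Kokkos::LayoutLeft, Kokkos::HostSpace > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ) );
  // b has extents 16 x 4 x 2; every b( i, j, k ) refers to the same entry as a( 16 + i, j, 1 + k ).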
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_layoutright_to_layoutright() {
Impl::test_subview_legal_args_right();
{
- Kokkos::View<int***,Kokkos::LayoutRight,Space> a("A",100,4,3);
- Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > a( "A", 100, 4, 3 );
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_3D<Kokkos::LayoutRight,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutRight, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutRight,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutRight, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 0;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
- {
- Kokkos::View<int****,Kokkos::LayoutRight,Space> a("A",3,4,5,100);
- Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,1,Kokkos::pair<int,int>(1,3),Kokkos::ALL,Kokkos::ALL);
+ {
+ Kokkos::View< int****, Kokkos::LayoutRight, Space > a( "A", 3, 4, 5, 100 );
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > b( a, 1, Kokkos::pair< int, int >( 1, 3 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_4D<Kokkos::LayoutRight,Space> fill;
+ Impl::FillView_4D< Kokkos::LayoutRight, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) * a.extent( 3 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutRight,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_4D< Kokkos::LayoutRight, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 1;
check.offset_2 = 0;
check.index = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
}
-
-}
-//----------------------------------------------------------------------------
-
+} // namespace TestViewSubview
diff --git a/lib/kokkos/core/unit_test/UnitTestMain.cpp b/lib/kokkos/core/unit_test/UnitTestMain.cpp
index f952ab3db..4f52fc956 100644
--- a/lib/kokkos/core/unit_test/UnitTestMain.cpp
+++ b/lib/kokkos/core/unit_test/UnitTestMain.cpp
@@ -1,50 +1,49 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
-int main(int argc, char *argv[]) {
- ::testing::InitGoogleTest(&argc,argv);
+int main( int argc, char *argv[] ) {
+ ::testing::InitGoogleTest( &argc, argv );
return RUN_ALL_TESTS();
}
-
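Note: the UnitTestMain.cpp driver above only initializes Google Test; in this suite the Kokkos runtime is brought up and torn down by the per-backend fixtures (see the cuda fixture in TestCuda.hpp below). For a standalone harness one could instead wrap the whole test run in Kokkos initialization. A minimal sketch, assuming Kokkos::initialize / Kokkos::finalize from Kokkos_Core.hpp (illustrative only, not part of this patch):

    // Minimal gtest driver that also manages the Kokkos runtime.
    // Illustrative sketch; the patched UnitTestMain.cpp intentionally
    // leaves Kokkos setup to the test fixtures instead.
    #include <gtest/gtest.h>
    #include <Kokkos_Core.hpp>

    int main( int argc, char *argv[] ) {
      ::testing::InitGoogleTest( &argc, argv );  // strip gtest flags first
      Kokkos::initialize( argc, argv );          // then let Kokkos parse its own flags
      const int result = RUN_ALL_TESTS();
      Kokkos::finalize();                        // finalize only after all tests finish
      return result;
    }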
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
index 36b9b0688..768b03920 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
@@ -1,111 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_CUDA_HPP
#define KOKKOS_TEST_CUDA_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
-
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestViewSpaceAssign.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
-
#include <TestAtomicViews.hpp>
-
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
-// For Some Reason I can only have the definition of SetUp and TearDown in one cpp file ...
+// For some reason I can only have the definition of SetUp and TearDown in one cpp file ...
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase();
static void TearDownTestCase();
};
#ifdef TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
void cuda::SetUpTestCase()
- {
- Kokkos::Cuda::print_configuration( std::cout );
- Kokkos::HostSpace::execution_space::initialize();
- Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
- }
+{
+ Kokkos::print_configuration( std::cout );
+ Kokkos::HostSpace::execution_space::initialize();
+ Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( 0 ) );
+}
void cuda::TearDownTestCase()
- {
- Kokkos::Cuda::finalize();
- Kokkos::HostSpace::execution_space::finalize();
- }
-#endif
+{
+ Kokkos::Cuda::finalize();
+ Kokkos::HostSpace::execution_space::finalize();
}
#endif
+
+} // namespace Test
+
+#endif
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
index ff379dc80..7cf19b26d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
@@ -1,203 +1,203 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , atomics )
+TEST_F( cuda, atomics )
{
- const int loop_count = 1e3 ;
+ const int loop_count = 1e3;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 3 ) ) );
}
-TEST_F( cuda , atomic_operations )
+TEST_F( cuda, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 4 ) ) );
}
}
-TEST_F( cuda , atomic_views_integral )
+TEST_F( cuda, atomic_views_integral )
{
const long length = 1000000;
+
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 8 ) ) );
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 8 ) ) );
}
}
-TEST_F( cuda , atomic_views_nonintegral )
+TEST_F( cuda, atomic_views_nonintegral )
{
const long length = 1000000;
- {
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 4 ) ) );
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 4 ) ) );
}
}
-
-TEST_F( cuda , atomic_view_api )
+TEST_F( cuda, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Cuda>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Cuda >();
}
-
-} // namespace test
-
+} // namespace Test
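The atomics tests above stress Kokkos::atomic_* operations for many scalar types. Their common shape is: launch a parallel loop in which every iteration atomically updates shared data, then verify the final value on the host. A minimal sketch of that pattern (generic names, not the actual TestAtomic::Loop implementation; assumes device lambda support, otherwise an equivalent functor can be used):

    #include <Kokkos_Core.hpp>

    // Every iteration atomically adds 1 to a single rank-0 View element;
    // the final value must equal the iteration count.
    template< class Device >
    bool atomic_add_smoke_test( const int n )
    {
      Kokkos::View< long, Device > sum( "sum" );

      Kokkos::parallel_for( Kokkos::RangePolicy< Device >( 0, n ),
        KOKKOS_LAMBDA( const int ) {
          Kokkos::atomic_fetch_add( &sum(), long( 1 ) );
        } );

      // Bring the device result back to the host and check it.
      auto host_sum = Kokkos::create_mirror_view( sum );
      Kokkos::deep_copy( host_sum, sum );

      return host_sum() == n;
    }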
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
index aeaa2a0e8..e655193a5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
@@ -1,189 +1,194 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#define TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , init ) {
+TEST_F( cuda, init )
+{
;
}
-TEST_F( cuda , md_range ) {
- TestMDRange_2D< Kokkos::Cuda >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::Cuda >::test_for3(100,100,100);
+TEST_F( cuda, mdrange_for ) {
+ TestMDRange_2D< Kokkos::Cuda >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Cuda >::test_for3( 100, 100, 100 );
+ TestMDRange_4D< Kokkos::Cuda >::test_for4( 100, 10, 100, 10 );
+ TestMDRange_5D< Kokkos::Cuda >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Cuda >::test_for6( 100, 10, 5, 2, 10, 5 );
}
-TEST_F( cuda, policy_construction) {
+TEST_F( cuda, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Cuda >();
TestTeamPolicyConstruction< Kokkos::Cuda >();
}
-TEST_F( cuda , range_tag )
+TEST_F( cuda, range_tag )
{
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
}
-
//----------------------------------------------------------------------------
-TEST_F( cuda , compiler_macros )
+TEST_F( cuda, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Cuda >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( cuda , memory_pool )
+TEST_F( cuda, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Cuda >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Cuda >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Cuda >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( cuda , task_fib )
+TEST_F( cuda, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Cuda >::run(i, (i+1)*(i+1)*10000 );
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Cuda >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
}
}
-TEST_F( cuda , task_depend )
+TEST_F( cuda, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Cuda >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Cuda >::run( i );
}
}
-TEST_F( cuda , task_team )
+TEST_F( cuda, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Cuda >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Cuda >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Cuda >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Cuda >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
-TEST_F( cuda , cxx11 )
+TEST_F( cuda, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Cuda >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Cuda >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 4 ) ) );
}
}
#endif
TEST_F( cuda, tile_layout )
{
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Cuda, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 9, 11 );
}
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-#if defined (KOKKOS_COMPILER_CLANG)
-TEST_F( cuda , dispatch )
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+#if defined( KOKKOS_COMPILER_CLANG )
+TEST_F( cuda, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
#endif
#endif
-} // namespace test
-
+} // namespace Test
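The new mdrange_for cases added above extend coverage from 2D/3D up to 6D iteration spaces. Conceptually they exercise Kokkos' multidimensional range policy, which maps a nested index space onto one parallel dispatch. A rough sketch of the pattern follows (hedged: in the Kokkos 2.x sources this patch touches the policy may still live under Kokkos::Experimental; in later releases it is plain Kokkos::MDRangePolicy, as written here; not the actual TestMDRange implementation):

    #include <Kokkos_Core.hpp>

    // Fill a 2D view over an N0 x N1 index space with a single MDRange dispatch.
    template< class Device >
    void fill_2d( const int N0, const int N1 )
    {
      Kokkos::View< int**, Device > a( "A", N0, N1 );

      Kokkos::parallel_for(
        Kokkos::MDRangePolicy< Device, Kokkos::Rank<2> >( { 0, 0 }, { N0, N1 } ),
        KOKKOS_LAMBDA( const int i, const int j ) {
          a( i, j ) = i * N1 + j;  // each (i, j) pair is handled by one work item
        } );
    }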
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
index b9ab9fe72..01eed4e02 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
@@ -1,56 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , reducers )
+TEST_F( cuda, reducers )
{
- TestReducers<int, Kokkos::Cuda>::execute_integer();
- TestReducers<size_t, Kokkos::Cuda>::execute_integer();
- TestReducers<double, Kokkos::Cuda>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Cuda>::execute_basic();
+ TestReducers< int, Kokkos::Cuda >::execute_integer();
+ TestReducers< size_t, Kokkos::Cuda >::execute_integer();
+ TestReducers< double, Kokkos::Cuda >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::Cuda >::execute_basic();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
index c588d752d..7f4e0973e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
@@ -1,130 +1,138 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, long_reduce) {
- TestReduce< long , Kokkos::Cuda >( 0 );
- TestReduce< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce )
+{
+ TestReduce< long, Kokkos::Cuda >( 0 );
+ TestReduce< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, double_reduce) {
- TestReduce< double , Kokkos::Cuda >( 0 );
- TestReduce< double , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, double_reduce )
+{
+ TestReduce< double, Kokkos::Cuda >( 0 );
+ TestReduce< double, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Cuda >( 0 );
- TestReduceDynamic< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Cuda >( 0 );
+ TestReduceDynamic< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Cuda >( 0 );
- TestReduceDynamic< double , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Cuda >( 0 );
+ TestReduceDynamic< double, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Cuda >( 0 );
- TestReduceDynamicView< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Cuda >( 0 );
+ TestReduceDynamicView< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda , scan )
+TEST_F( cuda, scan )
{
- TestScan< Kokkos::Cuda >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Cuda >::test_range( 1, 1000 );
TestScan< Kokkos::Cuda >( 0 );
TestScan< Kokkos::Cuda >( 100000 );
TestScan< Kokkos::Cuda >( 10000000 );
Kokkos::Cuda::fence();
}
#if 0
-TEST_F( cuda , scan_small )
+TEST_F( cuda, scan_small )
{
- typedef TestScan< Kokkos::Cuda , Kokkos::Impl::CudaExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::Cuda, Kokkos::Impl::CudaExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::Cuda::fence();
}
#endif
-TEST_F( cuda , team_scan )
+TEST_F( cuda, team_scan )
{
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( cuda , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( cuda, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( cuda , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( cuda, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( cuda , reduction_deduction )
+TEST_F( cuda, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
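The reduction tests above (TestReduce, TestReduceTeam, TestScan) all follow the same idea: many work items contribute to a single reduced value, and the host compares it against a closed-form answer. A minimal sketch of such a check using Kokkos::parallel_reduce (generic names; illustrative only, not the actual TestReduce implementation):

    #include <Kokkos_Core.hpp>

    // Sum 0 + 1 + ... + (n-1) on the device and compare with n*(n-1)/2.
    template< class Device >
    bool sum_reduce_smoke_test( const long n )
    {
      long sum = 0;

      Kokkos::parallel_reduce( Kokkos::RangePolicy< Device >( 0, n ),
        KOKKOS_LAMBDA( const long i, long & update ) {
          update += i;  // per-thread partial sums are combined by Kokkos
        }, sum );

      return sum == n * ( n - 1 ) / 2;
    }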
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
index f3cbc3b88..5bed7640d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
@@ -1,399 +1,385 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
__global__
void test_abort()
{
- Kokkos::abort("test_abort");
+ Kokkos::abort( "test_abort" );
}
__global__
void test_cuda_spaces_int_value( int * ptr )
{
- if ( *ptr == 42 ) { *ptr = 2 * 42 ; }
+ if ( *ptr == 42 ) { *ptr = 2 * 42; }
}
-TEST_F( cuda , space_access )
+TEST_F( cuda, space_access )
{
- //--------------------------------------
-
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::HostSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::HostSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaUVMSpace >::accessible, "" );
//--------------------------------------
static_assert(
- ! Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaUVMSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
-
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
- , Kokkos::HostSpace >::value , "" );
+ , Kokkos::HostSpace >::value, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
, Kokkos::Device< Kokkos::HostSpace::execution_space
- , Kokkos::CudaUVMSpace > >::value , "" );
+ , Kokkos::CudaUVMSpace > >::value, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
- , Kokkos::CudaHostPinnedSpace >::value , "" );
+ , Kokkos::CudaHostPinnedSpace >::value, "" );
static_assert(
std::is_same< Kokkos::Device< Kokkos::HostSpace::execution_space
, Kokkos::CudaUVMSpace >
, Kokkos::Device< Kokkos::HostSpace::execution_space
- , Kokkos::CudaUVMSpace > >::value , "" );
+ , Kokkos::CudaUVMSpace > >::value, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::Cuda >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
}
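
The static_asserts above encode which memory spaces can be assigned to or accessed from one another. A minimal sketch (not part of the patch) of how the same trait can guard host-side access in user code, assuming the Kokkos 2.x spelling used here where the trait lives in Kokkos::Impl; the function name is illustrative.

#include <Kokkos_Core.hpp>

// Compiles only when MemSpace is readable from host code, mirroring the
// SpaceAccessibility checks asserted in the test above.
template< class MemSpace >
double first_element_on_host( const Kokkos::View< const double*, MemSpace > & v )
{
  static_assert(
    Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, MemSpace >::accessible,
    "MemSpace is not host-accessible; use e.g. CudaUVMSpace or CudaHostPinnedSpace" );
  return v[ 0 ];  // Direct host dereference is legal because the assert passed.
}
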
TEST_F( cuda, uvm )
{
if ( Kokkos::CudaUVMSpace::available() ) {
-    int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >("uvm_ptr",sizeof(int));
-
+    int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >( "uvm_ptr", sizeof( int ) );
- *uvm_ptr = 42 ;
+ *uvm_ptr = 42;
Kokkos::Cuda::fence();
- test_cuda_spaces_int_value<<<1,1>>>(uvm_ptr);
+ test_cuda_spaces_int_value<<< 1, 1 >>>( uvm_ptr );
Kokkos::Cuda::fence();
- EXPECT_EQ( *uvm_ptr, int(2*42) );
-
- Kokkos::kokkos_free< Kokkos::CudaUVMSpace >(uvm_ptr );
+ EXPECT_EQ( *uvm_ptr, int( 2 * 42 ) );
+ Kokkos::kokkos_free< Kokkos::CudaUVMSpace >( uvm_ptr );
}
}
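
The uvm test above writes 42 on the host, fences, doubles the value in a one-thread kernel, fences again, and checks the result on the host. A hedged sketch (not part of the patch) of the same round trip through a rank-0 View, assuming CUDA lambda support is enabled in the build; names are illustrative.

#include <Kokkos_Core.hpp>

void uvm_round_trip_sketch()
{
  // A single int living in CUDA UVM, reachable from both host and device.
  Kokkos::View< int, Kokkos::CudaUVMSpace > value( "value" );

  *value.data() = 42;       // Host write through the UVM pointer.
  Kokkos::Cuda::fence();    // Make the host write visible before the kernel runs.

  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, 1 ),
    KOKKOS_LAMBDA( const int ) { value() = 2 * 42; } );

  Kokkos::Cuda::fence();    // The kernel must finish before the host reads again.
  // Here *value.data() == 84, matching EXPECT_EQ( *uvm_ptr, int( 2 * 42 ) ) above.
}
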
TEST_F( cuda, uvm_num_allocs )
{
- // The max number of uvm allocations allowed is 65536
+ // The max number of UVM allocations allowed is 65536.
#define MAX_NUM_ALLOCS 65536
if ( Kokkos::CudaUVMSpace::available() ) {
-
struct TestMaxUVMAllocs {
- using view_type = Kokkos::View< double* , Kokkos::CudaUVMSpace >;
- using view_of_view_type = Kokkos::View< view_type[ MAX_NUM_ALLOCS ]
+ using view_type = Kokkos::View< double*, Kokkos::CudaUVMSpace >;
+ using view_of_view_type = Kokkos::View< view_type[ MAX_NUM_ALLOCS ]
, Kokkos::CudaUVMSpace >;
- TestMaxUVMAllocs()
- : view_allocs_test("view_allocs_test")
+ TestMaxUVMAllocs() : view_allocs_test( "view_allocs_test" )
{
-      for ( auto i = 0; i < MAX_NUM_ALLOCS ; ++i ) {
-
-        // Kokkos will throw a runtime exception if an attempt is made to
-        // allocate more than the maximum number of uvm allocations
+      for ( auto i = 0; i < MAX_NUM_ALLOCS; ++i ) {
+        // Kokkos will throw a runtime exception if an attempt is made to
+        // allocate more than the maximum number of UVM allocations.
// In this test, the max num of allocs occurs when i = MAX_NUM_ALLOCS - 1
// since the 'outer' view counts as one UVM allocation, leaving
- // 65535 possible UVM allocations, that is 'i in [0 , 65535)'
+ // 65535 possible UVM allocations, that is 'i in [0, 65535)'.
- // The test will catch the exception thrown in this case and continue
+ // The test will catch the exception thrown in this case and continue.
- if ( i == ( MAX_NUM_ALLOCS - 1) ) {
- EXPECT_ANY_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
+ if ( i == ( MAX_NUM_ALLOCS - 1 ) ) {
+ EXPECT_ANY_THROW( { view_allocs_test( i ) = view_type( "inner_view", 1 ); } );
}
else {
- if(i<MAX_NUM_ALLOCS - 1000) {
- EXPECT_NO_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
- } else { // This might or might not throw depending on compilation options.
+ if ( i < MAX_NUM_ALLOCS - 1000 ) {
+ EXPECT_NO_THROW( { view_allocs_test( i ) = view_type( "inner_view", 1 ); } );
+ } else { // This might or might not throw depending on compilation options.
try {
- view_allocs_test(i) = view_type("inner_view",1);
+ view_allocs_test( i ) = view_type( "inner_view", 1 );
}
- catch (...) {}
+ catch ( ... ) {}
}
}
- } //end allocation for loop
+ } // End allocation for loop.
- for ( auto i = 0; i < MAX_NUM_ALLOCS -1; ++i ) {
+ for ( auto i = 0; i < MAX_NUM_ALLOCS - 1; ++i ) {
- view_allocs_test(i) = view_type();
+ view_allocs_test( i ) = view_type();
- } //end deallocation for loop
+ } // End deallocation for loop.
- view_allocs_test = view_of_view_type(); // deallocate the view of views
+ view_allocs_test = view_of_view_type(); // Deallocate the view of views.
}
- // Member
- view_of_view_type view_allocs_test ;
- } ;
-
- // trigger the test via the TestMaxUVMAllocs constructor
- TestMaxUVMAllocs() ;
+ // Member.
+ view_of_view_type view_allocs_test;
+ };
+ // Trigger the test via the TestMaxUVMAllocs constructor.
+ TestMaxUVMAllocs();
}
- #undef MAX_NUM_ALLOCS
+
+ #undef MAX_NUM_ALLOCS
}
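
The comments in uvm_num_allocs state that Kokkos throws a runtime exception once the UVM allocation limit of 65536 is reached (the outer view itself counts as one allocation). A sketch (not part of the patch) of handling that failure outside googletest; it assumes the thrown exception derives from std::exception, whereas the test itself only relies on catch( ... ).

#include <Kokkos_Core.hpp>
#include <exception>
#include <iostream>

// Attempt one more UVM-backed view; report instead of aborting if the
// allocation limit (or any other allocation failure) is hit.
bool try_one_more_uvm_view( Kokkos::View< double*, Kokkos::CudaUVMSpace > & out,
                            const size_t n )
{
  try {
    out = Kokkos::View< double*, Kokkos::CudaUVMSpace >( "one_more_uvm_view", n );
    return true;
  }
  catch ( const std::exception & e ) {
    std::cerr << "UVM allocation failed: " << e.what() << std::endl;
    return false;
  }
}
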
-template< class MemSpace , class ExecSpace >
+template< class MemSpace, class ExecSpace >
struct TestViewCudaAccessible {
-
enum { N = 1000 };
- using V = Kokkos::View<double*,MemSpace> ;
+ using V = Kokkos::View< double*, MemSpace >;
- V m_base ;
+ V m_base;
struct TagInit {};
struct TagTest {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+ void operator()( const TagInit &, const int i ) const { m_base[i] = i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_base[i] != i + 1 ) ++error_count ; }
+ void operator()( const TagTest &, const int i, long & error_count ) const
+ { if ( m_base[i] != i + 1 ) ++error_count; }
TestViewCudaAccessible()
- : m_base("base",N)
+ : m_base( "base", N )
{}
static void run()
- {
- TestViewCudaAccessible self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space , TagInit >(0,N) , self );
- MemSpace::execution_space::fence();
- // Next access is a different execution space, must complete prior kernel.
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
+ {
+ TestViewCudaAccessible self;
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space, TagInit >( 0, N ), self );
+ MemSpace::execution_space::fence();
+
+ // Next access is a different execution space, must complete prior kernel.
+ long error_count = -1;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TagTest >( 0, N ), self, error_count );
+ EXPECT_EQ( error_count, 0 );
+ }
};
-TEST_F( cuda , impl_view_accessible )
+TEST_F( cuda, impl_view_accessible )
{
- TestViewCudaAccessible< Kokkos::CudaSpace , Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaSpace, Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >::run();
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace, Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace, Kokkos::HostSpace::execution_space >::run();
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >::run();
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace::execution_space >::run();
}
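
TestViewCudaAccessible initializes a view in the memory space's own execution space, fences, and then counts mismatches from a second execution space; the comment about completing the prior kernel is the point of the fence. A minimal sketch (not part of the patch) of that handshake for a CudaUVMSpace view, again assuming CUDA lambda support; the function name is illustrative.

#include <Kokkos_Core.hpp>

long count_uvm_mismatches( const int n )
{
  Kokkos::View< double*, Kokkos::CudaUVMSpace > base( "base", n );

  // Fill on the device ...
  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, n ),
    KOKKOS_LAMBDA( const int i ) { base[ i ] = i + 1; } );

  // ... and fence, because the check below runs in a different execution space.
  Kokkos::Cuda::fence();

  long error_count = 0;
  Kokkos::parallel_reduce(
    Kokkos::RangePolicy< Kokkos::HostSpace::execution_space >( 0, n ),
    KOKKOS_LAMBDA( const int i, long & err ) { if ( base[ i ] != i + 1 ) ++err; },
    error_count );

  return error_count;  // Expected to be 0, as in EXPECT_EQ( error_count, 0 ) above.
}
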
template< class MemSpace >
struct TestViewCudaTexture {
-
enum { N = 1000 };
- using V = Kokkos::View<double*,MemSpace> ;
- using T = Kokkos::View<const double*, MemSpace, Kokkos::MemoryRandomAccess > ;
+ using V = Kokkos::View< double*, MemSpace >;
+ using T = Kokkos::View< const double*, MemSpace, Kokkos::MemoryRandomAccess >;
- V m_base ;
- T m_tex ;
+ V m_base;
+ T m_tex;
struct TagInit {};
struct TagTest {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+ void operator()( const TagInit &, const int i ) const { m_base[i] = i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_tex[i] != i + 1 ) ++error_count ; }
+ void operator()( const TagTest &, const int i, long & error_count ) const
+ { if ( m_tex[i] != i + 1 ) ++error_count; }
TestViewCudaTexture()
- : m_base("base",N)
+ : m_base( "base", N )
, m_tex( m_base )
{}
static void run()
- {
- EXPECT_TRUE( ( std::is_same< typename V::reference_type
- , double &
- >::value ) );
-
- EXPECT_TRUE( ( std::is_same< typename T::reference_type
- , const double
- >::value ) );
-
- EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view
- EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value
-
- TestViewCudaTexture self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda , TagInit >(0,N) , self );
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
-};
+ {
+ EXPECT_TRUE( ( std::is_same< typename V::reference_type, double & >::value ) );
+ EXPECT_TRUE( ( std::is_same< typename T::reference_type, const double >::value ) );
+
+ EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view.
+ EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value.
+ TestViewCudaTexture self;
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda, TagInit >( 0, N ), self );
+    long error_count = -1;
+    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda, TagTest >( 0, N ), self, error_count );
+    EXPECT_EQ( error_count, 0 );
+  }
+};
+
-TEST_F( cuda , impl_view_texture )
+TEST_F( cuda, impl_view_texture )
{
TestViewCudaTexture< Kokkos::CudaSpace >::run();
TestViewCudaTexture< Kokkos::CudaUVMSpace >::run();
}
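
TestViewCudaTexture builds a const, MemoryRandomAccess alias of an ordinary view and checks that its reference_type is a value rather than an lvalue reference (the texture-fetch path). A sketch (not part of the patch) of the same aliasing in user code, with the element count passed in explicitly and CUDA lambdas assumed; names are illustrative.

#include <Kokkos_Core.hpp>

using plain_view   = Kokkos::View< double*, Kokkos::CudaSpace >;
using texture_view = Kokkos::View< const double*, Kokkos::CudaSpace,
                                   Kokkos::MemoryRandomAccess >;

double sum_through_random_access( const plain_view & base, const int n )
{
  texture_view tex( base );  // Shares base's allocation through the read-only path.

  double total = 0.0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda >( 0, n ),
    KOKKOS_LAMBDA( const int i, double & t ) { t += tex[ i ]; },  // tex[ i ] is a value, not a reference.
    total );
  return total;
}
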
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
index fd8a647ef..0aea35db5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_assign_strided ) {
+TEST_F( cuda, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_left_0 ) {
+TEST_F( cuda, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_1 ) {
+TEST_F( cuda, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_2 ) {
+TEST_F( cuda, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_3 ) {
+TEST_F( cuda, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_0 ) {
+TEST_F( cuda, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_1 ) {
+TEST_F( cuda, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_3 ) {
+TEST_F( cuda, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
index 053fcfc20..f31f4cbe6 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_layoutleft_to_layoutleft) {
+TEST_F( cuda, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( cuda, view_subview_layoutright_to_layoutright) {
+TEST_F( cuda, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
index 4c5f2ef72..0213a196e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
+TEST_F( cuda, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
index aee6f1730..181e1bab2 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
index 2ef48c686..708cc1f5b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
index aec123ac2..a3db996f8 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d ) {
+TEST_F( cuda, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
index e8ad23199..2f7cffa75 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
index e86b4513f..949c6f3e0 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
index ad9dcc0fd..3e68277a9 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left ) {
+TEST_F( cuda, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
index f97d97e59..0cd91b779 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
index 2a07f28f8..cd1c13f7d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
index 3c51d9420..22d275354 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right ) {
+TEST_F( cuda, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
index 835caa7b8..5dc5f87b4 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
index 53bd5eee2..318d8edbb 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
index e4348319f..a2158f06c 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<cuda/TestCuda_SubView_c01.cpp>
-#include<cuda/TestCuda_SubView_c02.cpp>
-#include<cuda/TestCuda_SubView_c03.cpp>
-#include<cuda/TestCuda_SubView_c04.cpp>
-#include<cuda/TestCuda_SubView_c05.cpp>
-#include<cuda/TestCuda_SubView_c06.cpp>
-#include<cuda/TestCuda_SubView_c07.cpp>
-#include<cuda/TestCuda_SubView_c08.cpp>
-#include<cuda/TestCuda_SubView_c09.cpp>
-#include<cuda/TestCuda_SubView_c10.cpp>
-#include<cuda/TestCuda_SubView_c11.cpp>
-#include<cuda/TestCuda_SubView_c12.cpp>
+#include <cuda/TestCuda_SubView_c01.cpp>
+#include <cuda/TestCuda_SubView_c02.cpp>
+#include <cuda/TestCuda_SubView_c03.cpp>
+#include <cuda/TestCuda_SubView_c04.cpp>
+#include <cuda/TestCuda_SubView_c05.cpp>
+#include <cuda/TestCuda_SubView_c06.cpp>
+#include <cuda/TestCuda_SubView_c07.cpp>
+#include <cuda/TestCuda_SubView_c08.cpp>
+#include <cuda/TestCuda_SubView_c09.cpp>
+#include <cuda/TestCuda_SubView_c10.cpp>
+#include <cuda/TestCuda_SubView_c11.cpp>
+#include <cuda/TestCuda_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
index 13834d09a..8d9b9328b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
@@ -1,120 +1,126 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , team_tag )
+TEST_F( cuda, team_tag )
{
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( cuda , team_shared_request) {
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( cuda, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-//THis Tests request to much L0 scratch
-//TEST_F( cuda, team_scratch_request) {
-// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
-// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+// This test requests too much L0 scratch.
+//TEST_F( cuda, team_scratch_request )
+//{
+// TestScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+// TestScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
//}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( cuda , team_lambda_shared_request) {
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( cuda, team_lambda_shared_request )
+{
TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( cuda, shmem_size) {
+TEST_F( cuda, shmem_size )
+{
TestShmemSize< Kokkos::Cuda >();
}
-TEST_F( cuda, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( cuda, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( cuda , team_vector )
+#if !defined(KOKKOS_CUDA_CLANG_WORKAROUND) && !defined(KOKKOS_ARCH_PASCAL)
+TEST_F( cuda, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 10 ) ) );
}
+#endif
TEST_F( cuda, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 16, 16 );
}
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
index c01ca1c14..be0c4c571 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
@@ -1,59 +1,60 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_a ) {
+TEST_F( cuda, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::CudaSpace >();
test_view_mapping_operator< Kokkos::CudaSpace >();
}
-TEST_F( cuda , view_of_class )
+TEST_F( cuda, view_of_class )
{
TestViewMappingClassValue< Kokkos::CudaSpace >::run();
TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
index 8e821ada0..b4d8e5d95 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_d ) {
+TEST_F( cuda, impl_view_mapping_d )
+{
test_view_mapping< Kokkos::CudaHostPinnedSpace >();
test_view_mapping_operator< Kokkos::CudaHostPinnedSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
index cf29a68e9..e4e6894c5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_c ) {
+TEST_F( cuda, impl_view_mapping_c )
+{
test_view_mapping< Kokkos::CudaUVMSpace >();
test_view_mapping_operator< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
index db14b5158..82a3dd83e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
@@ -1,112 +1,116 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , view_nested_view )
+TEST_F( cuda, view_nested_view )
{
::Test::view_nested_view< Kokkos::Cuda >();
}
-
-
-TEST_F( cuda , view_remap )
+TEST_F( cuda, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::CudaUVMSpace > output_type ;
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::CudaUVMSpace > output_type;
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::CudaUVMSpace > input_type ;
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::CudaUVMSpace > input_type;
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::CudaUVMSpace > diff_type ;
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::CudaUVMSpace > diff_type;
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
Kokkos::fence();
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
Kokkos::fence();
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
+ // Kokkos::deep_copy( diff, input ); // Throws with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
Kokkos::fence();
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
+
Kokkos::fence();
}
-//----------------------------------------------------------------------------
-
-TEST_F( cuda , view_aggregate )
+TEST_F( cuda, view_aggregate )
{
TestViewAggregate< Kokkos::Cuda >();
}
-TEST_F( cuda , template_meta_functions )
+TEST_F( cuda, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Cuda >();
+ TestTemplateMetaFunctions< int, Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
index 07d425647..27450fa6f 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
@@ -1,63 +1,65 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::CudaSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >();
+TEST_F( cuda, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::CudaSpace, Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaUVMSpace, Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace::execution_space >();
}
-TEST_F( cuda , impl_view_mapping_b ) {
+TEST_F( cuda, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::CudaSpace >();
test_view_mapping_subview< Kokkos::CudaUVMSpace >();
test_view_mapping_subview< Kokkos::CudaHostPinnedSpace >();
TestViewMappingAtomic< Kokkos::CudaSpace >::run();
TestViewMappingAtomic< Kokkos::CudaUVMSpace >::run();
TestViewMappingAtomic< Kokkos::CudaHostPinnedSpace >::run();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
index 34721f02d..56524111a 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
@@ -1,55 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_a) {
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess > > view_texture_managed ;
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess | Kokkos::Unmanaged > > view_texture_unmanaged ;
+TEST_F( cuda, view_api_a )
+{
+ typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> > view_texture_managed;
+ typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess | Kokkos::Unmanaged> > view_texture_unmanaged;
- TestViewAPI< double , Kokkos::Cuda >();
+ TestViewAPI< double, Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
index abbcf3bf8..d5fd24456 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_b) {
- TestViewAPI< double , Kokkos::CudaUVMSpace >();
+TEST_F( cuda, view_api_b )
+{
+ TestViewAPI< double, Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
index 989964203..649023e4a 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_c) {
- TestViewAPI< double , Kokkos::CudaHostPinnedSpace >();
+TEST_F( cuda, view_api_c )
+{
+ TestViewAPI< double, Kokkos::CudaHostPinnedSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
index 9bc09ba89..b46b1e5f8 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , view_space_assign ) {
- view_space_assign< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >();
- view_space_assign< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >();
+TEST_F( cuda, view_space_assign )
+{
+ view_space_assign< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >();
+ view_space_assign< Kokkos::CudaSpace, Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
index 28ae5b41b..ed9bb68cd 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
@@ -1,117 +1,112 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_OPENMP_HPP
#define KOKKOS_TEST_OPENMP_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class openmp : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- const unsigned threads_count = std::max( 1u , numa_count ) *
- std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
+ const unsigned threads_count = std::max( 1u, numa_count ) *
+ std::max( 2u, ( cores_per_numa * threads_per_core ) / 2 );
Kokkos::OpenMP::initialize( threads_count );
- Kokkos::OpenMP::print_configuration( std::cout , true );
- srand(10231);
+ Kokkos::print_configuration( std::cout, true );
+ srand( 10231 );
}
static void TearDownTestCase()
{
Kokkos::OpenMP::finalize();
- omp_set_num_threads(1);
+ omp_set_num_threads( 1 );
- ASSERT_EQ( 1 , omp_get_max_threads() );
+ ASSERT_EQ( 1, omp_get_max_threads() );
}
};
-}
+} // namespace Test
+
#endif
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
index ed6c9f8d1..2585c0197 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
@@ -1,204 +1,201 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , atomics )
+TEST_F( openmp, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 3 ) ) );
}
-TEST_F( openmp , atomic_operations )
+TEST_F( openmp, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 4 ) ) );
}
-
}
-
-TEST_F( openmp , atomic_views_integral )
+TEST_F( openmp, atomic_views_integral )
{
const long length = 1000000;
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 8 ) ) );
-
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 8 ) ) );
}
}
-TEST_F( openmp , atomic_views_nonintegral )
+TEST_F( openmp, atomic_views_nonintegral )
{
const long length = 1000000;
{
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 4 ) ) );
-
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 4 ) ) );
}
}
-TEST_F( openmp , atomic_view_api )
+TEST_F( openmp, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::OpenMP>();
+ TestAtomicViews::TestAtomicViewAPI<int, Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
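
[Editor's note, not part of the patch:] The hunks above only reformat the atomic-operations and atomic-views tests (brace placement and spacing inside template argument lists); the assertions themselves are unchanged. As a rough orientation on what the AtomicViews tests exercise, a minimal sketch of an atomically updated Kokkos view on the OpenMP backend follows; the names hist, hist_atomic, and nbins are illustrative only.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int n     = 1000000;
    const int nbins = 8;

    Kokkos::View<long*, Kokkos::OpenMP> hist( "hist", nbins );

    // Alias the same allocation with the Atomic memory trait: every element
    // update then goes through an atomic read-modify-write, so concurrent
    // increments from different threads do not race.
    Kokkos::View<long*, Kokkos::OpenMP,
                 Kokkos::MemoryTraits<Kokkos::Atomic> > hist_atomic = hist;

    Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const int i ) { hist_atomic( i % nbins ) += 1; } );
    Kokkos::fence();

    long total = 0;
    for ( int b = 0; b < nbins; ++b ) total += hist( b );
    printf( "total = %ld (expected %d)\n", total, n );
  }
  Kokkos::finalize();
  return 0;
}
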
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
index 126d730f0..b4f32dac7 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
@@ -1,189 +1,212 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , init ) {
+TEST_F( openmp, init )
+{
;
}
-TEST_F( openmp , md_range ) {
- TestMDRange_2D< Kokkos::OpenMP >::test_for2(100,100);
+TEST_F( openmp, mdrange_for )
+{
+ Kokkos::Timer timer;
+ TestMDRange_2D< Kokkos::OpenMP >::test_for2( 10000, 1000 );
+ std::cout << " 2D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_3D< Kokkos::OpenMP >::test_for3( 100, 100, 1000 );
+ std::cout << " 3D: " << timer.seconds() << std::endl;
- TestMDRange_3D< Kokkos::OpenMP >::test_for3(100,100,100);
+ timer.reset();
+ TestMDRange_4D< Kokkos::OpenMP >::test_for4( 100, 10, 100, 100 );
+ std::cout << " 4D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_5D< Kokkos::OpenMP >::test_for5( 100, 10, 10, 100, 50 );
+ std::cout << " 5D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_6D< Kokkos::OpenMP >::test_for6( 10, 10, 10, 10, 50, 50 );
+ std::cout << " 6D: " << timer.seconds() << std::endl;
}
-TEST_F( openmp, policy_construction) {
+TEST_F( openmp, mdrange_reduce )
+{
+ TestMDRange_2D< Kokkos::OpenMP >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::OpenMP >::test_reduce3( 100, 10, 100 );
+}
+
+TEST_F( openmp, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::OpenMP >();
TestTeamPolicyConstruction< Kokkos::OpenMP >();
}
-TEST_F( openmp , range_tag )
+TEST_F( openmp, range_tag )
{
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( openmp , compiler_macros )
+TEST_F( openmp, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::OpenMP >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( openmp , memory_pool )
+TEST_F( openmp, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::OpenMP >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::OpenMP >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::OpenMP >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( openmp , task_fib )
+TEST_F( openmp, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::OpenMP >::run(i, (i+1)*(i+1)*10000 );
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::OpenMP >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
}
}
-TEST_F( openmp , task_depend )
+TEST_F( openmp, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::OpenMP >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::OpenMP >::run( i );
}
}
-TEST_F( openmp , task_team )
+TEST_F( openmp, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::OpenMP >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::OpenMP >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::OpenMP >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::OpenMP >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
-TEST_F( openmp , cxx11 )
+TEST_F( openmp, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::OpenMP >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::OpenMP >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 4 ) ) );
}
}
#endif
TEST_F( openmp, tile_layout )
{
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::OpenMP , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::OpenMP, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 9, 11 );
}
-
-TEST_F( openmp , dispatch )
+TEST_F( openmp, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
-
-} // namespace test
-
+} // namespace Test
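
[Editor's note, not part of the patch:] Besides the whitespace cleanup, the TestOpenMP_Other.cpp hunk above replaces the single md_range test with timed mdrange_for loops over 2-D through 6-D index ranges plus an mdrange_reduce test. For orientation, a multidimensional range in Kokkos is typically expressed with MDRangePolicy, roughly as in the sketch below; note that in older Kokkos 2.x snapshots this policy may still live in the Kokkos::Experimental namespace, and the sizes here are placeholders.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int N0 = 100, N1 = 100;
    using Policy2D = Kokkos::MDRangePolicy< Kokkos::OpenMP, Kokkos::Rank<2> >;

    Kokkos::View<double**, Kokkos::OpenMP> a( "a", N0, N1 );

    // Tightly nested 2-D loop: one lambda invocation per (i,j) pair.
    Kokkos::parallel_for( Policy2D( {{ 0, 0 }}, {{ N0, N1 }} ),
      KOKKOS_LAMBDA( const int i, const int j ) {
        a( i, j ) = 1.0;
      } );

    // Reduction over the same 2-D index space.
    double sum = 0.0;
    Kokkos::parallel_reduce( Policy2D( {{ 0, 0 }}, {{ N0, N1 }} ),
      KOKKOS_LAMBDA( const int i, const int j, double& lsum ) {
        lsum += a( i, j );
      }, sum );

    printf( "sum = %g (expected %d)\n", sum, N0 * N1 );
  }
  Kokkos::finalize();
  return 0;
}
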
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
index d41e1493e..22c29308a 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
@@ -1,138 +1,146 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, long_reduce) {
- TestReduce< long , Kokkos::OpenMP >( 0 );
- TestReduce< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce )
+{
+ TestReduce< long, Kokkos::OpenMP >( 0 );
+ TestReduce< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, double_reduce) {
- TestReduce< double , Kokkos::OpenMP >( 0 );
- TestReduce< double , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, double_reduce )
+{
+ TestReduce< double, Kokkos::OpenMP >( 0 );
+ TestReduce< double, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp , reducers )
+TEST_F( openmp, reducers )
{
- TestReducers<int, Kokkos::OpenMP>::execute_integer();
- TestReducers<size_t, Kokkos::OpenMP>::execute_integer();
- TestReducers<double, Kokkos::OpenMP>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::OpenMP>::execute_basic();
+ TestReducers< int, Kokkos::OpenMP >::execute_integer();
+ TestReducers< size_t, Kokkos::OpenMP >::execute_integer();
+ TestReducers< double, Kokkos::OpenMP >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::OpenMP >::execute_basic();
}
-TEST_F( openmp, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::OpenMP >( 0 );
- TestReduceDynamic< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::OpenMP >( 0 );
+ TestReduceDynamic< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::OpenMP >( 0 );
- TestReduceDynamic< double , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::OpenMP >( 0 );
+ TestReduceDynamic< double, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::OpenMP >( 0 );
- TestReduceDynamicView< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::OpenMP >( 0 );
+ TestReduceDynamicView< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp , scan )
+TEST_F( openmp, scan )
{
- TestScan< Kokkos::OpenMP >::test_range( 1 , 1000 );
+ TestScan< Kokkos::OpenMP >::test_range( 1, 1000 );
TestScan< Kokkos::OpenMP >( 0 );
TestScan< Kokkos::OpenMP >( 100000 );
TestScan< Kokkos::OpenMP >( 10000000 );
Kokkos::OpenMP::fence();
}
#if 0
-TEST_F( openmp , scan_small )
+TEST_F( openmp, scan_small )
{
- typedef TestScan< Kokkos::OpenMP , Kokkos::Impl::OpenMPExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::OpenMP, Kokkos::Impl::OpenMPExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::OpenMP::fence();
}
#endif
-TEST_F( openmp , team_scan )
+TEST_F( openmp, team_scan )
{
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( openmp , team_long_reduce) {
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( openmp, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp , team_double_reduce) {
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( openmp, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp , reduction_deduction )
+TEST_F( openmp, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
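
[Editor's note, not part of the patch:] The reductions file above is again a pure reformatting of the existing TestReduce*, TestReducers, TestScan, and team reduce/scan cases. The basic reduction and prefix-scan patterns those tests cover look roughly like the following sketch; n and the view name scan are illustrative only.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const long n = 1000000;

    // Sum of 0 + 1 + ... + (n-1).
    long sum = 0;
    Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const long i, long& lsum ) { lsum += i; }, sum );

    // Exclusive prefix sum: entry i receives the sum of all indices < i.
    Kokkos::View<long*, Kokkos::OpenMP> scan( "scan", n );
    Kokkos::parallel_scan( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const long i, long& update, const bool final ) {
        if ( final ) scan( i ) = update;
        update += i;
      } );
    Kokkos::fence();

    printf( "sum = %ld, scan(n-1) = %ld\n", sum, scan( n - 1 ) );
  }
  Kokkos::finalize();
  return 0;
}
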
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
index 9854417e4..fefae0732 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_assign_strided ) {
+TEST_F( openmp, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_0 ) {
+TEST_F( openmp, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_1 ) {
+TEST_F( openmp, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_2 ) {
+TEST_F( openmp, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_3 ) {
+TEST_F( openmp, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_0 ) {
+TEST_F( openmp, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_1 ) {
+TEST_F( openmp, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_3 ) {
+TEST_F( openmp, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
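
[Editor's note, not part of the patch:] The SubView_a hunk above, like the remaining SubView_* files below, only normalizes brace placement and spacing in tests of Kokkos::subview. A minimal illustration of the feature under test (view names and extents are illustrative only):

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP> a( "a", 10, 4 );

    // A 1-D subview of column 3: it shares a's allocation, no copy is made.
    auto col = Kokkos::subview( a, Kokkos::ALL(), 3 );

    for ( int i = 0; i < 10; ++i ) col( i ) = 2.0 * i;

    // The change is visible through the parent view.
    printf( "a(5,3) = %g\n", a( 5, 3 ) );
  }
  Kokkos::finalize();
  return 0;
}
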
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
index 2aa1fc5c6..7de7ca91b 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_layoutleft_to_layoutleft) {
+TEST_F( openmp, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( openmp, view_subview_layoutright_to_layoutright) {
+TEST_F( openmp, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
index 1a6871cfc..d727ec0ee 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign ) {
+TEST_F( openmp, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
index b04edbb99..df43f555d 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
index 765e23583..38f241ebf 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
index 9d8b62708..11a4ea8ac 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d ) {
+TEST_F( openmp, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
index 9c19cf0e5..a91baa34d 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
index c1bdf7235..20d4d9bd6 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
index 08a3b5a54..528df1c07 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left ) {
+TEST_F( openmp, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
index 0864ebbda..d9eea8dba 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
index e38dfecbf..f909dc33c 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
index b7e4683d2..59996d5e3 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right ) {
+TEST_F( openmp, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
index fc3e66fd4..3f9c215d9 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
index e21a13ee5..d3a73483a 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
index 9da159ab5..399c6e92e 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<openmp/TestOpenMP_SubView_c01.cpp>
-#include<openmp/TestOpenMP_SubView_c02.cpp>
-#include<openmp/TestOpenMP_SubView_c03.cpp>
-#include<openmp/TestOpenMP_SubView_c04.cpp>
-#include<openmp/TestOpenMP_SubView_c05.cpp>
-#include<openmp/TestOpenMP_SubView_c06.cpp>
-#include<openmp/TestOpenMP_SubView_c07.cpp>
-#include<openmp/TestOpenMP_SubView_c08.cpp>
-#include<openmp/TestOpenMP_SubView_c09.cpp>
-#include<openmp/TestOpenMP_SubView_c10.cpp>
-#include<openmp/TestOpenMP_SubView_c11.cpp>
-#include<openmp/TestOpenMP_SubView_c12.cpp>
+#include <openmp/TestOpenMP_SubView_c01.cpp>
+#include <openmp/TestOpenMP_SubView_c02.cpp>
+#include <openmp/TestOpenMP_SubView_c03.cpp>
+#include <openmp/TestOpenMP_SubView_c04.cpp>
+#include <openmp/TestOpenMP_SubView_c05.cpp>
+#include <openmp/TestOpenMP_SubView_c06.cpp>
+#include <openmp/TestOpenMP_SubView_c07.cpp>
+#include <openmp/TestOpenMP_SubView_c08.cpp>
+#include <openmp/TestOpenMP_SubView_c09.cpp>
+#include <openmp/TestOpenMP_SubView_c10.cpp>
+#include <openmp/TestOpenMP_SubView_c11.cpp>
+#include <openmp/TestOpenMP_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
index 38cf0a0f4..216789e8b 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , team_tag )
+TEST_F( openmp, team_tag )
{
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( openmp , team_shared_request) {
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, team_shared_request )
+{
+ TestSharedTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( openmp, team_scratch_request) {
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( openmp , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( openmp, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( openmp, shmem_size) {
+TEST_F( openmp, shmem_size )
+{
TestShmemSize< Kokkos::OpenMP >();
}
-TEST_F( openmp, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( openmp , team_vector )
+TEST_F( openmp, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( openmp, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
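(For orientation only, not part of the patch: the TestTeamPolicy, TestSharedTeam and TestTeamVector helpers exercised in the file above live in TestTeam.hpp / TestTeamVector.hpp, which this diff does not touch. A minimal, self-contained sketch of the hierarchical team-policy dispatch those helpers drive on the OpenMP backend might look as follows; the league size, work count and program structure are illustrative assumptions, assuming only a standard Kokkos build with OpenMP enabled.)

// Sketch: team-policy parallel_for with nested per-team parallelism (OpenMP backend).
#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    typedef Kokkos::TeamPolicy< Kokkos::OpenMP > policy_type;
    typedef policy_type::member_type           member_type;

    // A league of 4 teams; let Kokkos choose the team size.
    Kokkos::parallel_for( policy_type( 4, Kokkos::AUTO ),
                          KOKKOS_LAMBDA( const member_type & member )
    {
      // Nested parallelism over 8 work items per team.
      Kokkos::parallel_for( Kokkos::TeamThreadRange( member, 8 ),
                            [&] ( const int i )
      {
        printf( "league %d, team rank %d, item %d\n",
                member.league_rank(), member.team_rank(), i );
      } );
    } );
  }
  Kokkos::finalize();
  return 0;
}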
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
index 82cbf3ea1..aead381a1 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , impl_view_mapping_a ) {
+TEST_F( openmp, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::OpenMP >();
test_view_mapping_operator< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
index b2d4f87fd..c802fb79c 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::OpenMP >();
+TEST_F( openmp, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::OpenMP >();
}
-TEST_F( openmp , impl_view_mapping_b ) {
+TEST_F( openmp, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::OpenMP >();
TestViewMappingAtomic< Kokkos::OpenMP >::run();
}
-TEST_F( openmp, view_api) {
- TestViewAPI< double , Kokkos::OpenMP >();
+TEST_F( openmp, view_api )
+{
+ TestViewAPI< double, Kokkos::OpenMP >();
}
-TEST_F( openmp , view_nested_view )
+TEST_F( openmp, view_nested_view )
{
::Test::view_nested_view< Kokkos::OpenMP >();
}
-
-
-TEST_F( openmp , view_remap )
+TEST_F( openmp, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::OpenMP > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::OpenMP > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::OpenMP > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::OpenMP > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , view_aggregate )
+TEST_F( openmp, view_aggregate )
{
TestViewAggregate< Kokkos::OpenMP >();
}
-TEST_F( openmp , template_meta_functions )
+TEST_F( openmp, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::OpenMP >();
+ TestTemplateMetaFunctions< int, Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads.hpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
similarity index 86%
copy from lib/kokkos/core/unit_test/threads/TestThreads.hpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
index 4f611cf99..907fe23ea 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads.hpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
@@ -1,115 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_TEST_THREADS_HPP
-#define KOKKOS_TEST_THREADS_HPP
+
+#ifndef KOKKOS_TEST_QTHREADS_HPP
+#define KOKKOS_TEST_QTHREADS_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
-class threads : public ::testing::Test {
+class qthreads : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- unsigned threads_count = 0 ;
+ const unsigned threads_count = std::max( 1u, numa_count ) *
+ std::max( 2u, ( cores_per_numa * threads_per_core ) / 2 );
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
+ Kokkos::Qthreads::initialize( threads_count );
+ Kokkos::print_configuration( std::cout, true );
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
+ srand( 10231 );
}
static void TearDownTestCase()
{
- Kokkos::Threads::finalize();
+ Kokkos::Qthreads::finalize();
}
};
+} // namespace Test
-}
#endif
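(Illustration only, not part of the patch: the qthreads fixture above mirrors the threads fixture it was copied from. SetUpTestCase() initializes Kokkos::Qthreads once per test case and TearDownTestCase() finalizes it, so individual TEST_F bodies can dispatch work directly. A minimal sketch of such a test, with a hypothetical name and values, and assuming the Qthreads backend supports parallel_reduce, which is why most bodies in the new files below are still guarded with #if 0, could read:)

#include <qthreads/TestQthreads.hpp>

namespace Test {

// Hypothetical example: the fixture has already initialized Kokkos::Qthreads,
// so the body can launch a reduction directly.
TEST_F( qthreads, example_sum_reduce )
{
  long sum = 0;

  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Qthreads >( 0, 100 ),
                           KOKKOS_LAMBDA( const int i, long & update )
                           { update += i; },
                           sum );

  ASSERT_EQ( sum, 4950 ); // 0 + 1 + ... + 99
}

} // namespace Test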
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp
new file mode 100644
index 000000000..e64c3305d
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp
@@ -0,0 +1,213 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, atomics )
+{
+#if 0
+ const int loop_count = 1e4;
+
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 3 ) ) );
+#endif
+}
+
+TEST_F( qthreads, atomic_operations )
+{
+#if 0
+ const int start = 1; // Avoid zero for division.
+ const int end = 11;
+
+ for ( int i = start; i < end; ++i )
+ {
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_views_integral )
+{
+#if 0
+ const long length = 1000000;
+
+ {
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 8 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_views_nonintegral )
+{
+#if 0
+ const long length = 1000000;
+
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 4 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_view_api )
+{
+#if 0
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Qthreads >();
+#endif
+}
+
+} // namespace Test
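(Aside, not part of the patch: the TestAtomic and TestAtomicViews helpers referenced above come from TestAtomic.hpp / TestAtomicViews.hpp, which this diff does not modify. As a rough sketch of the two atomic idioms they exercise, free atomic functions and views carrying the Atomic memory trait, the following assumes a standard Kokkos build with the OpenMP backend; the counter name and counts are illustrative.)

// Sketch: atomic updates via Kokkos::atomic_fetch_add and via an Atomic-trait view.
#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int n = 1000;

    // Idiom 1: free atomic function on a shared rank-0 counter.
    Kokkos::View< int, Kokkos::OpenMP > counter( "counter" );
    Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, n ),
                          KOKKOS_LAMBDA( const int )
    {
      Kokkos::atomic_fetch_add( &counter(), 1 );
    } );

    // Idiom 2: a view whose every access is atomic via the Atomic memory trait.
    Kokkos::View< int, Kokkos::OpenMP,
                  Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_counter( counter );
    Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, n ),
                          KOKKOS_LAMBDA( const int )
    {
      atomic_counter() += 1;
    } );

    int result = 0;
    Kokkos::deep_copy( result, counter );
    assert( result == 2 * n );
  }
  Kokkos::finalize();
  return 0;
}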
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp
new file mode 100644
index 000000000..0faec8405
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp
@@ -0,0 +1,213 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, init )
+{
+ ;
+}
+
+TEST_F( qthreads, md_range )
+{
+#if 0
+ TestMDRange_2D< Kokkos::Qthreads >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Qthreads >::test_for3( 100, 100, 100 );
+#endif
+}
+
+TEST_F( qthreads, policy_construction )
+{
+#if 0
+ TestRangePolicyConstruction< Kokkos::Qthreads >();
+ TestTeamPolicyConstruction< Kokkos::Qthreads >();
+#endif
+}
+
+TEST_F( qthreads, range_tag )
+{
+#if 0
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( qthreads, compiler_macros )
+{
+#if 0
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Qthreads >() ) );
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( qthreads, memory_pool )
+{
+#if 0
+ bool val = TestMemoryPool::test_mempool< Kokkos::Qthreads >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::Qthreads >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::Qthreads >();
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+TEST_F( qthreads, task_fib )
+{
+#if 0
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Qthreads >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
+ }
+#endif
+}
+
+TEST_F( qthreads, task_depend )
+{
+#if 0
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Qthreads >::run( i );
+ }
+#endif
+}
+
+TEST_F( qthreads, task_team )
+{
+#if 0
+ TestTaskScheduler::TestTaskTeam< Kokkos::Qthreads >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Qthreads >::run( 1000 ); // Put back after testing.
+#endif
+}
+
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+
+TEST_F( qthreads, cxx11 )
+{
+#if 0
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Qthreads >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 4 ) ) );
+ }
+#endif
+}
+
+#endif
+
+TEST_F( qthreads, tile_layout )
+{
+#if 0
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Qthreads, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 9, 11 );
+#endif
+}
+
+TEST_F( qthreads, dispatch )
+{
+#if 0
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Qthreads >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
+#endif
+}
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp
new file mode 100644
index 000000000..a2470ac15
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp
@@ -0,0 +1,168 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, long_reduce )
+{
+#if 0
+ TestReduce< long, Kokkos::Qthreads >( 0 );
+ TestReduce< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, double_reduce )
+{
+#if 0
+ TestReduce< double, Kokkos::Qthreads >( 0 );
+ TestReduce< double, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, reducers )
+{
+#if 0
+ TestReducers< int, Kokkos::Qthreads >::execute_integer();
+ TestReducers< size_t, Kokkos::Qthreads >::execute_integer();
+ TestReducers< double, Kokkos::Qthreads >::execute_float();
+  TestReducers< Kokkos::complex<double>, Kokkos::Qthreads >::execute_basic();
+#endif
+}
+
+TEST_F( qthreads, long_reduce_dynamic )
+{
+#if 0
+ TestReduceDynamic< long, Kokkos::Qthreads >( 0 );
+ TestReduceDynamic< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, double_reduce_dynamic )
+{
+#if 0
+ TestReduceDynamic< double, Kokkos::Qthreads >( 0 );
+ TestReduceDynamic< double, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, long_reduce_dynamic_view )
+{
+#if 0
+ TestReduceDynamicView< long, Kokkos::Qthreads >( 0 );
+ TestReduceDynamicView< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, scan )
+{
+#if 0
+ TestScan< Kokkos::Qthreads >::test_range( 1, 1000 );
+ TestScan< Kokkos::Qthreads >( 0 );
+ TestScan< Kokkos::Qthreads >( 100000 );
+ TestScan< Kokkos::Qthreads >( 10000000 );
+ Kokkos::Qthreads::fence();
+#endif
+}
+
+TEST_F( qthreads, scan_small )
+{
+#if 0
+ typedef TestScan< Kokkos::Qthreads, Kokkos::Impl::QthreadsExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
+ TestScanFunctor( 10 );
+ TestScanFunctor( 10000 );
+ }
+ TestScanFunctor( 1000000 );
+ TestScanFunctor( 10000000 );
+
+ Kokkos::Qthreads::fence();
+#endif
+}
+
+TEST_F( qthreads, team_scan )
+{
+#if 0
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+#endif
+}
+
+TEST_F( qthreads, team_long_reduce )
+{
+#if 0
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+#endif
+}
+
+TEST_F( qthreads, team_double_reduce )
+{
+#if 0
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+#endif
+}
+
+TEST_F( qthreads, reduction_deduction )
+{
+#if 0
+ TestCXX11::test_reduction_deduction< Kokkos::Qthreads >();
+#endif
+}
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
similarity index 59%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
index 2df9e19de..ab873359a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
@@ -1,92 +1,125 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_left )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_right )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_stride )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_assign_strided )
+{
+#if 0
+ TestViewSubview::test_1d_strided_assignment< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_0 )
+{
+#if 0
+ TestViewSubview::test_left_0< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_1 )
+{
+#if 0
+ TestViewSubview::test_left_1< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_2 )
+{
+#if 0
+ TestViewSubview::test_left_2< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_3 )
+{
+#if 0
+ TestViewSubview::test_left_3< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_0 )
+{
+#if 0
+ TestViewSubview::test_right_0< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_1 )
+{
+#if 0
+ TestViewSubview::test_right_1< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_3 )
+{
+#if 0
+ TestViewSubview::test_right_3< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
similarity index 71%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
index c01ca1c14..199c5c795 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
@@ -1,59 +1,66 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_a ) {
- test_view_mapping< Kokkos::CudaSpace >();
- test_view_mapping_operator< Kokkos::CudaSpace >();
+TEST_F( qthreads, view_subview_layoutleft_to_layoutleft )
+{
+#if 0
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-TEST_F( cuda , view_of_class )
+TEST_F( qthreads, view_subview_layoutright_to_layoutright )
{
- TestViewMappingClassValue< Kokkos::CudaSpace >::run();
- TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
+#if 0
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
similarity index 92%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
index 4c5f2ef72..f44909f3d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_1d_assign )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
index e340240c4..7bb936f8d 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_1d_assign_atomic )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
index ad27fa0fa..27073dfa8 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_1d_assign_randomaccess )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
index 4c5f2ef72..1b3cf4885 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_2d_from_3d )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
index c7dfca941..34dda63e6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_2d_from_3d_atomic )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
index 38e839491..5a4ee50fb 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_2d_from_3d_randomaccess )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
index 7cef6fa07..fe386e34a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_3d_from_5d_left )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
index e8ad23199..a3e0ab252 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_3d_from_5d_left_atomic )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
index e86b4513f..df1f570e9 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_3d_from_5d_left_randomaccess )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
index 4c5f2ef72..cc3c80d10 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_3d_from_5d_right )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
index d67bf3157..14b331a45 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_3d_from_5d_right_atomic )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
index e8a2c825c..571382e66 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_3d_from_5d_right_randomaccess )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp
new file mode 100644
index 000000000..ab984c5f3
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp
@@ -0,0 +1,12 @@
+#include <qthreads/TestQthreads_SubView_c01.cpp>
+#include <qthreads/TestQthreads_SubView_c02.cpp>
+#include <qthreads/TestQthreads_SubView_c03.cpp>
+#include <qthreads/TestQthreads_SubView_c04.cpp>
+#include <qthreads/TestQthreads_SubView_c05.cpp>
+#include <qthreads/TestQthreads_SubView_c06.cpp>
+#include <qthreads/TestQthreads_SubView_c07.cpp>
+#include <qthreads/TestQthreads_SubView_c08.cpp>
+#include <qthreads/TestQthreads_SubView_c09.cpp>
+#include <qthreads/TestQthreads_SubView_c10.cpp>
+#include <qthreads/TestQthreads_SubView_c11.cpp>
+#include <qthreads/TestQthreads_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp
new file mode 100644
index 000000000..e7b81283f
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp
@@ -0,0 +1,143 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, team_tag )
+{
+#if 0
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
+
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
+#endif
+}
+
+TEST_F( qthreads, team_shared_request )
+{
+#if 0
+ TestSharedTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+TEST_F( qthreads, team_scratch_request )
+{
+#if 0
+ TestScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( qthreads, team_lambda_shared_request )
+{
+#if 0
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+#endif
+
+TEST_F( qthreads, shmem_size )
+{
+#if 0
+ TestShmemSize< Kokkos::Qthreads >();
+#endif
+}
+
+TEST_F( qthreads, multi_level_scratch )
+{
+#if 0
+ TestMultiLevelScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+TEST_F( qthreads, team_vector )
+{
+#if 0
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 10 ) ) );
+#endif
+}
+
+#ifdef KOKKOS_COMPILER_GNU
+#if ( KOKKOS_COMPILER_GNU == 472 )
+#define SKIP_TEST
+#endif
+#endif
+
+#ifndef SKIP_TEST
+TEST_F( qthreads, triple_nested_parallelism )
+{
+#if 0
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 16, 16 );
+#endif
+}
+#endif
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
similarity index 90%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
index 4c5f2ef72..cd876a36b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
@@ -1,52 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, impl_view_mapping_a )
+{
+#if 0
+ test_view_mapping< Kokkos::Qthreads >();
+ test_view_mapping_operator< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/perf_test/PerfTestHost.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
similarity index 51%
copy from lib/kokkos/core/perf_test/PerfTestHost.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
index 606177ca5..adf048b61 100644
--- a/lib/kokkos/core/perf_test/PerfTestHost.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
@@ -1,115 +1,138 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
+#include <qthreads/TestQthreads.hpp>
-#if defined( KOKKOS_ENABLE_OPENMP )
+namespace Test {
-typedef Kokkos::OpenMP TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
+TEST_F( qthreads, impl_shared_alloc )
+{
+#if 0
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Qthreads >();
+#endif
+}
-#elif defined( KOKKOS_ENABLE_PTHREAD )
+TEST_F( qthreads, impl_view_mapping_b )
+{
+#if 0
+ test_view_mapping_subview< Kokkos::Qthreads >();
+ TestViewMappingAtomic< Kokkos::Qthreads >::run();
+#endif
+}
-typedef Kokkos::Threads TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::Threads" ;
+TEST_F( qthreads, view_api )
+{
+#if 0
+ TestViewAPI< double, Kokkos::Qthreads >();
+#endif
+}
-#elif defined( KOKKOS_ENABLE_SERIAL )
+TEST_F( qthreads, view_nested_view )
+{
+#if 0
+ ::Test::view_nested_view< Kokkos::Qthreads >();
+#endif
+}
-typedef Kokkos::Serial TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::Serial" ;
+TEST_F( qthreads, view_remap )
+{
+#if 0
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
-#else
-# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
-#endif
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Qthreads > output_type;
-#include <impl/Kokkos_Timer.hpp>
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Qthreads > input_type;
-#include <PerfTestHexGrad.hpp>
-#include <PerfTestBlasKernels.hpp>
-#include <PerfTestGramSchmidt.hpp>
-#include <PerfTestDriver.hpp>
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Qthreads > diff_type;
-//------------------------------------------------------------------------
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
-namespace Test {
+ int value = 0;
-class host : public ::testing::Test {
-protected:
- static void SetUpTestCase()
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
{
- if(Kokkos::hwloc::available()) {
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- unsigned threads_count = 0 ;
-
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
-
- TestHostDevice::initialize( threads_count );
- } else {
- const unsigned thread_count = 4 ;
- TestHostDevice::initialize( thread_count );
- }
+ input( i0, i1, i2, i3 ) = ++value;
}
- static void TearDownTestCase()
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
{
- TestHostDevice::finalize();
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
}
-};
+#endif
+}
-TEST_F( host, hexgrad ) {
- EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
+TEST_F( qthreads, view_aggregate )
+{
+#if 0
+ TestViewAggregate< Kokkos::Qthreads >();
+#endif
}
-TEST_F( host, gramschmidt ) {
- EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
+TEST_F( qthreads, template_meta_functions )
+{
+#if 0
+ TestTemplateMetaFunctions< int, Kokkos::Qthreads >();
+#endif
}
} // namespace Test
-
-
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial.hpp b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
index c0ffa6afb..03da07e06 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
@@ -1,105 +1,99 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_SERIAL_HPP
#define KOKKOS_TEST_SERIAL_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
-
#include <TestAtomicViews.hpp>
-
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class serial : public ::testing::Test {
protected:
static void SetUpTestCase()
- {
- Kokkos::HostSpace::execution_space::initialize();
- }
+ {
+ Kokkos::HostSpace::execution_space::initialize();
+ }
+
static void TearDownTestCase()
- {
- Kokkos::HostSpace::execution_space::finalize();
- }
+ {
+ Kokkos::HostSpace::execution_space::finalize();
+ }
};
-}
+} // namespace Test
+
#endif
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
index 729a76556..81ba532a3 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
@@ -1,204 +1,204 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , atomics )
+TEST_F( serial, atomics )
{
- const int loop_count = 1e6 ;
+ const int loop_count = 1e6;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 3 ) ) );
}
-TEST_F( serial , atomic_operations )
+TEST_F( serial, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 12) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 4 ) ) );
}
-
}
-TEST_F( serial , atomic_views_integral )
+TEST_F( serial, atomic_views_integral )
{
const long length = 1000000;
- {
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 8 ) ) );
+ {
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 8 ) ) );
}
}
-TEST_F( serial , atomic_views_nonintegral )
+TEST_F( serial, atomic_views_nonintegral )
{
const long length = 1000000;
- {
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 4 ) ) );
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 4 ) ) );
}
}
-TEST_F( serial , atomic_view_api )
+TEST_F( serial, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Serial>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
index 43fc4c358..b40ed3f4a 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
@@ -1,165 +1,172 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , md_range ) {
- TestMDRange_2D< Kokkos::Serial >::test_for2(100,100);
+TEST_F( serial, mdrange_for )
+{
+ TestMDRange_2D< Kokkos::Serial >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Serial >::test_for3( 100, 10, 100 );
+ TestMDRange_4D< Kokkos::Serial >::test_for4( 100, 10, 10, 10 );
+ TestMDRange_5D< Kokkos::Serial >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Serial >::test_for6( 10, 10, 10, 10, 5, 5 );
+}
- TestMDRange_3D< Kokkos::Serial >::test_for3(100,100,100);
+TEST_F( serial, mdrange_reduce )
+{
+ TestMDRange_2D< Kokkos::Serial >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::Serial >::test_reduce3( 100, 10, 100 );
}
-TEST_F( serial, policy_construction) {
+TEST_F( serial, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Serial >();
TestTeamPolicyConstruction< Kokkos::Serial >();
}
-TEST_F( serial , range_tag )
+TEST_F( serial, range_tag )
{
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
-
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( serial , compiler_macros )
+TEST_F( serial, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Serial >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( serial , memory_pool )
+TEST_F( serial, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Serial >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Serial >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Serial >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( serial , task_fib )
+TEST_F( serial, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Serial >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Serial >::run( i );
}
}
-TEST_F( serial , task_depend )
+TEST_F( serial, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Serial >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Serial >::run( i );
}
}
-TEST_F( serial , task_team )
+TEST_F( serial, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Serial >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Serial >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Serial >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Serial >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
-TEST_F( serial , cxx11 )
+TEST_F( serial, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Serial >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 4 ) ) );
}
}
#endif
TEST_F( serial, tile_layout )
{
- TestTile::test< Kokkos::Serial , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Serial , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Serial , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Serial, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Serial, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Serial, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Serial, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 9, 11 );
}
-
-
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
index 25b5ac6d1..8a3d518cf 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
@@ -1,122 +1,129 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, long_reduce) {
- TestReduce< long , Kokkos::Serial >( 0 );
- TestReduce< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce )
+{
+ TestReduce< long, Kokkos::Serial >( 0 );
+ TestReduce< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, double_reduce) {
- TestReduce< double , Kokkos::Serial >( 0 );
- TestReduce< double , Kokkos::Serial >( 1000000 );
+TEST_F( serial, double_reduce )
+{
+ TestReduce< double, Kokkos::Serial >( 0 );
+ TestReduce< double, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial , reducers )
+TEST_F( serial, reducers )
{
- TestReducers<int, Kokkos::Serial>::execute_integer();
- TestReducers<size_t, Kokkos::Serial>::execute_integer();
- TestReducers<double, Kokkos::Serial>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Serial>::execute_basic();
+ TestReducers< int, Kokkos::Serial >::execute_integer();
+ TestReducers< size_t, Kokkos::Serial >::execute_integer();
+ TestReducers< double, Kokkos::Serial >::execute_float();
+  TestReducers< Kokkos::complex<double>, Kokkos::Serial >::execute_basic();
}
-TEST_F( serial, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Serial >( 0 );
- TestReduceDynamic< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Serial >( 0 );
+ TestReduceDynamic< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Serial >( 0 );
- TestReduceDynamic< double , Kokkos::Serial >( 1000000 );
+TEST_F( serial, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Serial >( 0 );
+ TestReduceDynamic< double, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Serial >( 0 );
- TestReduceDynamicView< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Serial >( 0 );
+ TestReduceDynamicView< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial , scan )
+TEST_F( serial, scan )
{
- TestScan< Kokkos::Serial >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Serial >::test_range( 1, 1000 );
TestScan< Kokkos::Serial >( 0 );
TestScan< Kokkos::Serial >( 10 );
TestScan< Kokkos::Serial >( 10000 );
}
-TEST_F( serial , team_scan )
+TEST_F( serial, team_scan )
{
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( serial , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( serial, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( serial , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( serial, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( serial , reduction_deduction )
+TEST_F( serial, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
index bc838ccde..3dc3e2019 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_assign_strided ) {
+TEST_F( serial, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_0 ) {
+TEST_F( serial, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_1 ) {
+TEST_F( serial, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_2 ) {
+TEST_F( serial, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_3 ) {
+TEST_F( serial, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_0 ) {
+TEST_F( serial, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_1 ) {
+TEST_F( serial, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_3 ) {
+TEST_F( serial, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
index e6a5b56d3..536c3bf19 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_layoutleft_to_layoutleft) {
+TEST_F( serial, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( serial, view_subview_layoutright_to_layoutright) {
+TEST_F( serial, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
index 0b7a0d3bf..579a12bf7 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign ) {
+TEST_F( serial, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
index 8ca7285c1..ff009fef2 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
index 1d156c741..a20478433 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
index ebf0e5c99..a34b26d9f 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d ) {
+TEST_F( serial, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
index 74acb92f1..6d1882cf0 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
index 8075d46e0..12fb883b6 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
index 9ce822264..8aae20c02 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left ) {
+TEST_F( serial, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
index c8a5c8f33..e75db8d52 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
index b66f15f17..b9cea2ce8 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
index 5e5e3cf3d..e5dbcead3 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right ) {
+TEST_F( serial, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
index 55a353bca..3005030f9 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
index a168e1e23..fee8cb7af 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
index a489b0fcb..24dc6b506 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<serial/TestSerial_SubView_c01.cpp>
-#include<serial/TestSerial_SubView_c02.cpp>
-#include<serial/TestSerial_SubView_c03.cpp>
-#include<serial/TestSerial_SubView_c04.cpp>
-#include<serial/TestSerial_SubView_c05.cpp>
-#include<serial/TestSerial_SubView_c06.cpp>
-#include<serial/TestSerial_SubView_c07.cpp>
-#include<serial/TestSerial_SubView_c08.cpp>
-#include<serial/TestSerial_SubView_c09.cpp>
-#include<serial/TestSerial_SubView_c10.cpp>
-#include<serial/TestSerial_SubView_c11.cpp>
-#include<serial/TestSerial_SubView_c12.cpp>
+#include <serial/TestSerial_SubView_c01.cpp>
+#include <serial/TestSerial_SubView_c02.cpp>
+#include <serial/TestSerial_SubView_c03.cpp>
+#include <serial/TestSerial_SubView_c04.cpp>
+#include <serial/TestSerial_SubView_c05.cpp>
+#include <serial/TestSerial_SubView_c06.cpp>
+#include <serial/TestSerial_SubView_c07.cpp>
+#include <serial/TestSerial_SubView_c08.cpp>
+#include <serial/TestSerial_SubView_c09.cpp>
+#include <serial/TestSerial_SubView_c10.cpp>
+#include <serial/TestSerial_SubView_c11.cpp>
+#include <serial/TestSerial_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
index df400b4cb..f13b2ce1b 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
@@ -1,117 +1,122 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , team_tag )
+TEST_F( serial, team_tag )
{
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( serial , team_shared_request) {
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( serial, team_scratch_request) {
- TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( serial , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( serial, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( serial, shmem_size) {
+TEST_F( serial, shmem_size )
+{
TestShmemSize< Kokkos::Serial >();
}
-TEST_F( serial, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( serial , team_vector )
+TEST_F( serial, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( serial, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
index 4c655fe77..2192159b8 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , impl_view_mapping_a ) {
+TEST_F( serial, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::Serial >();
test_view_mapping_operator< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
index 4947f2eaa..8c48ad2ce 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Serial >();
+TEST_F( serial, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Serial >();
}
-TEST_F( serial , impl_view_mapping_b ) {
+TEST_F( serial, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::Serial >();
TestViewMappingAtomic< Kokkos::Serial >::run();
}
-TEST_F( serial, view_api) {
- TestViewAPI< double , Kokkos::Serial >();
+TEST_F( serial, view_api )
+{
+ TestViewAPI< double, Kokkos::Serial >();
}
-TEST_F( serial , view_nested_view )
+TEST_F( serial, view_nested_view )
{
::Test::view_nested_view< Kokkos::Serial >();
}
-
-
-TEST_F( serial , view_remap )
+TEST_F( serial, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Serial > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Serial > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Serial > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Serial > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( serial , view_aggregate )
+TEST_F( serial, view_aggregate )
{
TestViewAggregate< Kokkos::Serial >();
}
-TEST_F( serial , template_meta_functions )
+TEST_F( serial, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Serial >();
+ TestTemplateMetaFunctions< int, Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads.hpp b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
index 4f611cf99..0afd6772f 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
@@ -1,115 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_THREADS_HPP
#define KOKKOS_TEST_THREADS_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class threads : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- unsigned threads_count = 0 ;
+ unsigned threads_count = 0;
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
+ threads_count = std::max( 1u, numa_count )
+ * std::max( 2u, cores_per_numa * threads_per_core );
Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
+ Kokkos::print_configuration( std::cout, true /* detailed */ );
}
static void TearDownTestCase()
{
Kokkos::Threads::finalize();
}
};
+} // namespace Test
-}
#endif
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
index 6e24c4973..d2a5ea5d6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
@@ -1,204 +1,200 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , atomics )
+TEST_F( threads, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 3 ) ) );
}
-TEST_F( threads , atomic_operations )
+TEST_F( threads, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 4 ) ) );
}
-
}
-
-TEST_F( threads , atomic_views_integral )
+TEST_F( threads, atomic_views_integral )
{
const long length = 1000000;
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 8 ) ) );
-
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 8 ) ) );
}
}
-TEST_F( threads , atomic_views_nonintegral )
+TEST_F( threads, atomic_views_nonintegral )
{
const long length = 1000000;
{
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 4 ) ) );
-
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 4 ) ) );
}
}
-TEST_F( threads , atomic_view_api )
+TEST_F( threads, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Threads>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
index ac0356eeb..7d268c145 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
@@ -1,189 +1,196 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , init ) {
+TEST_F( threads, init )
+{
;
}
-TEST_F( threads , md_range ) {
- TestMDRange_2D< Kokkos::Threads >::test_for2(100,100);
+TEST_F( threads , mdrange_for ) {
+ TestMDRange_2D< Kokkos::Threads >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Threads >::test_for3( 100, 10, 100 );
+ TestMDRange_4D< Kokkos::Threads >::test_for4( 100, 10, 10, 10 );
+ TestMDRange_5D< Kokkos::Threads >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Threads >::test_for6( 10, 10, 10, 10, 5, 5 );
+}
- TestMDRange_3D< Kokkos::Threads >::test_for3(100,100,100);
+TEST_F( threads , mdrange_reduce ) {
+ TestMDRange_2D< Kokkos::Threads >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::Threads >::test_reduce3( 100, 10, 100 );
}
-TEST_F( threads, policy_construction) {
+TEST_F( threads, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Threads >();
TestTeamPolicyConstruction< Kokkos::Threads >();
}
-TEST_F( threads , range_tag )
+TEST_F( threads, range_tag )
{
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( threads , compiler_macros )
+TEST_F( threads, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Threads >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( threads , memory_pool )
+TEST_F( threads, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Threads >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Threads >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Threads >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
/*
-TEST_F( threads , task_fib )
+TEST_F( threads, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Threads >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Threads >::run( i );
}
}
-TEST_F( threads , task_depend )
+TEST_F( threads, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Threads >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Threads >::run( i );
}
}
-TEST_F( threads , task_team )
+TEST_F( threads, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Threads >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Threads >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Threads >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Threads >::run( 1000 ); // Put back after testing.
}
*/
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
-TEST_F( threads , cxx11 )
+TEST_F( threads, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Threads >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Threads >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 4 ) ) );
}
}
#endif
TEST_F( threads, tile_layout )
{
- TestTile::test< Kokkos::Threads , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Threads , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Threads , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Threads , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Threads , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Threads , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Threads, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Threads, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Threads, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Threads, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 9, 11 );
}
-
-TEST_F( threads , dispatch )
+TEST_F( threads, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
index a637d1e3a..d2b75ca89 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
@@ -1,138 +1,146 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, long_reduce) {
- TestReduce< long , Kokkos::Threads >( 0 );
- TestReduce< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce )
+{
+ TestReduce< long, Kokkos::Threads >( 0 );
+ TestReduce< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, double_reduce) {
- TestReduce< double , Kokkos::Threads >( 0 );
- TestReduce< double , Kokkos::Threads >( 1000000 );
+TEST_F( threads, double_reduce )
+{
+ TestReduce< double, Kokkos::Threads >( 0 );
+ TestReduce< double, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads , reducers )
+TEST_F( threads, reducers )
{
- TestReducers<int, Kokkos::Threads>::execute_integer();
- TestReducers<size_t, Kokkos::Threads>::execute_integer();
- TestReducers<double, Kokkos::Threads>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Threads>::execute_basic();
+ TestReducers< int, Kokkos::Threads >::execute_integer();
+ TestReducers< size_t, Kokkos::Threads >::execute_integer();
+ TestReducers< double, Kokkos::Threads >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::Threads >::execute_basic();
}
-TEST_F( threads, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Threads >( 0 );
- TestReduceDynamic< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Threads >( 0 );
+ TestReduceDynamic< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Threads >( 0 );
- TestReduceDynamic< double , Kokkos::Threads >( 1000000 );
+TEST_F( threads, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Threads >( 0 );
+ TestReduceDynamic< double, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Threads >( 0 );
- TestReduceDynamicView< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Threads >( 0 );
+ TestReduceDynamicView< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads , scan )
+TEST_F( threads, scan )
{
- TestScan< Kokkos::Threads >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Threads >::test_range( 1, 1000 );
TestScan< Kokkos::Threads >( 0 );
TestScan< Kokkos::Threads >( 100000 );
TestScan< Kokkos::Threads >( 10000000 );
Kokkos::Threads::fence();
}
#if 0
-TEST_F( threads , scan_small )
+TEST_F( threads, scan_small )
{
- typedef TestScan< Kokkos::Threads , Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::Threads, Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::Threads::fence();
}
#endif
-TEST_F( threads , team_scan )
+TEST_F( threads, team_scan )
{
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( threads , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( threads, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( threads , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( threads, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( threads , reduction_deduction )
+TEST_F( threads, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
index 2df9e19de..68a9da6ae 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_assign_strided ) {
+TEST_F( threads, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_0 ) {
+TEST_F( threads, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_1 ) {
+TEST_F( threads, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_2 ) {
+TEST_F( threads, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_3 ) {
+TEST_F( threads, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_0 ) {
+TEST_F( threads, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_1 ) {
+TEST_F( threads, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_3 ) {
+TEST_F( threads, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
index d57dbe97c..c5cf061e8 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_layoutleft_to_layoutleft) {
+TEST_F( threads, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( threads, view_subview_layoutright_to_layoutright) {
+TEST_F( threads, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
index 67d998c0e..9018c1f4f 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign ) {
+TEST_F( threads, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
index e340240c4..9483abd9c 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
index ad27fa0fa..e252a2656 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
index 6fca47cc4..3e211b1a5 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d ) {
+TEST_F( threads, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
index c7dfca941..865d50b1a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
index 38e839491..c5840073b 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
index 1f01fe6b5..7b8825ef6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left ) {
+TEST_F( threads, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
index e9a1ccbe3..7bc16a582 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
index c8b6c8743..57b87b609 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
index 7cef6fa07..1875a883d 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right ) {
+TEST_F( threads, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
index d67bf3157..cf6428b18 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
index e8a2c825c..7060fdb27 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
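The SubView_c01–c12 files above all exercise one pattern through the TestViewSubview helpers: carving a lower-rank Kokkos::View out of a higher-rank one. As a point of reference, a minimal self-contained sketch of that pattern (the extents and fixed indices here are illustrative, not the values the tests use):

```
#include <Kokkos_Core.hpp>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    // A rank-5 view on the default execution space; extents are made up.
    Kokkos::View<double*****> a( "a", 4, 4, 4, 4, 4 );

    // Fix two indices and keep three full ranges: the result is a rank-3
    // subview that aliases the same allocation, so no data is copied.
    auto s = Kokkos::subview( a, 1, Kokkos::ALL(), Kokkos::ALL(), 2, Kokkos::ALL() );
    (void) s;
  }
  Kokkos::finalize();
  return 0;
}
```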
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
index 4690be4d3..d802d6583 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , team_tag )
+TEST_F( threads, team_tag )
{
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( threads , team_shared_request) {
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( threads, team_scratch_request) {
- TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( threads , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( threads, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( threads, shmem_size) {
+TEST_F( threads, shmem_size )
+{
TestShmemSize< Kokkos::Threads >();
}
-TEST_F( threads, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( threads , team_vector )
+TEST_F( threads, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( threads, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
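The Team test file above drives TestTeamPolicy, TestSharedTeam, TestScratchTeam, and related helpers on the Threads backend. For orientation, a minimal sketch of the team-policy dispatch pattern those helpers build on, written against the default host execution space rather than Kokkos::Threads directly (league size, team size, and the reduction are illustrative):

```
#include <Kokkos_Core.hpp>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    using policy_type = Kokkos::TeamPolicy<Kokkos::DefaultHostExecutionSpace>;
    using member_type = policy_type::member_type;

    long result = 0;

    // Eight teams of one thread each; every team contributes its league
    // rank to the reduction, so result should be 0 + 1 + ... + 7 = 28.
    Kokkos::parallel_reduce( policy_type( 8, 1 ),
      KOKKOS_LAMBDA( const member_type & team, long & update )
      {
        update += team.league_rank();
      }, result );

    (void) result;
  }
  Kokkos::finalize();
  return 0;
}
```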
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
index 46a576b02..36eae2879 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , impl_view_mapping_a ) {
+TEST_F( threads, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::Threads >();
test_view_mapping_operator< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
index b5d6ac843..8c78d0944 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Threads >();
+TEST_F( threads, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Threads >();
}
-TEST_F( threads , impl_view_mapping_b ) {
+TEST_F( threads, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::Threads >();
TestViewMappingAtomic< Kokkos::Threads >::run();
}
-TEST_F( threads, view_api) {
- TestViewAPI< double , Kokkos::Threads >();
+TEST_F( threads, view_api )
+{
+ TestViewAPI< double, Kokkos::Threads >();
}
-TEST_F( threads , view_nested_view )
+TEST_F( threads, view_nested_view )
{
::Test::view_nested_view< Kokkos::Threads >();
}
-
-
-TEST_F( threads , view_remap )
+TEST_F( threads, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Threads > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Threads > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Threads > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Threads > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( threads , view_aggregate )
+TEST_F( threads, view_aggregate )
{
TestViewAggregate< Kokkos::Threads >();
}
-TEST_F( threads , template_meta_functions )
+TEST_F( threads, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Threads >();
+ TestTemplateMetaFunctions< int, Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/doc/design_notes_space_instances.md b/lib/kokkos/doc/design_notes_space_instances.md
index 487fa25bc..0124dfbc8 100644
--- a/lib/kokkos/doc/design_notes_space_instances.md
+++ b/lib/kokkos/doc/design_notes_space_instances.md
@@ -1,166 +1,131 @@
# Design Notes for Execution and Memory Space Instances
+## Objective
-## Execution Spaces
+ * Enable Kokkos interoperability with coarse-grain tasking models
+
+## Requirements
- * Work is *dispatched* to an execution space instance
+ * Backwards compatible with existing Kokkos API
+ * Support existing Host execution spaces (Serial, Threads, OpenMP, maybe Qthreads)
+ * Support DARMA threading model (may require a new Host execution space)
+ * Support Uintah threading model, i.e. independent worker threadpools working off of shared task queues
+
+
+## Execution Space
+ * Parallel work is *dispatched* on an execution space instance
+
+ * Execution space instances are conceptually disjoint/independent from each other
+
-
-## Host Associated Execution Space Instances
-
-Vocabulary and examples assuming C++11 Threads Support Library
+## Host Execution Space Instances
* A host-side *control* thread dispatches work to an instance
- * `this_thread` is the control thread
-
* `main` is the initial control thread
- * An execution space instance is a pool of threads
+ * A host execution space instance is an organized thread pool
- * All instances are disjoint thread pools
+ * All instances are disjoint, i.e. hardware resources are not shared between instances
* Exactly one control thread is associated with
an instance and only that control thread may
dispatch work to to that instance
- * A control thread may be a member of an instance,
- if so then it is also the control thread associated
- with that instance
+ * The control thread is a member of the instance
- * The pool of threads associated with an instances is not mutatable
+ * The pool of threads associated with an instance is not mutable during that instance's existence
* The pool of threads associated with an instance may be masked
- Allows work to be dispatched to a subset of the pool
- Example: only one hyperthread per core of the instance
- - When a mask is applied to an instance that mask
- remains until cleared or another mask is applied
-
- - Masking is portable by defining it as using a fraction
- of the available resources (threads)
-
- * Instances are shared (referenced counted) objects,
- just like `Kokkos::View`
-
-```
-struct StdThread {
- void mask( float fraction );
- void unmask() { mask( 1.0 ); }
-};
-```
-
-
-
-### Requesting an Execution Space Instance
-
- * `Space::request(` *who* `,` *what* `,` *control-opt* `)`
-
- * *who* is an identifier for subsquent queries regarding
- who requested each instance
-
- * *what* is the number of threads and how they should be placed
-
- - Placement within locality-topology hierarchy; e.g., HWLOC
-
- - Compact within a level of hierarchy, or striped across that level;
- e.g., socket or NUMA region
-
- - Granularity of request is core
-
- * *control-opt* optionally specifies whether the instance
- has a new control thread
-
- - *control-opt* includes a control function / closure
-
- - The new control thread is a member of the instance
-
- - The control function is called by the new control thread
- and is passed a `const` instance
-
- - The instance is **not** returned to the creating control thread
-
- * `std::thread` that is not a member of an instance is
- *hard blocked* on a `std::mutex`
-
- - One global mutex or one mutex per thread?
-
- * `std::thread` that is a member of an instance is
- *spinning* waiting for work, or are working
-
-```
-struct StdThread {
-
- struct Resource ;
-
- static StdThread request(); // default
+ - A mask can be applied during the policy creation of a parallel algorithm
+
+ - Masking is portable by defining it as the ceiling of a fraction in [0.0, 1.0]
+ of the available resources
- static StdThread request( const std::string & , const Resource & );
-
- // If the instance can be reserved then
- // allocate a copy of ControlClosure and invoke
- // ControlClosure::operator()( const StdThread intance ) const
- template< class ControlClosure >
- static bool request( const std::string & , const Resource &
- , const ControlClosure & );
-};
```
-
-### Relinquishing an Execution Space Instance
-
- * De-referencing the last reference-counted instance
- relinquishes the pool of threads
-
- * If a control thread was created for the instance then
- it is relinquished when that control thread returns
- from the control function
-
- - Requires the reference count to be zero, an error if not
-
- * No *forced* relinquish
-
-
-
-## CUDA Associated Execution Space Instances
-
- * Only a signle CUDA architecture
-
- * An instance is a device + stream
-
- * A stream is exclusive to an instance
-
- * Only a host-side control thread can dispatch work to an instance
-
- * Finite number of streams per device
-
- * ISSUE: How to use CUDA `const` memory with multiple streams?
-
- * Masking can be mapped to restricting the number of CUDA blocks
- to the fraction of available resources; e.g., maximum resident blocks
-
-
-### Requesting an Execution Space Instance
-
- * `Space::request(` *who* `,` *what* `)`
-
- * *who* is an identifier for subsquent queries regarding
- who requested each instance
-
- * *what* is which device, the stream is a requested/relinquished resource
-
+class ExecutionSpace {
+public:
+ using execution_space = ExecutionSpace;
+ using memory_space = ...;
+ using device_type = Kokkos::Device<execution_space, memory_space>;
+ using array_layout = ...;
+ using size_type = ...;
+ using scratch_memory_space = ...;
+
+
+ class Instance
+ {
+ int thread_pool_size( int depth = 0 );
+ ...
+ };
+
+ class InstanceRequest
+ {
+ public:
+ using Control = std::function< void( Instance * )>;
+
+ InstanceRequest( Control control
+ , unsigned thread_count
+ , unsigned use_numa_count = 0
+ , unsigned use_cores_per_numa = 0
+ );
+
+ };
+
+ static bool in_parallel();
+
+ static bool sleep();
+ static bool wake();
+
+ static void fence();
+
+ static void print_configuration( std::ostream &, const bool detailed = false );
+
+ static void initialize( unsigned thread_count = 0
+ , unsigned use_numa_count = 0
+ , unsigned use_cores_per_numa = 0
+ );
+
+ // Partition the current instance into the requested instances
+ // and run the given functions on the corresponding instances.
+ // This call will block until all the partitioned instances complete,
+ // and then the original instance will be restored.
+ //
+ // Requires that the space has already been initialized
+ // Requires that the request can be satisfied by the current instance,
+ // i.e. the sum of the requested thread counts must be less than
+ // max_hardware_threads
+ //
+ // Each control functor will accept a handle to its new default instance
+ // Each instance must be independent of all other instances,
+ // i.e. no assumption on scheduling between instances
+ // The user is responsible for checking the return code for errors
+ static int run_instances( std::vector< InstanceRequest> const& requests );
+
+ static void finalize();
+
+ static int is_initialized();
+
+ static int concurrency();
+
+ static int thread_pool_size( int depth = 0 );
+
+ static int thread_pool_rank();
+
+ static int max_hardware_threads();
+
+ static int hardware_thread_id();
+
+ };
```
-struct Cuda {
+
- struct Resource ;
-
- static Cuda request();
-
- static Cuda request( const std::string & , const Resource & );
-};
-```
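The run_instances interface added above is a design sketch only; nothing in this patch implements it. Purely to illustrate the intended call pattern, here is a hypothetical usage example built from the note's own placeholder names (ExecutionSpace, Instance, InstanceRequest, and run_instances are not an existing Kokkos API):

```
#include <functional>
#include <vector>

// Hypothetical: uses only the interface sketched in the design note above.
void partition_example()
{
  using Space = ExecutionSpace;  // the note's placeholder execution space type

  std::vector< Space::InstanceRequest > requests;

  // Request two disjoint instances, 8 threads and 4 threads; each control
  // closure receives a handle to its own default instance.
  requests.emplace_back( []( Space::Instance * inst ) { /* dispatch work on inst */ }, 8u );
  requests.emplace_back( []( Space::Instance * inst ) { /* dispatch work on inst */ }, 4u );

  // Blocks until both partitions complete, then the original instance is
  // restored; the caller is responsible for checking the return code.
  int err = Space::run_instances( requests );
  (void) err;
}
```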
diff --git a/lib/kokkos/example/md_skeleton/types.h b/lib/kokkos/example/md_skeleton/types.h
index 7f92b7cd0..c9689188a 100644
--- a/lib/kokkos/example/md_skeleton/types.h
+++ b/lib/kokkos/example/md_skeleton/types.h
@@ -1,118 +1,118 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef TYPES_H_
#define TYPES_H_
/* Determine default device type and necessary includes */
#include <Kokkos_Core.hpp>
typedef Kokkos::DefaultExecutionSpace execution_space ;
-#if ! defined( KOKKOS_HAVE_CUDA )
+#if ! defined( KOKKOS_ENABLE_CUDA )
struct double2 {
double x, y;
KOKKOS_INLINE_FUNCTION
double2(double xinit, double yinit) {
x = xinit;
y = yinit;
}
KOKKOS_INLINE_FUNCTION
double2() {
x = 0.0;
y = 0.0;
}
KOKKOS_INLINE_FUNCTION
double2& operator += (const double2& src) {
x+=src.x;
y+=src.y;
return *this;
}
KOKKOS_INLINE_FUNCTION
volatile double2& operator += (const volatile double2& src) volatile {
x+=src.x;
y+=src.y;
return *this;
}
};
#endif
#include <impl/Kokkos_Timer.hpp>
/* Define types used throughout the code */
//Position arrays
typedef Kokkos::View<double*[3], Kokkos::LayoutRight, execution_space> t_x_array ;
typedef t_x_array::HostMirror t_x_array_host ;
typedef Kokkos::View<const double*[3], Kokkos::LayoutRight, execution_space> t_x_array_const ;
typedef Kokkos::View<const double*[3], Kokkos::LayoutRight, execution_space, Kokkos::MemoryRandomAccess > t_x_array_randomread ;
//Force array
typedef Kokkos::View<double*[3], execution_space> t_f_array ;
//Neighborlist
typedef Kokkos::View<int**, execution_space > t_neighbors ;
typedef Kokkos::View<const int**, execution_space > t_neighbors_const ;
typedef Kokkos::View<int*, execution_space, Kokkos::MemoryUnmanaged > t_neighbors_sub ;
typedef Kokkos::View<const int*, execution_space, Kokkos::MemoryUnmanaged > t_neighbors_const_sub ;
//1d int array
typedef Kokkos::View<int*, execution_space > t_int_1d ;
typedef t_int_1d::HostMirror t_int_1d_host ;
typedef Kokkos::View<const int*, execution_space > t_int_1d_const ;
typedef Kokkos::View<int*, execution_space , Kokkos::MemoryUnmanaged> t_int_1d_um ;
typedef Kokkos::View<const int* , execution_space , Kokkos::MemoryUnmanaged> t_int_1d_const_um ;
//2d int array
typedef Kokkos::View<int**, Kokkos::LayoutRight, execution_space > t_int_2d ;
typedef t_int_2d::HostMirror t_int_2d_host ;
//Scalar ints
typedef Kokkos::View<int[1], Kokkos::LayoutLeft, execution_space> t_int_scalar ;
typedef t_int_scalar::HostMirror t_int_scalar_host ;
#endif /* TYPES_H_ */
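The typedefs above are the whole content of types.h; as a rough sketch of how they are typically combined (the function name and the value of natoms below are made up for illustration, not part of the skeleton code), one might write:

    #include "types.h"

    void init_positions(const int natoms) {
      // Device-side positions, allocated and zero-initialized in parallel.
      t_x_array x("X", natoms);

      // Host mirror for filling coordinates on the CPU, then copy to the device.
      t_x_array_host h_x = Kokkos::create_mirror_view(x);
      for (int i = 0; i < natoms; ++i) {
        h_x(i,0) = 0.1*i;  h_x(i,1) = 0.2*i;  h_x(i,2) = 0.3*i;
      }
      Kokkos::deep_copy(x, h_x);

      // Read-only, random-access view of the same allocation (no data copy).
      t_x_array_randomread x_read = x;
      (void) x_read;
    }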
diff --git a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
index 326d06410..249d44ab5 100644
--- a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
+++ b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
@@ -1,112 +1,112 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <typeinfo>
//
// "Hello world" parallel_for example:
// 1. Start up Kokkos
// 2. Execute a parallel for loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 01_hello_world, which uses functors (explicitly defined classes)
// to define the loop body of the parallel_for. Both functors and
// lambdas have their places.
//
int main (int argc, char* argv[]) {
// You must call initialize() before you may use Kokkos.
//
// With no arguments, this initializes the default execution space
// (and potentially its host execution space) with default
// parameters. You may also pass in argc and argv, analogously to
// MPI_Init(). It reads and removes command-line arguments that
// start with "--kokkos-".
Kokkos::initialize (argc, argv);
// Print the name of Kokkos' default execution space. We're using
// typeid here, so the name might get a bit mangled by the compiler,
// but you should still be able to figure out what it is.
printf ("Hello World on Kokkos execution space %s\n",
typeid (Kokkos::DefaultExecutionSpace).name ());
// Run lambda on the default Kokkos execution space in parallel,
// with a parallel for loop count of 15. The lambda's argument is
// an integer which is the parallel for's loop index. As you learn
// about different kinds of parallelism, you will find out that
// there are other valid argument types as well.
//
// For a single level of parallelism, we prefer that you use the
// KOKKOS_LAMBDA macro. If CUDA is disabled, this just turns into
// [=]. That captures variables from the surrounding scope by
// value. Do NOT capture them by reference! If CUDA is enabled,
// this macro may have a special definition that makes the lambda
// work correctly with CUDA. Compare to the KOKKOS_INLINE_FUNCTION
// macro, which has a special meaning if CUDA is enabled.
//
// The following parallel_for would look like this if we were using
// OpenMP by itself, instead of Kokkos:
//
// #pragma omp parallel for
// for (int i = 0; i < 15; ++i) {
// printf ("Hello from i = %i\n", i);
// }
//
// You may notice that the printed numbers do not print out in
// order. Parallel for loops may execute in any order.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
-#if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (15, KOKKOS_LAMBDA (const int i) {
// printf works in a CUDA parallel kernel; std::ostream does not.
printf ("Hello from i = %i\n", i);
});
#endif
// You must call finalize() after you are done using Kokkos.
Kokkos::finalize ();
}
diff --git a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
index 70eea4324..f7f467ad2 100644
--- a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
+++ b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
@@ -1,94 +1,94 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
//
// First reduction (parallel_reduce) example:
// 1. Start up Kokkos
// 2. Execute a parallel_reduce loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 02_simple_reduce, which uses a functor to define the loop body
// of the parallel_reduce.
//
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
const int n = 10;
// Compute the sum of squares of integers from 0 to n-1, in
// parallel, using Kokkos. This time, use a lambda instead of a
// functor. The lambda takes the same arguments as the functor's
// operator().
int sum = 0;
// The KOKKOS_LAMBDA macro replaces the capture-by-value clause [=].
// It also handles any other syntax needed for CUDA.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_reduce (n, KOKKOS_LAMBDA (const int i, int& lsum) {
lsum += i*i;
}, sum);
#endif
printf ("Sum of squares of integers from 0 to %i, "
"computed in parallel, is %i\n", n - 1, sum);
// Compare to a sequential loop.
int seqSum = 0;
for (int i = 0; i < n; ++i) {
seqSum += i*i;
}
printf ("Sum of squares of integers from 0 to %i, "
"computed sequentially, is %i\n", n - 1, seqSum);
Kokkos::finalize ();
-#if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
return (sum == seqSum) ? 0 : -1;
#else
return 0;
#endif
}
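For readers who do not have 02_simple_reduce at hand, a rough sketch of the functor-based equivalent that these comments refer to (illustrative only, not copied from that example) looks like:

    struct squaresum {
      // Plays the role of the lambda body; lsum is this thread's partial result.
      KOKKOS_INLINE_FUNCTION
      void operator() (const int i, int& lsum) const {
        lsum += i*i;
      }
    };

    // ... and inside main(), instead of the lambda:
    // Kokkos::parallel_reduce (n, squaresum(), sum);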
diff --git a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
index dd0641be5..3450ad1bb 100644
--- a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
+++ b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
@@ -1,120 +1,120 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
//
// First Kokkos::View (multidimensional array) example:
// 1. Start up Kokkos
// 2. Allocate a Kokkos::View
// 3. Execute a parallel_for and a parallel_reduce over that View's data
// 4. Shut down Kokkos
//
// Compare this example to 03_simple_view, which uses functors to
// define the loop bodies of the parallel_for and parallel_reduce.
//
#include <Kokkos_Core.hpp>
#include <cstdio>
// A Kokkos::View is an array of zero or more dimensions. The number
// of dimensions is specified at compile time, as part of the type of
// the View. This array has two dimensions. The first one
// (represented by the asterisk) is a run-time dimension, and the
// second (represented by [3]) is a compile-time dimension. Thus,
// this View type is an N x 3 array of type double, where N is
// specified at run time in the View's constructor.
//
// The first dimension of the View is the dimension over which it is
// efficient for Kokkos to parallelize.
typedef Kokkos::View<double*[3]> view_type;
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
// Allocate the View. The first dimension is a run-time parameter
// N. We set N = 10 here. The second dimension is a compile-time
// parameter, 3. We don't specify it here because we already set it
// by declaring the type of the View.
//
// Views get initialized to zero by default. This happens in
// parallel, using the View's memory space's default execution
// space. Parallel initialization ensures first-touch allocation.
// There is a way to shut off default initialization.
//
// You may NOT allocate a View inside of a parallel_{for, reduce,
// scan}. Treat View allocation as a "thread collective."
//
// The string "A" is just the label; it only matters for debugging.
// Different Views may have the same label.
view_type a ("A", 10);
// Fill the View with some data. The parallel_for loop will iterate
// over the View's first dimension N.
//
// Note that the View is passed by value into the lambda. The macro
// KOKKOS_LAMBDA includes the "capture by value" clause [=]. This
// tells the lambda to "capture all variables in the enclosing scope
// by value." Views have "view semantics"; they behave like
// pointers, not like std::vector. Passing them by value does a
// shallow copy. A deep copy never happens unless you explicitly
// ask for one.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (10, KOKKOS_LAMBDA (const int i) {
// Access the View just like a Fortran array. The layout depends
// on the View's memory space, so don't rely on the View's
// physical memory layout unless you know what you're doing.
a(i,0) = 1.0*i;
a(i,1) = 1.0*i*i;
a(i,2) = 1.0*i*i*i;
});
// Reduction functor that reads the View given to its constructor.
double sum = 0;
Kokkos::parallel_reduce (10, KOKKOS_LAMBDA (const int i, double& lsum) {
lsum += a(i,0)*a(i,1)/(a(i,2)+0.1);
}, sum);
printf ("Result: %f\n", sum);
#endif
Kokkos::finalize ();
}
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
index 216db7f12..9ea5e8b70 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
@@ -1,97 +1,97 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
// Demonstrate a parallel reduction using thread teams (TeamPolicy).
//
// A thread team consists of 1 to n threads. The hardware determines
// the maximum value of n. On a dual-socket CPU machine with 8 cores
// per socket, the maximum size of a team is 8. The number of teams
// (the league_size) is not limited by physical constraints (up to
// some reasonable bound, which eventually depends upon the hardware
// and programming model implementation).
int main (int narg, char* args[]) {
using Kokkos::parallel_reduce;
typedef Kokkos::TeamPolicy<> team_policy;
typedef typename team_policy::member_type team_member;
Kokkos::initialize (narg, args);
// Set up a policy that launches 12 teams, with the maximum number
// of threads per team.
const team_policy policy (12, Kokkos::AUTO);
// This is a reduction with a team policy. The team policy changes
// the first argument of the lambda. Rather than an integer index
// (as with RangePolicy), it's now TeamPolicy::member_type. This
// object provides all information to identify a thread uniquely.
// It also provides some team-related function calls such as a team
// barrier (which a subsequent example will use).
//
// Every member of the team contributes to the total sum. It is
// helpful to think of the lambda's body as a "team parallel
// region." That is, every team member is active and will execute
// the body of the lambda.
int sum = 0;
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
parallel_reduce (policy, KOKKOS_LAMBDA (const team_member& thread, int& lsum) {
lsum += 1;
// TeamPolicy<>::member_type provides functions to query the
// multidimensional index of a thread, as well as the number of
// thread teams and the size of each team.
printf ("Hello World: %i %i // %i %i\n", thread.league_rank (),
thread.team_rank (), thread.league_size (), thread.team_size ());
}, sum);
#endif
// The result will be 12*team_policy::team_size_max([=]{})
printf ("Result %i\n",sum);
Kokkos::finalize ();
}
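The member_type handle passed to the lambda also enables nested parallelism. As a hedged sketch (the inner loop bound n and the variable names are invented here, not taken from the tutorial), a per-team reduction over a TeamThreadRange inside the same kind of team-level lambda could look like:

    parallel_reduce (policy, KOKKOS_LAMBDA (const team_member& thread, int& lsum) {
      const int n = 32;   // amount of inner work per team, arbitrary for this sketch
      int team_sum = 0;
      // Distribute the inner range [0,n) over the threads of this team.
      Kokkos::parallel_reduce (Kokkos::TeamThreadRange (thread, n),
                               [=] (const int i, int& inner) { inner += i; },
                               team_sum);
      // Let only one thread per team add the team's partial result.
      Kokkos::single (Kokkos::PerTeam (thread), [&] () { lsum += team_sum; });
    }, sum);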
diff --git a/lib/kokkos/generate_makefile.bash b/lib/kokkos/generate_makefile.bash
index e7bd9da36..e671293ff 100755
--- a/lib/kokkos/generate_makefile.bash
+++ b/lib/kokkos/generate_makefile.bash
@@ -1,413 +1,437 @@
#!/bin/bash
KOKKOS_DEVICES=""
MAKE_J_OPTION="32"
while [[ $# > 0 ]]
do
-key="$1"
+ key="$1"
-case $key in
+ case $key in
--kokkos-path*)
- KOKKOS_PATH="${key#*=}"
- ;;
+ KOKKOS_PATH="${key#*=}"
+ ;;
+ --qthreads-path*)
+ QTHREADS_PATH="${key#*=}"
+ ;;
--prefix*)
- PREFIX="${key#*=}"
- ;;
+ PREFIX="${key#*=}"
+ ;;
--with-cuda)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
- CUDA_PATH_NVCC=`which nvcc`
- CUDA_PATH=${CUDA_PATH_NVCC%/bin/nvcc}
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
+ CUDA_PATH_NVCC=`which nvcc`
+ CUDA_PATH=${CUDA_PATH_NVCC%/bin/nvcc}
+ ;;
# Catch this before '--with-cuda*'
--with-cuda-options*)
- KOKKOS_CUDA_OPT="${key#*=}"
- ;;
+ KOKKOS_CUDA_OPT="${key#*=}"
+ ;;
--with-cuda*)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
- CUDA_PATH="${key#*=}"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
+ CUDA_PATH="${key#*=}"
+ ;;
--with-openmp)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},OpenMP"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},OpenMP"
+ ;;
--with-pthread)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Pthread"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Pthread"
+ ;;
--with-serial)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Serial"
- ;;
- --with-qthread*)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Qthread"
- QTHREAD_PATH="${key#*=}"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Serial"
+ ;;
+ --with-qthreads*)
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Qthreads"
+ if [ -z "$QTHREADS_PATH" ]; then
+ QTHREADS_PATH="${key#*=}"
+ fi
+ ;;
--with-devices*)
- DEVICES="${key#*=}"
- KOKKOS_DEVICES="${KOKKOS_DEVICES},${DEVICES}"
- ;;
+ DEVICES="${key#*=}"
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},${DEVICES}"
+ ;;
--with-gtest*)
- GTEST_PATH="${key#*=}"
- ;;
+ GTEST_PATH="${key#*=}"
+ ;;
--with-hwloc*)
- HWLOC_PATH="${key#*=}"
- ;;
+ HWLOC_PATH="${key#*=}"
+ ;;
--arch*)
- KOKKOS_ARCH="${key#*=}"
- ;;
+ KOKKOS_ARCH="${key#*=}"
+ ;;
--cxxflags*)
- CXXFLAGS="${key#*=}"
- ;;
+ CXXFLAGS="${key#*=}"
+ ;;
--ldflags*)
- LDFLAGS="${key#*=}"
- ;;
+ LDFLAGS="${key#*=}"
+ ;;
--debug|-dbg)
- KOKKOS_DEBUG=yes
- ;;
+ KOKKOS_DEBUG=yes
+ ;;
--make-j*)
- MAKE_J_OPTION="${key#*=}"
- ;;
+ MAKE_J_OPTION="${key#*=}"
+ ;;
--compiler*)
- COMPILER="${key#*=}"
- CNUM=`which ${COMPILER} 2>&1 >/dev/null | grep "no ${COMPILER}" | wc -l`
- if [ ${CNUM} -gt 0 ]; then
- echo "Invalid compiler by --compiler command: '${COMPILER}'"
- exit
- fi
- if [[ ! -n ${COMPILER} ]]; then
- echo "Empty compiler specified by --compiler command."
- exit
- fi
- CNUM=`which ${COMPILER} | grep ${COMPILER} | wc -l`
- if [ ${CNUM} -eq 0 ]; then
- echo "Invalid compiler by --compiler command: '${COMPILER}'"
- exit
- fi
- ;;
- --with-options*)
- KOKKOS_OPT="${key#*=}"
- ;;
+ COMPILER="${key#*=}"
+ CNUM=`which ${COMPILER} 2>&1 >/dev/null | grep "no ${COMPILER}" | wc -l`
+ if [ ${CNUM} -gt 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
+ if [[ ! -n ${COMPILER} ]]; then
+ echo "Empty compiler specified by --compiler command."
+ exit
+ fi
+ CNUM=`which ${COMPILER} | grep ${COMPILER} | wc -l`
+ if [ ${CNUM} -eq 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
+ ;;
+ --with-options*)
+ KOKKOS_OPT="${key#*=}"
+ ;;
--help)
- echo "Kokkos configure options:"
- echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
- echo "--prefix=/Install/Path: Path to where the Kokkos library should be installed"
- echo ""
- echo "--with-cuda[=/Path/To/Cuda]: enable Cuda and set path to Cuda Toolkit"
- echo "--with-openmp: enable OpenMP backend"
- echo "--with-pthread: enable Pthreads backend"
- echo "--with-serial: enable Serial backend"
- echo "--with-qthread=/Path/To/Qthread: enable Qthread backend"
- echo "--with-devices: explicitly add a set of backends"
- echo ""
- echo "--arch=[OPTIONS]: set target architectures. Options are:"
- echo " ARMv80 = ARMv8.0 Compatible CPU"
- echo " ARMv81 = ARMv8.1 Compatible CPU"
- echo " ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU"
- echo " SNB = Intel Sandy/Ivy Bridge CPUs"
- echo " HSW = Intel Haswell CPUs"
- echo " BDW = Intel Broadwell Xeon E-class CPUs"
- echo " SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)"
- echo " KNC = Intel Knights Corner Xeon Phi"
- echo " KNL = Intel Knights Landing Xeon Phi"
- echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
- echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
- echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
- echo " Pascal60 = NVIDIA Pascal generation CC 6.0"
- echo " Pascal61 = NVIDIA Pascal generation CC 6.1"
- echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
- echo " Power8 = IBM POWER8 CPUs"
- echo " Power9 = IBM POWER9 CPUs"
- echo ""
- echo "--compiler=/Path/To/Compiler set the compiler"
- echo "--debug,-dbg: enable Debugging"
- echo "--cxxflags=[FLAGS] overwrite CXXFLAGS for library build and test build"
- echo " This will still set certain required flags via"
- echo " KOKKOS_CXXFLAGS (such as -fopenmp, --std=c++11, etc.)"
- echo "--ldflags=[FLAGS] overwrite LDFLAGS for library build and test build"
- echo " This will still set certain required flags via"
- echo " KOKKOS_LDFLAGS (such as -fopenmp, -lpthread, etc.)"
- echo "--with-gtest=/Path/To/Gtest: set path to gtest (used in unit and performance tests"
- echo "--with-hwloc=/Path/To/Hwloc: set path to hwloc"
- echo "--with-options=[OPTIONS]: additional options to Kokkos:"
- echo " aggressive_vectorization = add ivdep on loops"
- echo "--with-cuda-options=[OPT]: additional options to CUDA:"
- echo " force_uvm, use_ldg, enable_lambda, rdc"
- echo "--make-j=[NUM]: set -j flag used during build."
- exit 0
- ;;
+ echo "Kokkos configure options:"
+ echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory."
+ echo "--qthreads-path=/Path/To/Qthreads: Path to Qthreads install directory."
+ echo " Overrides path given by --with-qthreads."
+ echo "--prefix=/Install/Path: Path to install the Kokkos library."
+ echo ""
+ echo "--with-cuda[=/Path/To/Cuda]: Enable Cuda and set path to Cuda Toolkit."
+ echo "--with-openmp: Enable OpenMP backend."
+ echo "--with-pthread: Enable Pthreads backend."
+ echo "--with-serial: Enable Serial backend."
+ echo "--with-qthreads[=/Path/To/Qthreads]: Enable Qthreads backend."
+ echo "--with-devices: Explicitly add a set of backends."
+ echo ""
+ echo "--arch=[OPT]: Set target architectures. Options are:"
+ echo " ARMv80 = ARMv8.0 Compatible CPU"
+ echo " ARMv81 = ARMv8.1 Compatible CPU"
+ echo " ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU"
+ echo " SNB = Intel Sandy/Ivy Bridge CPUs"
+ echo " HSW = Intel Haswell CPUs"
+ echo " BDW = Intel Broadwell Xeon E-class CPUs"
+ echo " SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)"
+ echo " KNC = Intel Knights Corner Xeon Phi"
+ echo " KNL = Intel Knights Landing Xeon Phi"
+ echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
+ echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
+ echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
+ echo " Pascal60 = NVIDIA Pascal generation CC 6.0"
+ echo " Pascal61 = NVIDIA Pascal generation CC 6.1"
+ echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
+ echo " Power8 = IBM POWER8 CPUs"
+ echo " Power9 = IBM POWER9 CPUs"
+ echo ""
+ echo "--compiler=/Path/To/Compiler Set the compiler."
+ echo "--debug,-dbg: Enable Debugging."
+ echo "--cxxflags=[FLAGS] Overwrite CXXFLAGS for library build and test"
+ echo " build. This will still set certain required"
+ echo " flags via KOKKOS_CXXFLAGS (such as -fopenmp,"
+ echo " --std=c++11, etc.)."
+ echo "--ldflags=[FLAGS] Overwrite LDFLAGS for library build and test"
+ echo " build. This will still set certain required"
+ echo " flags via KOKKOS_LDFLAGS (such as -fopenmp,"
+ echo " -lpthread, etc.)."
+ echo "--with-gtest=/Path/To/Gtest: Set path to gtest. (Used in unit and performance"
+ echo " tests.)"
+ echo "--with-hwloc=/Path/To/Hwloc: Set path to hwloc."
+ echo "--with-options=[OPT]: Additional options to Kokkos:"
+ echo " aggressive_vectorization = add ivdep on loops"
+ echo "--with-cuda-options=[OPT]: Additional options to CUDA:"
+ echo " force_uvm, use_ldg, enable_lambda, rdc"
+ echo "--make-j=[NUM]: Set -j flag used during build."
+ exit 0
+ ;;
*)
- echo "warning: ignoring unknown option $key"
- ;;
-esac
-shift
+ echo "warning: ignoring unknown option $key"
+ ;;
+ esac
+
+ shift
done
-# If KOKKOS_PATH undefined, assume parent dir of this
-# script is the KOKKOS_PATH
+# Remove leading ',' from KOKKOS_DEVICES.
+KOKKOS_DEVICES=$(echo $KOKKOS_DEVICES | sed 's/^,//')
+
+# If KOKKOS_PATH undefined, assume parent dir of this script is the KOKKOS_PATH.
if [ -z "$KOKKOS_PATH" ]; then
- KOKKOS_PATH=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
+ KOKKOS_PATH=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
else
- # Ensure KOKKOS_PATH is abs path
- KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
+ # Ensure KOKKOS_PATH is abs path
+ KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
if [ "${KOKKOS_PATH}" = "${PWD}" ] || [ "${KOKKOS_PATH}" = "${PWD}/" ]; then
-echo "Running generate_makefile.sh in the Kokkos root directory is not allowed"
-exit
+ echo "Running generate_makefile.sh in the Kokkos root directory is not allowed"
+ exit
fi
KOKKOS_SRC_PATH=${KOKKOS_PATH}
KOKKOS_SETTINGS="KOKKOS_SRC_PATH=${KOKKOS_SRC_PATH}"
#KOKKOS_SETTINGS="KOKKOS_PATH=${KOKKOS_PATH}"
if [ ${#COMPILER} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXX=${COMPILER}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXX=${COMPILER}"
fi
+
if [ ${#KOKKOS_DEVICES} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
fi
+
if [ ${#KOKKOS_ARCH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_ARCH=${KOKKOS_ARCH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_ARCH=${KOKKOS_ARCH}"
fi
+
if [ ${#KOKKOS_DEBUG} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
fi
+
if [ ${#CUDA_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CUDA_PATH=${CUDA_PATH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CUDA_PATH=${CUDA_PATH}"
fi
+
if [ ${#CXXFLAGS} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXXFLAGS=\"${CXXFLAGS}\""
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXXFLAGS=\"${CXXFLAGS}\""
fi
+
if [ ${#LDFLAGS} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} LDFLAGS=\"${LDFLAGS}\""
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} LDFLAGS=\"${LDFLAGS}\""
fi
+
if [ ${#GTEST_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
else
-GTEST_PATH=${KOKKOS_PATH}/tpls/gtest
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
+ GTEST_PATH=${KOKKOS_PATH}/tpls/gtest
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
fi
+
if [ ${#HWLOC_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
fi
-if [ ${#QTHREAD_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} QTHREAD_PATH=${QTHREAD_PATH}"
+
+if [ ${#QTHREADS_PATH} -gt 0 ]; then
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} QTHREADS_PATH=${QTHREADS_PATH}"
fi
+
if [ ${#KOKKOS_OPT} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
fi
+
if [ ${#KOKKOS_CUDA_OPT} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
fi
KOKKOS_SETTINGS_NO_KOKKOS_PATH="${KOKKOS_SETTINGS}"
KOKKOS_TEST_INSTALL_PATH="${PWD}/install"
if [ ${#PREFIX} -gt 0 ]; then
-KOKKOS_INSTALL_PATH="${PREFIX}"
+ KOKKOS_INSTALL_PATH="${PREFIX}"
else
-KOKKOS_INSTALL_PATH=${KOKKOS_TEST_INSTALL_PATH}
+ KOKKOS_INSTALL_PATH=${KOKKOS_TEST_INSTALL_PATH}
fi
mkdir install
echo "#Makefile to satisfy existens of target kokkos-clean before installing the library" > install/Makefile.kokkos
echo "kokkos-clean:" >> install/Makefile.kokkos
echo "" >> install/Makefile.kokkos
mkdir core
mkdir core/unit_test
mkdir core/perf_test
mkdir containers
mkdir containers/unit_tests
mkdir containers/performance_tests
mkdir algorithms
mkdir algorithms/unit_tests
mkdir algorithms/performance_tests
mkdir example
mkdir example/fixture
mkdir example/feint
mkdir example/fenl
mkdir example/tutorial
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
-mkdir example/ichol
+ mkdir example/ichol
fi
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
# Generate subdirectory makefiles.
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "all:" >> core/unit_test/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS}" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "test: all" >> core/unit_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} test" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "clean:" >> core/unit_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/unit_test/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "all:" >> core/perf_test/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS}" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "test: all" >> core/perf_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} test" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "clean:" >> core/perf_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/perf_test/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "all:" >> containers/unit_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "test: all" >> containers/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "clean:" >> containers/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/unit_tests/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "all:" >> containers/performance_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "test: all" >> containers/performance_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "clean:" >> containers/performance_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/performance_tests/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "all:" >> algorithms/unit_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "test: all" >> algorithms/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "clean:" >> algorithms/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> algorithms/unit_tests/Makefile
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_TEST_INSTALL_PATH}"
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "all:" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS}" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "test: all" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} test" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "clean:" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} clean" >> example/fixture/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/feint/Makefile
echo "" >> example/feint/Makefile
echo "all:" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS}" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "test: all" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} test" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "clean:" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} clean" >> example/feint/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "all:" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS}" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "test: all" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} test" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "clean:" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} clean" >> example/fenl/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "build:" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} build">> example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "test: build" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} test" >> example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "clean:" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} clean" >> example/tutorial/Makefile
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "all:" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS}" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "test: all" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} test" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "clean:" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} clean" >> example/ichol/Makefile
fi
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
# Generate top level directory makefile.
echo "Generating Makefiles with options " ${KOKKOS_SETTINGS}
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > Makefile
echo "" >> Makefile
echo "kokkoslib:" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} build-lib" >> Makefile
echo "" >> Makefile
echo "install: kokkoslib" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} install" >> Makefile
echo "" >> Makefile
echo "kokkoslib-test:" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} build-lib" >> Makefile
echo "" >> Makefile
echo "install-test: kokkoslib-test" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} install" >> Makefile
echo "" >> Makefile
echo "build-test: install-test" >> Makefile
echo -e "\tmake -C core/unit_test" >> Makefile
echo -e "\tmake -C core/perf_test" >> Makefile
echo -e "\tmake -C containers/unit_tests" >> Makefile
echo -e "\tmake -C containers/performance_tests" >> Makefile
echo -e "\tmake -C algorithms/unit_tests" >> Makefile
echo -e "\tmake -C example/fixture" >> Makefile
echo -e "\tmake -C example/feint" >> Makefile
echo -e "\tmake -C example/fenl" >> Makefile
echo -e "\tmake -C example/tutorial build" >> Makefile
echo "" >> Makefile
echo "test: build-test" >> Makefile
echo -e "\tmake -C core/unit_test test" >> Makefile
echo -e "\tmake -C core/perf_test test" >> Makefile
echo -e "\tmake -C containers/unit_tests test" >> Makefile
echo -e "\tmake -C containers/performance_tests test" >> Makefile
echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo -e "\tmake -C example/fixture test" >> Makefile
echo -e "\tmake -C example/feint test" >> Makefile
echo -e "\tmake -C example/fenl test" >> Makefile
echo -e "\tmake -C example/tutorial test" >> Makefile
echo "" >> Makefile
echo "unit-tests-only:" >> Makefile
echo -e "\tmake -C core/unit_test test" >> Makefile
echo -e "\tmake -C containers/unit_tests test" >> Makefile
echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo "" >> Makefile
echo "clean:" >> Makefile
echo -e "\tmake -C core/unit_test clean" >> Makefile
echo -e "\tmake -C core/perf_test clean" >> Makefile
echo -e "\tmake -C containers/unit_tests clean" >> Makefile
echo -e "\tmake -C containers/performance_tests clean" >> Makefile
echo -e "\tmake -C algorithms/unit_tests clean" >> Makefile
echo -e "\tmake -C example/fixture clean" >> Makefile
echo -e "\tmake -C example/feint clean" >> Makefile
echo -e "\tmake -C example/fenl clean" >> Makefile
echo -e "\tmake -C example/tutorial clean" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} clean" >> Makefile
diff --git a/lib/linalg/Install.py b/lib/linalg/Install.py
new file mode 100644
index 000000000..c7076ca52
--- /dev/null
+++ b/lib/linalg/Install.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+
+# install.py tool to do build of the linear algebra library
+# used to automate the steps described in the README file in this dir
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# make the library
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.%s clean; make -f Makefile.%s" % (machine,machine)
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
diff --git a/lib/linalg/README b/lib/linalg/README
index 20f3ff094..725df86c4 100644
--- a/lib/linalg/README
+++ b/lib/linalg/README
@@ -1,23 +1,29 @@
This directory has BLAS and LAPACK files needed by the USER-ATC and
USER-AWPMD packages, and possibly by other packages in the future.
Note that this is an *incomplete* subset of full BLAS/LAPACK.
-You should only need to build and use the resulting library in this
-directory if you want to build LAMMPS with the USER-ATC and/or
-USER-AWPMD packages AND you do not have any other suitable BLAS and
-LAPACK libraries installed on your system. E.g. ATLAS, GOTO-BLAS,
-OpenBLAS, ACML, or MKL.
+You should only need to build and use the library in this directory if
+you want to build LAMMPS with the USER-ATC and/or USER-AWPMD packages
+AND you do not have any other suitable BLAS and LAPACK libraries
+installed on your system. E.g. ATLAS, GOTO-BLAS, OpenBLAS, ACML, or
+MKL.
+
+You can type "make lib-linalg" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, one file should exist in this
directory:
liblinalg.a the library LAMMPS will link against
You can then include this library and its path in the Makefile.lammps
-file of any packages that need it, e.g. in lib/atc/Makefile.lammps.
+file of any packages that need it. As an example, see the
+lib/atc/Makefile.lammps.linalg file.
diff --git a/lib/meam/Install.py b/lib/meam/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/meam/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/meam/README b/lib/meam/README
index 436259ee8..b3111c131 100644
--- a/lib/meam/README
+++ b/lib/meam/README
@@ -1,46 +1,51 @@
MEAM (modified embedded atom method) library
Greg Wagner, Sandia National Labs
gjwagne at sandia.gov
Jan 2007
This library is an implementation of the MEAM potential, specifically
designed to work with LAMMPS.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the MEAM package.
This library must be built with a F90 compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-meam" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, two files should
exist in this directory:
libmeam.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-meam_SYSINC = leave blank for this package
user-meam_SYSLIB = auxiliary F90 libs needed to link a F90 lib with
a C++ program (LAMMPS) via a C++ compiler
user-meam_SYSPATH = path(s) to where those libraries are
Because you have a F90 compiler on your system, you should have these
libraries. But you will have to figure out which ones are needed and
where they are. Examples of common configurations are in the
Makefile.lammps.* files.
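As a rough example, with gfortran the user-meam_SYSINC setting is typically left blank and user-meam_SYSLIB usually needs little more than the Fortran runtime library (e.g. -lgfortran); treat this only as a hint and rely on the bundled Makefile.lammps.* files for settings known to work.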
diff --git a/lib/molfile/Makefile.lammps b/lib/molfile/Makefile.lammps
index 08118991a..a181f48ae 100644
--- a/lib/molfile/Makefile.lammps
+++ b/lib/molfile/Makefile.lammps
@@ -1,37 +1,43 @@
# This file contains the hooks to build and link LAMMPS with the VMD
# molfile plugins described here:
#
# http://www.ks.uiuc.edu/Research/vmd/plugins/molfile
#
# When you build LAMMPS with the USER-MOLFILE package installed, it will
# use the 3 settings in this file. They should be set as follows.
#
+# The molfile_SYSINC setting points to the folder containing the VMD
+# plugin headers. By default it points to the headers bundled in this folder.
+#
# The molfile_SYSLIB setting is for a system dynamic loading library
# that will be used to load the molfile plugins. It contains functions
# like dlopen(), dlsym() and so on for dynamic linking of executable
# code into an executable. For Linux and most current Unix-like
# operating systems, the setting of "-ldl" will work. On some platforms
# you may need "-ldld". For compilation on Windows, a different
# mechanism is used that is part of the Windows programming environment
# and thus molfile_SYSLIB can be left blank.
#
# The molfile_SYSINC and molfile_SYSPATH variables do not typically need
# to be set. If the dl library is not in a place the linker can find
# it, specify its directory via the molfile_SYSPATH variable, e.g.
# -Ldir.
# -----------------------------------------------------------
# Settings that the LAMMPS build will import when this package is installed
-molfile_SYSINC =
+# Change this to -I/path/to/your/lib/vmd/plugins/include if the bundled
+# header files are incompatible with your VMD plugins.
+molfile_SYSINC =-I../../lib/molfile
+#
ifneq ($(LIBOBJDIR),/Obj_mingw32)
ifneq ($(LIBOBJDIR),/Obj_mingw64)
ifneq ($(LIBOBJDIR),/Obj_mingw32-mpi)
ifneq ($(LIBOBJDIR),/Obj_mingw64-mpi)
molfile_SYSLIB = -ldl
endif
endif
endif
endif
molfile_SYSPATH =
diff --git a/lib/molfile/README b/lib/molfile/README
index 09ea3cc5c..9e8260c20 100644
--- a/lib/molfile/README
+++ b/lib/molfile/README
@@ -1,22 +1,35 @@
This directory has a Makefile.lammps file with settings that allows
LAMMPS to dynamically link to the VMD molfile library. This is
required to use the USER-MOLFILE package and its interface to the dump
and write_dump commands in a LAMMPS input script.
More information about the VMD molfile plugins can be found at
http://www.ks.uiuc.edu/Research/vmd/plugins/molfile.
-More specifically, to be able to dynamically load and execute the
-plugins from inside LAMMPS, you need to link with a system library
-containing functions like dlopen(), dlsym() and so on for dynamic
-linking of executable code into an executable. This library is
-defined by setting the molfile_SYSLIB variable in the Makefile.lammps
-file in this dir.
+NOTE: while the programming interface (API) of the VMD molfile plugins
+is backward compatible (i.e. you can expect to be able to compile this
+package for plugins from newer VMD packages), the binary interface
+(ABI) is not. So it is necessary to compile this package with the
+VMD molfile plugin header files (vmdplugin.h and molfile_plugin.h)
+matching the VMD installation that the (binary) plugin files are taken from.
+These header files can be found inside the VMD installation tree under
+"plugins/include". For convenience, this package includes a set of
+header files that is compatible with VMD 1.9.3 (the current version
+in April 2017). You need to adjust the molfile_SYSINC variable in the
+Makefile.lammps file in this directory if you want to use VMD
+molfile plugins from a different version. The interface is compatible
+with plugins starting from VMD version 1.8.4.
+
+In order to be able to dynamically load and execute the plugins from
+inside LAMMPS, you need to link with a system library containing functions
+like dlopen(), dlsym() and so on for dynamic linking of executable code
+into an executable. This library is defined by setting the molfile_SYSLIB
+variable in the Makefile.lammps file in this dir.
For Linux and most current unix-like operating systems, this can be
kept at the default setting of "-ldl" (on some platforms this library
is called "-ldld"). For compilation on Windows, a slightly different
mechanism is used that is part of the Windows programming environment
-and this library is not needed.
+and this kind of library is not needed.
See the header of Makefile.lammps for more info.
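Purely as an illustration of what the dl library provides (the plugin path and symbol name below are placeholders, not the ones LAMMPS actually uses), dynamic loading comes down to a few calls; on Linux, compile such code with -ldl:

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // Open a shared object at run time; RTLD_NOW resolves its symbols immediately.
      void *handle = dlopen("/path/to/someplugin.so", RTLD_NOW);
      if (!handle) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

      // Look up a function by name and call it through a function pointer.
      typedef int (*init_fn)(void);
      init_fn init = (init_fn) dlsym(handle, "plugin_init");
      if (init) init();

      dlclose(handle);
      return 0;
    }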
diff --git a/src/USER-MOLFILE/molfile_plugin.h b/lib/molfile/molfile_plugin.h
similarity index 92%
rename from src/USER-MOLFILE/molfile_plugin.h
rename to lib/molfile/molfile_plugin.h
index 7a2d7ca42..c79e7a5ab 100644
--- a/src/USER-MOLFILE/molfile_plugin.h
+++ b/lib/molfile/molfile_plugin.h
@@ -1,890 +1,903 @@
/***************************************************************************
*cr
*cr (C) Copyright 1995-2006 The Board of Trustees of the
*cr University of Illinois
*cr All Rights Reserved
*cr
***************************************************************************/
/***************************************************************************
* RCS INFORMATION:
*
* $RCSfile: molfile_plugin.h,v $
* $Author: johns $ $Locker: $ $State: Exp $
- * $Revision: 1.103 $ $Date: 2011/03/05 03:56:11 $
+ * $Revision: 1.108 $ $Date: 2016/02/26 03:17:01 $
*
***************************************************************************/
/** @file
* API for C extensions to define a way to load structure, coordinate,
* trajectory, and volumetric data files
*/
#ifndef MOL_FILE_PLUGIN_H
#define MOL_FILE_PLUGIN_H
#include "vmdplugin.h"
#if defined(DESRES_READ_TIMESTEP2)
/* includes needed for large integer types used for frame counts */
#include <sys/types.h>
typedef ssize_t molfile_ssize_t; /**< for frame counts */
#endif
/**
* Define a common plugin type to be used when registering the plugin.
*/
#define MOLFILE_PLUGIN_TYPE "mol file reader"
/**
* File converter plugins use the same API but register under a different
* type so that regular file readers can have priority.
*/
#define MOLFILE_CONVERTER_PLUGIN_TYPE "mol file converter"
/* File plugin symbolic constants for better code readability */
#define MOLFILE_SUCCESS 0 /**< succeeded in reading file */
#define MOLFILE_EOF -1 /**< end of file */
#define MOLFILE_ERROR -1 /**< error reading/opening a file */
#define MOLFILE_NOSTRUCTUREDATA -2 /**< no structure data in this file */
#define MOLFILE_NUMATOMS_UNKNOWN -1 /**< unknown number of atoms */
#define MOLFILE_NUMATOMS_NONE 0 /**< no atoms in this file type */
/**
* Maximum string size macro
*/
#define MOLFILE_BUFSIZ 81 /**< maximum chars in string data */
#define MOLFILE_BIGBUFSIZ 4096 /**< maximum chars in long strings */
#define MOLFILE_MAXWAVEPERTS 25 /**< maximum number of wavefunctions
* per timestep */
+/**
+ * Hard-coded direct-I/O page size constants for use by both VMD
+ * and the plugins that want to use direct, unbuffered I/O for high
+ * performance with SSDs etc. We use two constants to define the
+ * range of hardware page sizes that we can support, so that we can
+ * add support for larger 8KB or 16KB page sizes in the future
+ * as they become more prevalent in high-end storage systems.
+ *
+ * At present, VMD uses a hard-coded 4KB page size to reduce memory
+ * fragmentation, but these constants will make it easier to enable the
+ * use of larger page sizes in the future if it becomes necessary.
+ */
+#define MOLFILE_DIRECTIO_MIN_BLOCK_SIZE 4096
+#define MOLFILE_DIRECTIO_MAX_BLOCK_SIZE 4096
+
/**
* File level comments, origin information, and annotations.
*/
typedef struct {
char database[81]; /**< database of origin, if any */
char accession[81]; /**< database accession code, if any */
char date[81]; /**< date/time stamp for this data */
char title[81]; /**< brief title for this data */
int remarklen; /**< length of remarks string */
char *remarks; /**< free-form remarks about data */
} molfile_metadata_t;
/*
* Struct for specifying atoms in a molecular structure. The first
* six components are required, the rest are optional and their presence is
 * indicated by setting the corresponding bit in optflags. When omitted,
* the application (for read_structure) or plugin (for write_structure)
* must be able to supply default values if the missing parameters are
* part of its internal data structure.
* Note that it is not possible to specify coordinates with this structure.
* This is intentional; all coordinate I/O is done with the read_timestep and
* write_timestep functions.
*/
/**
* Per-atom attributes and information.
*/
typedef struct {
/* these fields absolutely must be set or initialized to empty */
char name[16]; /**< required atom name string */
char type[16]; /**< required atom type string */
char resname[8]; /**< required residue name string */
int resid; /**< required integer residue ID */
char segid[8]; /**< required segment name string, or "" */
+#if 0 && vmdplugin_ABIVERSION > 17
+ /* The new PDB file formats allow for much larger structures, */
+ /* which can therefore require longer chain ID strings. The */
+ /* new PDBx/mmCIF file formats do not have length limits on */
+ /* fields, so PDB chains could be arbitrarily long strings */
+ /* in such files. At present, we know we need at least 3-char */
+ /* chains for existing PDBx/mmCIF files. */
+ char chain[4]; /**< required chain name, or "" */
+#else
char chain[2]; /**< required chain name, or "" */
-
+#endif
/* rest are optional; use optflags to specify what's present */
char altloc[2]; /**< optional PDB alternate location code */
char insertion[2]; /**< optional PDB insertion code */
float occupancy; /**< optional occupancy value */
float bfactor; /**< optional B-factor value */
float mass; /**< optional mass value */
float charge; /**< optional charge value */
float radius; /**< optional radius value */
int atomicnumber; /**< optional element atomic number */
+
+#if 0
+ char complex[16];
+ char assembly[16];
+ int qmregion;
+ int qmregionlink;
+ int qmlayer;
+ int qmlayerlink;
+ int qmfrag;
+ int qmfraglink;
+ string qmecp;
+ int qmadapt;
+ int qmect; /**< boolean */
+ int qmparam;
+ int autoparam;
+#endif
+
#if defined(DESRES_CTNUMBER)
int ctnumber; /**< mae ct block, 0-based, including meta */
#endif
} molfile_atom_t;
/*@{*/
/** Plugin optional data field availability flag */
#define MOLFILE_NOOPTIONS 0x0000 /**< no optional data */
#define MOLFILE_INSERTION 0x0001 /**< insertion codes provided */
#define MOLFILE_OCCUPANCY 0x0002 /**< occupancy data provided */
#define MOLFILE_BFACTOR 0x0004 /**< B-factor data provided */
#define MOLFILE_MASS 0x0008 /**< Atomic mass provided */
#define MOLFILE_CHARGE 0x0010 /**< Atomic charge provided */
#define MOLFILE_RADIUS 0x0020 /**< Atomic VDW radius provided */
#define MOLFILE_ALTLOC 0x0040 /**< Multiple conformations present */
#define MOLFILE_ATOMICNUMBER 0x0080 /**< Atomic element number provided */
#define MOLFILE_BONDSSPECIAL 0x0100 /**< Only non-standard bonds provided */
#if defined(DESRES_CTNUMBER)
#define MOLFILE_CTNUMBER 0x0200 /**< ctnumber provided */
#endif
#define MOLFILE_BADOPTIONS 0xFFFFFFFF /**< Detect badly behaved plugins */
/*@}*/
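/*
 * Illustrative sketch (not part of the plugin API): roughly how a plugin's
 * read_structure() implementation might fill the required fields of one
 * molfile_atom_t and report which of the optional fields it actually
 * provides through the optflags bitmask.  All values and the helper name
 * are made up for the example.
 */
#if 0  /* example only */
#include <string.h>
static int example_fill_atom(molfile_atom_t *a, int *optflags)
{
  strncpy(a->name,    "CA",  sizeof(a->name));
  strncpy(a->type,    "CA",  sizeof(a->type));
  strncpy(a->resname, "ALA", sizeof(a->resname));
  a->resid = 1;
  a->segid[0] = '\0';
  a->chain[0] = '\0';
  /* optional fields: only flag the ones that were really set */
  a->occupancy = 1.0f;
  a->bfactor   = 0.0f;
  *optflags = MOLFILE_OCCUPANCY | MOLFILE_BFACTOR;
  return MOLFILE_SUCCESS;
}
#endif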
/*@{*/
/** Flags indicating availability of optional data fields
* for QM timesteps
*/
#define MOLFILE_QMTS_NOOPTIONS 0x0000 /**< no optional data */
#define MOLFILE_QMTS_GRADIENT 0x0001 /**< energy gradients provided */
#define MOLFILE_QMTS_SCFITER 0x0002
/*@}*/
-#if vmdplugin_ABIVERSION > 10
typedef struct molfile_timestep_metadata {
unsigned int count; /**< total # timesteps; -1 if unknown */
  unsigned int avg_bytes_per_timestep; /**< bytes per timestep */
int has_velocities; /**< if timesteps have velocities */
} molfile_timestep_metadata_t;
-#endif
/*
* Per-timestep atom coordinates and periodic cell information
*/
typedef struct {
float *coords; /**< coordinates of all atoms, arranged xyzxyzxyz */
-#if vmdplugin_ABIVERSION > 10
float *velocities; /**< space for velocities of all atoms; same layout */
/**< NULL unless has_velocities is set */
-#endif
/*@{*/
/**
* Unit cell specification of the form A, B, C, alpha, beta, gamma.
* notes: A, B, C are side lengths of the unit cell
* alpha = angle between b and c
* beta = angle between a and c
* gamma = angle between a and b
*/
float A, B, C, alpha, beta, gamma;
/*@}*/
-#if vmdplugin_ABIVERSION > 10
double physical_time; /**< physical time point associated with this frame */
-#endif
#if defined(DESRES_READ_TIMESTEP2)
/* HACK to support generic trajectory information */
double total_energy;
double potential_energy;
double kinetic_energy;
double extended_energy;
double force_energy;
double total_pressure;
#endif
} molfile_timestep_t;
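/*
 * Illustrative sketch (not part of the plugin API): how a calling
 * application typically prepares a molfile_timestep_t before passing it to
 * a plugin's read_next_timestep() callback.  natoms is assumed to come from
 * a prior open_file_read() call; the helper name is made up.
 */
#if 0  /* example only */
#include <stdlib.h>
static molfile_timestep_t *example_alloc_timestep(int natoms)
{
  molfile_timestep_t *ts = (molfile_timestep_t *) calloc(1, sizeof(*ts));
  if (!ts) return NULL;
  ts->coords = (float *) malloc(3 * (size_t) natoms * sizeof(float));
  if (!ts->coords) { free(ts); return NULL; }
  ts->velocities = NULL;          /* only needed when has_velocities is set */
  ts->alpha = ts->beta = ts->gamma = 90.0f;
  return ts;
}
#endif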
/**
* Metadata for volumetric datasets, read initially and used for subsequent
* memory allocations and file loading.
*/
typedef struct {
char dataname[256]; /**< name of volumetric data set */
  float origin[3]; /**< origin: origin of volume (x=0, y=0, z=0 corner) */
/*
* x/y/z axis:
* These the three cell sides, providing both direction and length
* (not unit vectors) for the x, y, and z axes. In the simplest
 * case, these would be <size,0,0> <0,size,0> and <0,0,size> for
 * an orthogonal cubic volume set. For other cell shapes these
 * axes can be oriented non-orthogonally, and the parallelepiped
* may have different side lengths, not just a cube/rhombus.
*/
float xaxis[3]; /**< direction (and length) for X axis */
float yaxis[3]; /**< direction (and length) for Y axis */
float zaxis[3]; /**< direction (and length) for Z axis */
/*
* x/y/z size:
* Number of grid cells along each axis. This is _not_ the
* physical size of the box, this is the number of voxels in each
* direction, independent of the shape of the volume set.
*/
- int xsize; /**< number of grid cells along the X axis */
- int ysize; /**< number of grid cells along the Y axis */
- int zsize; /**< number of grid cells along the Z axis */
-
- int has_color; /**< flag indicating presence of voxel color data */
+ int xsize; /**< number of grid cells along the X axis */
+ int ysize; /**< number of grid cells along the Y axis */
+ int zsize; /**< number of grid cells along the Z axis */
+
+#if vmdplugin_ABIVERSION > 16
+ int has_scalar; /**< flag indicating presence of scalar volume */
+ int has_gradient; /**< flag indicating presence of vector volume */
+ int has_variance; /**< flag indicating presence of variance map */
+#endif
+ int has_color; /**< flag indicating presence of voxel color data */
} molfile_volumetric_t;
+#if vmdplugin_ABIVERSION > 16
+/**
+ * Volumetric dataset read/write structure with both flag/parameter sets
+ * and VMD-allocated pointers for fields to be used by the plugin.
+ */
+typedef struct {
+ int setidx; /**< volumetric dataset index to load/save */
+ float *scalar; /**< scalar density/potential field data */
+ float *gradient; /**< gradient vector field */
+ float *variance; /**< variance map indicating signal/noise */
+ float *rgb3f; /**< RGB floating point color texture map */
+ unsigned char *rgb3u; /**< RGB unsigned byte color texture map */
+} molfile_volumetric_readwrite_t;
+#endif
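/*
 * Illustrative sketch (not part of the plugin API): the datablock handed to
 * read_volumetric_data() must provide xsize*ysize*zsize floats, and a
 * colorblock with three RGB floats per voxel is only needed when has_color
 * is nonzero.  The helper name is made up for the example.
 */
#if 0  /* example only */
#include <stdlib.h>
static float *example_alloc_voxels(const molfile_volumetric_t *meta,
                                   float **colorblock)
{
  size_t nvox = (size_t) meta->xsize * meta->ysize * meta->zsize;
  *colorblock = meta->has_color ? (float *) malloc(3 * nvox * sizeof(float))
                                : NULL;
  return (float *) malloc(nvox * sizeof(float));
}
#endif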
/**************************************************************
**************************************************************
**** ****
**** Data structures for QM files ****
**** ****
**************************************************************
**************************************************************/
-#if vmdplugin_ABIVERSION > 9
-
-
/* macros for the convergence status of a QM calculation. */
#define MOLFILE_QMSTATUS_UNKNOWN -1 /* don't know yet */
#define MOLFILE_QMSTATUS_OPT_CONV 0 /* optimization converged */
#define MOLFILE_QMSTATUS_SCF_NOT_CONV 1 /* SCF convergence failed */
#define MOLFILE_QMSTATUS_OPT_NOT_CONV 2 /* optimization not converged */
#define MOLFILE_QMSTATUS_FILE_TRUNCATED 3 /* file was truncated */
/* macros describing the SCF method (SCFTYP in GAMESS) */
#define MOLFILE_SCFTYPE_UNKNOWN -1 /* no info about the method */
#define MOLFILE_SCFTYPE_NONE 0 /* calculation didn't make use of SCF */
#define MOLFILE_SCFTYPE_RHF 1 /* restricted Hartree-Fock */
#define MOLFILE_SCFTYPE_UHF 2 /* unrestricted Hartree-Fock */
#define MOLFILE_SCFTYPE_ROHF 3 /* restricted open-shell Hartree-Fock */
#define MOLFILE_SCFTYPE_GVB 4 /* generalized valence bond orbitals */
#define MOLFILE_SCFTYPE_MCSCF 5 /* multi-configuration SCF */
#define MOLFILE_SCFTYPE_FF 6 /* classical force-field based sim. */
/* macros describing the type of calculation (RUNTYP in GAMESS) */
#define MOLFILE_RUNTYPE_UNKNOWN 0 /* single point run */
#define MOLFILE_RUNTYPE_ENERGY 1 /* single point run */
#define MOLFILE_RUNTYPE_OPTIMIZE 2 /* geometry optimization */
#define MOLFILE_RUNTYPE_SADPOINT 3 /* saddle point search */
#define MOLFILE_RUNTYPE_HESSIAN 4 /* Hessian/frequency calculation */
#define MOLFILE_RUNTYPE_SURFACE 5 /* potential surface scan */
#define MOLFILE_RUNTYPE_GRADIENT 6 /* energy gradient calculation */
#define MOLFILE_RUNTYPE_MEX 7 /* minimum energy crossing */
#define MOLFILE_RUNTYPE_DYNAMICS 8 /* Any type of molecular dynamics
 * e.g. Born-Oppenheimer, Car-Parrinello,
* or classical MD */
#define MOLFILE_RUNTYPE_PROPERTIES 9 /* Properties were calculated from a
* wavefunction that was read from file */
/**
* Sizes of various QM-related, timestep independent data arrays
* which must be allocated by the caller (VMD) so that the plugin
* can fill in the arrays with data.
*/
typedef struct {
/* hessian data */
int nimag; /**< number of imaginary modes */
  int nintcoords; /**< number of internal coordinates */
  int ncart; /**< number of cartesian coordinates */
/* orbital/basisset data */
int num_basis_funcs; /**< number of uncontracted basis functions in basis array */
int num_basis_atoms; /**< number of atoms in basis set */
int num_shells; /**< total number of atomic shells */
int wavef_size; /**< size of the wavefunction
* i.e. size of secular eq. or
* # of cartesian contracted
* gaussian basis functions */
/* everything else */
int have_sysinfo;
int have_carthessian; /**< hessian in cartesian coords available */
int have_inthessian; /**< hessian in internal coords available */
int have_normalmodes; /**< normal modes available */
} molfile_qm_metadata_t;
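/*
 * Illustrative sketch (not part of the plugin API): the caller (VMD) sizes
 * the Hessian and normal-mode arrays from the metadata reported here before
 * calling read_qm_rundata().  The helper name is made up for the example.
 */
#if 0  /* example only */
#include <stdlib.h>
static int example_alloc_hessian(const molfile_qm_metadata_t *m,
                                 double **carthessian, float **wavenumbers)
{
  *carthessian = (double *) malloc((size_t) m->ncart * m->ncart * sizeof(double));
  *wavenumbers = (float *)  malloc((size_t) m->ncart * sizeof(float));
  return (*carthessian && *wavenumbers) ? MOLFILE_SUCCESS : MOLFILE_ERROR;
}
#endif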
/**
* QM run info. Parameters that stay unchanged during a single file.
*/
typedef struct {
int nproc; /**< number of processors used. */
int memory; /**< amount of memory used in Mbyte. */
int runtype; /**< flag indicating the calculation method. */
int scftype; /**< SCF type: RHF, UHF, ROHF, GVB or MCSCF wfn. */
  int status; /**< indicates whether SCF and geometry optimization
* have converged properly. */
int num_electrons; /**< number of electrons. XXX: can be fractional in some DFT codes */
int totalcharge; /**< total charge of system. XXX: can be fractional in some DFT codes */
int num_occupied_A; /**< number of occupied alpha orbitals */
int num_occupied_B; /**< number of occupied beta orbitals */
double *nuc_charge; /**< array(natom) containing the nuclear charge of atom i */
char basis_string[MOLFILE_BUFSIZ]; /**< basis name as "nice" string. */
char runtitle[MOLFILE_BIGBUFSIZ]; /**< title of run. */
char geometry[MOLFILE_BUFSIZ]; /**< type of provided geometry, XXX: remove?
* e.g. UNIQUE, ZMT, CART, ... */
char version_string[MOLFILE_BUFSIZ]; /**< QM code version information. */
} molfile_qm_sysinfo_t;
/**
* Data for QM basis set
*/
typedef struct {
int *num_shells_per_atom; /**< number of shells per atom */
  int *num_prim_per_shell; /**< number of shell primitives per shell */
  float *basis; /**< contraction coefficients and exponents for
* the basis functions in the form
* {exp(1), c-coeff(1), exp(2), c-coeff(2), ...};
* array size = 2*num_basis_funcs
* The basis must NOT be normalized. */
int *atomic_number; /**< atomic numbers (chem. element) of atoms in basis set */
  int *angular_momentum; /**< 3 ints per wave function coefficient to describe the
* cartesian components of the angular momentum.
* E.g. S={0 0 0}, Px={1 0 0}, Dxy={1 1 0}, or Fyyz={0 2 1}.
*/
int *shell_types; /**< type for each shell in basis */
} molfile_qm_basis_t;
/**
* Data from QM Hessian/normal mode runs
*
* A noteworthy comment from one of Axel's emails:
* The molfile_qm_hessian_t, I'd rename to molfile_hessian_t (one
* can do vibrational analysis without QM) and would make this a
* completely separate entity. This could then be also used to
* read in data from, say, principal component analysis or normal
* mode analysis and VMD could contain code to either project a
* trajectory on the contained eigenvectors or animate them and
* so on. There is a bunch of possible applications...
*/
typedef struct {
double *carthessian; /**< hessian matrix in cartesian coordinates (ncart)*(ncart)
* as a single array of doubles (row(1), ...,row(natoms)) */
int *imag_modes; /**< list(nimag) of imaginary modes */
double *inthessian; /**< hessian matrix in internal coordinates
* (nintcoords*nintcoords) as a single array of
* doubles (row(1), ...,row(nintcoords)) */
float *wavenumbers; /**< array(ncart) of wavenumbers of normal modes */
float *intensities; /**< array(ncart) of intensities of normal modes */
float *normalmodes; /**< matrix(ncart*ncart) of normal modes */
} molfile_qm_hessian_t;
/**
* QM related information that is timestep independent
*/
typedef struct {
molfile_qm_sysinfo_t run; /* system info */
molfile_qm_basis_t basis; /* basis set info */
molfile_qm_hessian_t hess; /* hessian info */
} molfile_qm_t;
/**
* Enumeration of all of the wavefunction types that can be read
* from QM file reader plugins.
*
 * CANON = canonical (i.e. diagonalized) wavefunction
* GEMINAL = GVB-ROHF geminal pairs
* MCSCFNAT = Multi-Configuration SCF natural orbitals
* MCSCFOPT = Multi-Configuration SCF optimized orbitals
* CINATUR = Configuration-Interaction natural orbitals
* BOYS = Boys localization
* RUEDEN = Ruedenberg localization
* PIPEK = Pipek-Mezey population localization
*
* NBO related localizations:
* --------------------------
* NAO = Natural Atomic Orbitals
* PNAO = pre-orthogonal NAOs
* NBO = Natural Bond Orbitals
* PNBO = pre-orthogonal NBOs
* NHO = Natural Hybrid Orbitals
* PNHO = pre-orthogonal NHOs
* NLMO = Natural Localized Molecular Orbitals
* PNLMO = pre-orthogonal NLMOs
*
* UNKNOWN = Use this for any type not listed here
* You can use the string field for description
*/
enum molfile_qm_wavefunc_type {
MOLFILE_WAVE_CANON, MOLFILE_WAVE_GEMINAL,
MOLFILE_WAVE_MCSCFNAT, MOLFILE_WAVE_MCSCFOPT,
MOLFILE_WAVE_CINATUR,
MOLFILE_WAVE_PIPEK, MOLFILE_WAVE_BOYS, MOLFILE_WAVE_RUEDEN,
MOLFILE_WAVE_NAO, MOLFILE_WAVE_PNAO, MOLFILE_WAVE_NHO,
MOLFILE_WAVE_PNHO, MOLFILE_WAVE_NBO, MOLFILE_WAVE_PNBO,
MOLFILE_WAVE_PNLMO, MOLFILE_WAVE_NLMO, MOLFILE_WAVE_MOAO,
MOLFILE_WAVE_NATO, MOLFILE_WAVE_UNKNOWN
};
/**
* Enumeration of all of the supported QM related charge
* types
*/
enum molfile_qm_charge_type {
MOLFILE_QMCHARGE_UNKNOWN,
MOLFILE_QMCHARGE_MULLIKEN, MOLFILE_QMCHARGE_LOWDIN,
MOLFILE_QMCHARGE_ESP, MOLFILE_QMCHARGE_NPA
};
/**
* Sizes of various QM-related, per-timestep data arrays
* which must be allocated by the caller (VMD) so that the plugin
* can fill in the arrays with data.
*/
typedef struct molfile_qm_timestep_metadata {
unsigned int count; /**< total # timesteps; -1 if unknown */
unsigned int avg_bytes_per_timestep; /**< bytes per timestep */
int has_gradient; /**< if timestep contains gradient */
int num_scfiter; /**< # scf iterations for this ts */
int num_orbitals_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< # orbitals for each wavefunction */
int has_orben_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< orbital energy flags */
int has_occup_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< orbital occupancy flags */
int num_wavef ; /**< # wavefunctions in this ts */
int wavef_size; /**< size of one wavefunction
* (# of gaussian basis fctns) */
int num_charge_sets; /**< # of charge values per atom */
} molfile_qm_timestep_metadata_t;
/**
* QM wavefunction
*/
typedef struct {
int type; /**< MOLFILE_WAVE_CANON, MOLFILE_WAVE_MCSCFNAT, ... */
int spin; /**< 1 for alpha, -1 for beta */
int excitation; /**< 0 for ground state, 1,2,3,... for excited states */
int multiplicity; /**< spin multiplicity of the state, zero if unknown */
char info[MOLFILE_BUFSIZ]; /**< string for additional type info */
double energy; /**< energy of the electronic state.
* i.e. HF-SCF energy, CI state energy,
* MCSCF energy, etc. */
float *wave_coeffs; /**< expansion coefficients for wavefunction in the
* form {orbital1(c1),orbital1(c2),.....,orbitalM(cN)} */
float *orbital_energies; /**< list of orbital energies for wavefunction */
float *occupancies; /**< orbital occupancies */
int *orbital_ids; /**< orbital ID numbers; If NULL then VMD will
* assume 1,2,3,...num_orbs. */
} molfile_qm_wavefunction_t;
/**
* QM per trajectory timestep info
* Note that each timestep can contain multiple wavefunctions.
*/
typedef struct {
molfile_qm_wavefunction_t *wave; /**< array of wavefunction objects */
float *gradient; /**< force on each atom (=gradient of energy) */
double *scfenergies; /**< energies from the SCF cycles */
double *charges; /**< per-atom charges */
int *charge_types; /**< type of each charge set */
} molfile_qm_timestep_t;
-#endif
-
/**************************************************************
**************************************************************/
/**
* Enumeration of all of the supported graphics objects that can be read
* from graphics file reader plugins.
*/
enum molfile_graphics_type {
MOLFILE_POINT, MOLFILE_TRIANGLE, MOLFILE_TRINORM, MOLFILE_NORMS,
MOLFILE_LINE, MOLFILE_CYLINDER, MOLFILE_CAPCYL, MOLFILE_CONE,
MOLFILE_SPHERE, MOLFILE_TEXT, MOLFILE_COLOR, MOLFILE_TRICOLOR
};
/**
* Individual graphics object/element data
*/
typedef struct {
int type; /* One of molfile_graphics_type */
int style; /* A general style parameter */
float size; /* A general size parameter */
float data[9]; /* All data for the element */
} molfile_graphics_t;
/*
* Types for raw graphics elements stored in files. Data for each type
* should be stored by the plugin as follows:
type data style size
---- ---- ----- ----
point x, y, z pixel size
triangle x1,y1,z1,x2,y2,z2,x3,y3,z3
trinorm x1,y1,z1,x2,y2,z2,x3,y3,z3
the next array element must be NORMS
tricolor x1,y1,z1,x2,y2,z2,x3,y3,z3
the next array elements must be NORMS
the following element must be COLOR, with three RGB triples
norms x1,y1,z1,x2,y2,z2,x3,y3,z3
line x1,y1,z1,x2,y2,z2 0=solid pixel width
1=stippled
cylinder x1,y1,z1,x2,y2,z2 resolution radius
capcyl x1,y1,z1,x2,y2,z2 resolution radius
sphere x1,y1,z1 resolution radius
text x, y, z, up to 24 bytes of text pixel size
color r, g, b
*/
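/*
 * Illustrative sketch (not part of the plugin API): filling one raw
 * graphics element following the table above, here a sphere of radius 1.5
 * centered at the origin with a resolution of 12.  Values are made up for
 * the example.
 */
#if 0  /* example only */
static void example_fill_sphere(molfile_graphics_t *g)
{
  g->type    = MOLFILE_SPHERE;
  g->style   = 12;       /* resolution */
  g->size    = 1.5f;     /* radius */
  g->data[0] = 0.0f;     /* x */
  g->data[1] = 0.0f;     /* y */
  g->data[2] = 0.0f;     /* z */
}
#endif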
/**
* Main file reader API. Any function in this struct may be NULL
* if not implemented by the plugin; the application checks this to determine
* what functionality is present in the plugin.
*/
typedef struct {
/**
* Required header
*/
vmdplugin_HEAD
/**
* Filename extension for this file type. May be NULL if no filename
* extension exists and/or is known. For file types that match several
* common extensions, list them in a comma separated list such as:
* "pdb,ent,foo,bar,baz,ban"
* The comma separated list will be expanded when filename extension matching
* is performed. If multiple plugins solicit the same filename extensions,
* the one that lists the extension earliest in its list is selected. In the
* case of a "tie", the first one tried/checked "wins".
*/
const char *filename_extension;
/**
* Try to open the file for reading. Return an opaque handle, or NULL on
* failure. Set the number of atoms; if the number of atoms cannot be
* determined, set natoms to MOLFILE_NUMATOMS_UNKNOWN.
* Filetype should be the name under which this plugin was registered;
* this is provided so that plugins can provide the same function pointer
* to handle multiple file types.
*/
void *(* open_file_read)(const char *filepath, const char *filetype,
int *natoms);
/**
* Read molecular structure from the given file handle. atoms is allocated
* by the caller and points to space for natoms.
* On success, place atom information in the passed-in pointer.
* optflags specifies which optional fields in the atoms will be set by
* the plugin.
*/
int (*read_structure)(void *, int *optflags, molfile_atom_t *atoms);
/**
* Read bond information for the molecule. On success the arrays from
* and to should point to the (one-based) indices of bonded atoms.
* Each unique bond should be specified only once, so file formats that list
* bonds twice will need post-processing before the results are returned to
* the caller.
* If the plugin provides bond information, but the file loaded doesn't
* actually contain any bond info, the nbonds parameter should be
* set to 0 and from/to should be set to NULL to indicate that no bond
* information was actually present, and automatic bond search should be
* performed.
*
* If the plugin provides bond order information, the bondorder array
* will contain the bond order for each from/to pair. If not, the bondorder
* pointer should be set to NULL, in which case the caller will provide a
* default bond order value of 1.0.
*
* If the plugin provides bond type information, the bondtype array
* will contain the bond type index for each from/to pair. These numbers
* are consecutive integers starting from 0.
 * The bondtypenames list contains the corresponding names, if available,
 * as a NULL-terminated list of strings. nbondtypes is provided for convenience
* and consistency checking.
*
* These arrays must be freed by the plugin in the close_file_read function.
* This function can be called only after read_structure().
* Return MOLFILE_SUCCESS if no errors occur.
*/
-#if vmdplugin_ABIVERSION > 14
int (*read_bonds)(void *, int *nbonds, int **from, int **to, float **bondorder,
int **bondtype, int *nbondtypes, char ***bondtypename);
-#else
- int (*read_bonds)(void *, int *nbonds, int **from, int **to, float **bondorder);
-#endif
/**
 * XXX this function will be augmented and possibly superseded by a
* new QM-capable version named read_timestep(), when finished.
*
* Read the next timestep from the file. Return MOLFILE_SUCCESS, or
* MOLFILE_EOF on EOF. If the molfile_timestep_t argument is NULL, then
* the frame should be skipped. Otherwise, the application must prepare
* molfile_timestep_t by allocating space in coords for the corresponding
* number of coordinates.
* The natoms parameter exists because some coordinate file formats
* (like CRD) cannot determine for themselves how many atoms are in a
* timestep; the app must therefore obtain this information elsewhere
* and provide it to the plugin.
*/
int (* read_next_timestep)(void *, int natoms, molfile_timestep_t *);
/**
* Close the file and release all data. The handle cannot be reused.
*/
void (* close_file_read)(void *);
/**
* Open a coordinate file for writing using the given header information.
* Return an opaque handle, or NULL on failure. The application must
* specify the number of atoms to be written.
* filetype should be the name under which this plugin was registered.
*/
void *(* open_file_write)(const char *filepath, const char *filetype,
int natoms);
/**
* Write structure information. Return success.
*/
int (* write_structure)(void *, int optflags, const molfile_atom_t *atoms);
/**
* Write a timestep to the coordinate file. Return MOLFILE_SUCCESS if no
* errors occur. If the file contains structure information in each
* timestep (like a multi-entry PDB), it will have to cache the information
* from the initial calls from write_structure.
*/
int (* write_timestep)(void *, const molfile_timestep_t *);
/**
* Close the file and release all data. The handle cannot be reused.
*/
void (* close_file_write)(void *);
/**
* Retrieve metadata pertaining to volumetric datasets in this file.
* Set nsets to the number of volumetric data sets, and set *metadata
* to point to an array of molfile_volumetric_t. The array is owned by
* the plugin and should be freed by close_file_read(). The application
* may call this function any number of times.
*/
int (* read_volumetric_metadata)(void *, int *nsets,
molfile_volumetric_t **metadata);
/**
* Read the specified volumetric data set into the space pointed to by
* datablock. The set is specified with a zero-based index. The space
* allocated for the datablock must be equal to
* xsize * ysize * zsize. No space will be allocated for colorblock
* unless has_color is nonzero; in that case, colorblock should be
* filled in with three RGB floats per datapoint.
*/
int (* read_volumetric_data)(void *, int set, float *datablock,
float *colorblock);
+#if vmdplugin_ABIVERSION > 16
+ int (* read_volumetric_data_ex)(void *, molfile_volumetric_readwrite_t *v);
+#endif
/**
* Read raw graphics data stored in this file. Return the number of data
* elements and the data itself as an array of molfile_graphics_t in the
* pointer provided by the application. The plugin is responsible for
* freeing the data when the file is closed.
*/
int (* read_rawgraphics)(void *, int *nelem, const molfile_graphics_t **data);
/**
* Read molecule metadata such as what database (if any) this file/data
* came from, what the accession code for the database is, textual remarks
* and other notes pertaining to the contained structure/trajectory/volume
* and anything else that's informative at the whole file level.
*/
int (* read_molecule_metadata)(void *, molfile_metadata_t **metadata);
/**
* Write bond information for the molecule. The arrays from
* and to point to the (one-based) indices of bonded atoms.
* Each unique bond will be specified only once by the caller.
* File formats that list bonds twice will need to emit both the
* from/to and to/from versions of each.
* This function must be called before write_structure().
*
* Like the read_bonds() routine, the bondorder pointer is set to NULL
* if the caller doesn't have such information, in which case the
* plugin should assume a bond order of 1.0 if the file format requires
* bond order information.
*
* Support for bond types follows the bondorder rules. bondtype is
* an integer array of the size nbonds that contains the bond type
* index (consecutive integers starting from 0) and bondtypenames
* contain the corresponding strings, in case the naming/numbering
* scheme is different from the index numbers.
 * If the pointers are set to NULL, then this information is not available.
 * bondtypenames can only be used if bondtypes is also given.
* Return MOLFILE_SUCCESS if no errors occur.
*/
-#if vmdplugin_ABIVERSION > 14
int (* write_bonds)(void *, int nbonds, int *from, int *to, float *bondorder,
int *bondtype, int nbondtypes, char **bondtypename);
-#else
- int (* write_bonds)(void *, int nbonds, int *from, int *to, float *bondorder);
-#endif
-#if vmdplugin_ABIVERSION > 9
/**
* Write the specified volumetric data set into the space pointed to by
 * datablock. The space allocated for the datablock must be equal to
* xsize * ysize * zsize. No space will be allocated for colorblock
* unless has_color is nonzero; in that case, colorblock should be
* filled in with three RGB floats per datapoint.
*/
int (* write_volumetric_data)(void *, molfile_volumetric_t *metadata,
float *datablock, float *colorblock);
+#if vmdplugin_ABIVERSION > 16
+ int (* write_volumetric_data_ex)(void *, molfile_volumetric_t *metadata,
+ molfile_volumetric_readwrite_t *v);
+#endif
-#if vmdplugin_ABIVERSION > 15
/**
* Read in Angles, Dihedrals, Impropers, and Cross Terms and optionally types.
* (Cross terms pertain to the CHARMM/NAMD CMAP feature)
*/
int (* read_angles)(void *handle, int *numangles, int **angles, int **angletypes,
int *numangletypes, char ***angletypenames, int *numdihedrals,
int **dihedrals, int **dihedraltypes, int *numdihedraltypes,
char ***dihedraltypenames, int *numimpropers, int **impropers,
int **impropertypes, int *numimpropertypes, char ***impropertypenames,
int *numcterms, int **cterms, int *ctermcols, int *ctermrows);
/**
* Write out Angles, Dihedrals, Impropers, and Cross Terms
* (Cross terms pertain to the CHARMM/NAMD CMAP feature)
*/
int (* write_angles)(void *handle, int numangles, const int *angles, const int *angletypes,
int numangletypes, const char **angletypenames, int numdihedrals,
const int *dihedrals, const int *dihedraltypes, int numdihedraltypes,
const char **dihedraltypenames, int numimpropers,
const int *impropers, const int *impropertypes, int numimpropertypes,
const char **impropertypenames, int numcterms, const int *cterms,
int ctermcols, int ctermrows);
-#else
- /**
- * Read in Angles, Dihedrals, Impropers, and Cross Terms
- * Forces are in Kcal/mol
- * (Cross terms pertain to the CHARMM/NAMD CMAP feature, forces are given
- * as a 2-D matrix)
- */
- int (* read_angles)(void *,
- int *numangles, int **angles, double **angleforces,
- int *numdihedrals, int **dihedrals, double **dihedralforces,
- int *numimpropers, int **impropers, double **improperforces,
- int *numcterms, int **cterms,
- int *ctermcols, int *ctermrows, double **ctermforces);
-
- /**
- * Write out Angles, Dihedrals, Impropers, and Cross Terms
- * Forces are in Kcal/mol
- * (Cross terms pertain to the CHARMM/NAMD CMAP feature, forces are given
- * as a 2-D matrix)
- */
- int (* write_angles)(void *,
- int numangles, const int *angles, const double *angleforces,
- int numdihedrals, const int *dihedrals, const double *dihedralforces,
- int numimpropers, const int *impropers, const double *improperforces,
- int numcterms, const int *cterms,
- int ctermcols, int ctermrows, const double *ctermforces);
-#endif
/**
* Retrieve metadata pertaining to timestep independent
* QM datasets in this file.
*
* The metadata are the sizes of the QM related data structure
* arrays that will be populated by the plugin when
* read_qm_rundata() is called. Since the allocation of these
* arrays is done by VMD rather than the plugin, VMD needs to
* know the sizes beforehand. Consequently read_qm_metadata()
* has to be called before read_qm_rundata().
*/
int (* read_qm_metadata)(void *, molfile_qm_metadata_t *metadata);
/**
* Read timestep independent QM data.
*
* Typical data that are defined only once per trajectory are
* general info about the calculation (such as the used method),
* the basis set and normal modes.
* The data structures to be populated must have been allocated
* before by VMD according to sizes obtained through
* read_qm_metadata().
*/
int (* read_qm_rundata)(void *, molfile_qm_t *qmdata);
/**
* Read the next timestep from the file. Return MOLFILE_SUCCESS, or
* MOLFILE_EOF on EOF. If the molfile_timestep_t or molfile_qm_metadata_t
* arguments are NULL, then the coordinate or qm data should be skipped.
* Otherwise, the application must prepare molfile_timestep_t and
* molfile_qm_timestep_t by allocating space for the corresponding
* number of coordinates, orbital wavefunction coefficients, etc.
* Since it is common for users to want to load only the final timestep
* data from a QM run, the application may provide any combination of
* valid, or NULL pointers for the molfile_timestep_t and
* molfile_qm_timestep_t parameters, depending on what information the
* user is interested in.
* The natoms and qm metadata parameters exist because some file formats
* cannot determine for themselves how many atoms etc are in a
* timestep; the app must therefore obtain this information elsewhere
* and provide it to the plugin.
*/
int (* read_timestep)(void *, int natoms, molfile_timestep_t *,
molfile_qm_metadata_t *, molfile_qm_timestep_t *);
-#endif
-#if vmdplugin_ABIVERSION > 10
int (* read_timestep_metadata)(void *, molfile_timestep_metadata_t *);
-#endif
-#if vmdplugin_ABIVERSION > 11
int (* read_qm_timestep_metadata)(void *, molfile_qm_timestep_metadata_t *);
-#endif
#if defined(DESRES_READ_TIMESTEP2)
/**
* Read a specified timestep!
*/
int (* read_timestep2)(void *, molfile_ssize_t index, molfile_timestep_t *);
/**
 * Read up to count times beginning at index start into the given
 * space. Return the number read, or -1 on error.
*/
molfile_ssize_t (* read_times)( void *,
molfile_ssize_t start,
molfile_ssize_t count,
double * times );
#endif
-#if vmdplugin_ABIVERSION > 13
/**
* Console output, READ-ONLY function pointer.
* Function pointer that plugins can use for printing to the host
* application's text console. This provides a clean way for plugins
* to send message strings back to the calling application, giving the
* caller the ability to prioritize, buffer, and redirect console messages
* to an appropriate output channel, window, etc. This enables the use of
* graphical consoles like TkCon without losing console output from plugins.
* If the function pointer is NULL, no console output service is provided
* by the calling application, and the output should default to stdout
* stream. If the function pointer is non-NULL, all output will be
* subsequently dealt with by the calling application.
*
* XXX this should really be put into a separate block of
* application-provided read-only function pointers for any
* application-provided services
*/
int (* cons_fputs)(const int, const char*);
-#endif
} molfile_plugin_t;
#endif
+
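To tie the pieces of the molfile API above together, here is a hedged sketch
of the call sequence an application (such as the USER-MOLFILE interface) goes
through once it holds a molfile_plugin_t pointer obtained through plugin
registration. The file name, file type, and helper name are assumptions made
for the example; error handling is abbreviated.

  #include <stdio.h>
  #include <stdlib.h>
  #include "molfile_plugin.h"

  static void example_read_trajectory(molfile_plugin_t *p)
  {
    int natoms = 0;
    void *h = p->open_file_read("traj.dcd", "dcd", &natoms);  /* hypothetical file */
    if (!h || natoms <= 0) return;

    molfile_timestep_t ts;
    ts.coords = (float *) malloc(3 * (size_t) natoms * sizeof(float));
    ts.velocities = NULL;

    int nframes = 0;
    while (ts.coords && p->read_next_timestep(h, natoms, &ts) == MOLFILE_SUCCESS)
      ++nframes;

    printf("read %d frames with %d atoms each\n", nframes, natoms);
    free(ts.coords);
    p->close_file_read(h);
  }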
diff --git a/src/USER-MOLFILE/vmdplugin.h b/lib/molfile/vmdplugin.h
similarity index 98%
rename from src/USER-MOLFILE/vmdplugin.h
rename to lib/molfile/vmdplugin.h
index 37299408f..842d1e431 100644
--- a/src/USER-MOLFILE/vmdplugin.h
+++ b/lib/molfile/vmdplugin.h
@@ -1,191 +1,191 @@
/***************************************************************************
*cr
*cr (C) Copyright 1995-2006 The Board of Trustees of the
*cr University of Illinois
*cr All Rights Reserved
*cr
***************************************************************************/
/***************************************************************************
* RCS INFORMATION:
*
* $RCSfile: vmdplugin.h,v $
* $Author: johns $ $Locker: $ $State: Exp $
- * $Revision: 1.32 $ $Date: 2009/02/24 05:12:35 $
+ * $Revision: 1.33 $ $Date: 2015/10/29 05:10:54 $
*
***************************************************************************/
/** @file
* This header must be included by every VMD plugin library. It defines the
* API for every plugin so that VMD can organize the plugins it finds.
*/
#ifndef VMD_PLUGIN_H
#define VMD_PLUGIN_H
/*
* Preprocessor tricks to make it easier for us to redefine the names of
* functions when building static plugins.
*/
#if !defined(VMDPLUGIN)
/**
* macro defining VMDPLUGIN if it hasn't already been set to the name of
* a static plugin that is being compiled. This is the catch-all case.
*/
#define VMDPLUGIN vmdplugin
#endif
/** concatenation macro, joins args x and y together as a single string */
#define xcat(x, y) cat(x, y)
/** concatenation macro, joins args x and y together as a single string */
#define cat(x, y) x ## y
/*
* macros to correctly define plugin function names depending on whether
* the plugin is being compiled for static linkage or dynamic loading.
* When compiled for static linkage, each plugin needs to have unique
* function names for all of its entry points. When compiled for dynamic
* loading, the plugins must name their entry points consistently so that
* the plugin loading mechanism can find the register, register_tcl, init,
* and fini routines via dlopen() or similar operating system interfaces.
*/
/*@{*/
/** Macro names entry points correctly for static linkage or dynamic loading */
#define VMDPLUGIN_register xcat(VMDPLUGIN, _register)
#define VMDPLUGIN_register_tcl xcat(VMDPLUGIN, _register_tcl)
#define VMDPLUGIN_init xcat(VMDPLUGIN, _init)
#define VMDPLUGIN_fini xcat(VMDPLUGIN, _fini)
/*@}*/
/** "WIN32" is defined on both WIN32 and WIN64 platforms... */
#if (defined(WIN32))
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#if !defined(STATIC_PLUGIN)
#if defined(VMDPLUGIN_EXPORTS)
/**
* Only define DllMain for plugins, not in VMD or in statically linked plugins
* VMDPLUGIN_EXPORTS is only defined when compiling dynamically loaded plugins
*/
BOOL APIENTRY DllMain( HANDLE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
return TRUE;
}
#define VMDPLUGIN_API __declspec(dllexport)
#else
#define VMDPLUGIN_API __declspec(dllimport)
#endif /* VMDPLUGIN_EXPORTS */
#else /* ! STATIC_PLUGIN */
#define VMDPLUGIN_API
#endif /* ! STATIC_PLUGIN */
#else
/** If we're not compiling on Windows, then this macro is defined empty */
#define VMDPLUGIN_API
#endif
/** define plugin linkage correctly for both C and C++ based plugins */
#ifdef __cplusplus
#define VMDPLUGIN_EXTERN extern "C" VMDPLUGIN_API
#else
#define VMDPLUGIN_EXTERN extern VMDPLUGIN_API
#endif /* __cplusplus */
/*
* Plugin API functions start here
*/
/**
* Init routine: called the first time the library is loaded by the
* application and before any other API functions are referenced.
* Return 0 on success.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_init(void);
/**
* Macro for creating a struct header used in all plugin structures.
*
* This header should be placed at the top of every plugin API definition
* so that it can be treated as a subtype of the base plugin type.
*
* abiversion: Defines the ABI for the base plugin type (not for other plugins)
* type: A string descriptor of the plugin type.
* name: A name for the plugin.
* author: A string identifier, possibly including newlines.
* Major and minor version.
* is_reentrant: Whether this library can be run concurrently with itself.
*/
#define vmdplugin_HEAD \
int abiversion; \
const char *type; \
const char *name; \
const char *prettyname; \
const char *author; \
int majorv; \
int minorv; \
int is_reentrant;
/**
* Typedef for generic plugin header, individual plugins can
* make their own structures as long as the header info remains
* the same as the generic plugin header, most easily done by
* using the vmdplugin_HEAD macro.
*/
typedef struct {
vmdplugin_HEAD
} vmdplugin_t;
/**
* Use this macro to initialize the abiversion member of each plugin
*/
-#define vmdplugin_ABIVERSION 16
+#define vmdplugin_ABIVERSION 17
/*@{*/
/** Use this macro to indicate a plugin's thread-safety at registration time */
#define VMDPLUGIN_THREADUNSAFE 0
#define VMDPLUGIN_THREADSAFE 1
/*@}*/
/*@{*/
/** Error return code for use in the plugin registration and init functions */
#define VMDPLUGIN_SUCCESS 0
#define VMDPLUGIN_ERROR -1
/*@}*/
/**
* Function pointer typedef for register callback functions
*/
typedef int (*vmdplugin_register_cb)(void *, vmdplugin_t *);
/**
* Allow the library to register plugins with the application.
* The callback should be called using the passed-in void pointer, which
* should not be interpreted in any way by the library. Each vmdplugin_t
* pointer passed to the application should point to statically-allocated
* or heap-allocated memory and should never be later modified by the plugin.
 * Applications must be permitted to retain only a copy of the plugin
* pointer, without making any deep copy of the items in the struct.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_register(void *, vmdplugin_register_cb);
/**
* Allow the library to register Tcl extensions.
* This API is optional; if found by dlopen, it will be called after first
* calling init and register.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_register_tcl(void *, void *tcl_interp,
vmdplugin_register_cb);
/**
* The Fini method is called when the application will no longer use
* any plugins in the library.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_fini(void);
#endif /* VMD_PLUGIN_H */
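
As a hedged illustration of how the macros above fit together, the sketch
below shows the minimal entry points a dynamically loaded plugin library
would implement. The plugin name, author string, and field values are
placeholders invented for the example; real plugins usually register a larger
struct (e.g. a molfile_plugin_t) whose first members are vmdplugin_HEAD.

  #include "vmdplugin.h"

  static vmdplugin_t example_plugin;   /* placeholder plugin descriptor */

  VMDPLUGIN_EXTERN int VMDPLUGIN_init(void) {
    example_plugin.abiversion   = vmdplugin_ABIVERSION;
    example_plugin.type         = "mol file reader";
    example_plugin.name         = "example";
    example_plugin.prettyname   = "Example reader";
    example_plugin.author       = "A. Nobody";        /* placeholder */
    example_plugin.majorv       = 1;
    example_plugin.minorv       = 0;
    example_plugin.is_reentrant = VMDPLUGIN_THREADSAFE;
    return VMDPLUGIN_SUCCESS;
  }

  VMDPLUGIN_EXTERN int VMDPLUGIN_register(void *v, vmdplugin_register_cb cb) {
    cb(v, &example_plugin);
    return VMDPLUGIN_SUCCESS;
  }

  VMDPLUGIN_EXTERN int VMDPLUGIN_fini(void) { return VMDPLUGIN_SUCCESS; }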
diff --git a/lib/mscg/Install.py b/lib/mscg/Install.py
new file mode 100644
index 000000000..e54723261
--- /dev/null
+++ b/lib/mscg/Install.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python
+
+# Install.py tool to download, unpack, build, and link to the MS-CG library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -h hpath hdir -g -b [suffix] -l
+ specify one or more options, order does not matter
+ -h = set home dir of MS-CG to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/mscg
+ default hdir = MSCG-release-master = what GitHub zipfile unpacks to
+ -g = grab (download) zipfile from MS-CG GitHub website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -b = build MS-CG library in its src dir
+ optional suffix specifies which src/Make/Makefile.suffix to use
+ default suffix = g++_simple
+ -l = create 2 softlinks (includelink,liblink) in lib/mscg to MS-CG src dir
+"""
+
+# settings
+
+url = "https://github.com/uchicago-voth/MSCG-release/archive/master.zip"
+zipfile = "MS-CG-master.zip"
+zipdir = "MSCG-release-master"
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# expand to full path name
+# process leading '~' or relative path
+
+def fullpath(path):
+ return os.path.abspath(os.path.expanduser(path))
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+homepath = "."
+homedir = zipdir
+
+grabflag = 0
+buildflag = 0
+msuffix = "g++_simple"
+linkflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-h":
+ if iarg+3 > nargs: error()
+ homepath = args[iarg+1]
+ homedir = args[iarg+2]
+ iarg += 3
+ elif args[iarg] == "-g":
+ grabflag = 1
+ iarg += 1
+ elif args[iarg] == "-b":
+ buildflag = 1
+ if iarg+1 < nargs and args[iarg+1][0] != '-':
+ msuffix = args[iarg+1]
+ iarg += 1
+ iarg += 1
+ elif args[iarg] == "-l":
+ linkflag = 1
+ iarg += 1
+ else: error()
+
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("MS-CG path does not exist")
+homedir = "%s/%s" % (homepath,homedir)
+
+# download and unpack MS-CG zipfile
+
+if grabflag:
+ print "Downloading MS-CG ..."
+ cmd = "curl -L %s > %s/%s" % (url,homepath,zipfile)
+ print cmd
+ print commands.getoutput(cmd)
+
+ print "Unpacking MS-CG zipfile ..."
+ if os.path.exists("%s/%s" % (homepath,zipdir)):
+ commands.getoutput("rm -rf %s/%s" % (homepath,zipdir))
+ cmd = "cd %s; unzip %s" % (homepath,zipfile)
+ commands.getoutput(cmd)
+ if os.path.basename(homedir) != zipdir:
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ os.rename("%s/%s" % (homepath,zipdir),homedir)
+
+# build MS-CG
+
+if buildflag:
+ print "Building MS-CG ..."
+ cmd = "cd %s/src; cp Make/Makefile.%s .; make -f Makefile.%s" % \
+ (homedir,msuffix,msuffix)
+ txt = commands.getoutput(cmd)
+ print txt
+
+# create 2 links in lib/mscg to MS-CG src dir
+
+if linkflag:
+ print "Creating links to MS-CG include and lib files"
+ if os.path.isfile("includelink") or os.path.islink("includelink"):
+ os.remove("includelink")
+ if os.path.isfile("liblink") or os.path.islink("liblink"):
+ os.remove("liblink")
+ cmd = "ln -s %s/src includelink" % homedir
+ commands.getoutput(cmd)
+ cmd = "ln -s %s/src liblink" % homedir
+ commands.getoutput(cmd)
diff --git a/lib/mscg/Makefile.lammps b/lib/mscg/Makefile.lammps
index 0aa55b087..f0d9a9b8a 100644
--- a/lib/mscg/Makefile.lammps
+++ b/lib/mscg/Makefile.lammps
@@ -1,5 +1,5 @@
# Settings that the LAMMPS build will import when this package library is used
-mscg_SYSINC =
-mscg_SYSLIB = -lm -lgsl -llapack -lcblas
+mscg_SYSINC = -std=c++11
+mscg_SYSLIB = -lm -lgsl -llapack -lgslcblas
mscg_SYSPATH =
diff --git a/lib/mscg/README b/lib/mscg/README
index cc4fc9a66..b73c8563c 100755
--- a/lib/mscg/README
+++ b/lib/mscg/README
@@ -1,53 +1,67 @@
This directory contains links to the Multi-scale Coarse-graining
(MS-CG) library which is required to use the MSCG package and its fix
command in a LAMMPS input script.
The MS-CG library is available at
https://github.com/uchicago-voth/MSCG-release and was developed by
Jacob Wagner in Greg Voth's group at the University of Chicago.
+This library requires a compiler with C++11 support (e.g., g++ v4.9+),
+LAPACK, and the GNU Scientific Library (GSL v2.1+).
+
+You can type "make lib-mscg" from the src directory to see help on how
+to download and build this library via make commands, or you can do
+the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
-----------------
You must perform the following steps yourself.
1. Download MS-CG at https://github.com/uchicago-voth/MSCG-release
either as a tarball or via SVN, and unpack the tarball either in
this /lib/mscg directory or somewhere else on your system.
-2. Compile MS-CG from within its home directory using your makefile choice:
+2. Ensure that you have LAPACK and GSL (or Intel MKL) as well as a compiler
+ with support for C++11.
+
+3. Compile MS-CG from within its home directory using your makefile of choice:
% make -f Makefile."name" libmscg.a
+ It is recommended that you start with Makefile.g++_simple
+ for most machines.
-3. There is no need to install MS-CG if you only wish
+4. There is no need to install MS-CG if you only wish
to use it from LAMMPS.
-4. Create two soft links in this dir (lib/mscg) to the MS-CG src
+5. Create two soft links in this dir (lib/mscg) to the MS-CG src
directory. E.g. if you built MS-CG in this dir:
- % ln -s mscgfm-master/src includelink
- % ln -s mscgfm-master/src liblink
+ % ln -s src includelink
+ % ln -s src liblink
These links could instead be set to the include and lib
directories created by a MS-CG install, e.g.
% ln -s /usr/local/include includelink
% ln -s /usr/local/lib liblink
-----------------
When these steps are complete you can build LAMMPS with the MS-CG
package installed:
% cd lammps/src
% make yes-USER-MSCG
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 5). If you built MS-CG in this directory (as
opposed to somewhere else on your system) and did not install it
somewhere else, you will also need to repeat steps 1-3.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked with by
-LAMMPS. MS-CG requires the GSL, LAPACK, and BLAS libraries as listed
-in Makefile.lammps. If they are not in default locations where your
+LAMMPS. MS-CG requires the GSL and LAPACK libraries as listed in
+Makefile.lammps. If they are not in default locations where your
LD_LIBRARY_PATH environment settings can find them, then you should
add the appropriate -L paths to the mscg_SYSPATH variable in
Makefile.lammps.
diff --git a/lib/netcdf/README b/lib/netcdf/README
index 00db8df00..b18ea1d27 100644
--- a/lib/netcdf/README
+++ b/lib/netcdf/README
@@ -1,43 +1,46 @@
The Makefile.lammps file in this directory is used when building
LAMMPS with packages that make use of the NetCDF library or its
-parallel version. The file has several settings needed to compile
+parallel version. One example is the USER-NETCDF package, which adds
+the dump netcdf and dump netcdf/mpiio commands.
+
+The file has several settings needed to compile
and link LAMMPS with the NetCDF and parallel NetCDF support.
For any regular NetCDF installation, all required flags should be
autodetected. Please note that parallel NetCDF support is
beneficial only when you run on a machine with very many processors
like an IBM BlueGene or Cray. For most people regular NetCDF
support should be sufficient and not cause any performance
penalties.
If you have problems compiling or linking, you may have to set
the flags manually. There are three makefile variables
1) netcdf_SYSINC
This is for setting preprocessor options and include file paths.
Set -DLMP_HAS_NETCDF, if you have NetCDF installed.
Set -DLMP_HAS_PNETCDF, if you have parallel NetCDF installed.
You can have either or both defines set. If none of these are
set, LAMMPS will compile, but the NetCDF enabled functionality
will not be available.
In addition you may have to point to the folder with the include
files using -I/path/to/netcdf/include
Example for a Fedora 24 machine with serial NetCDF installed as
netcdf-devel-4.4.0-3.fc24.x86_64 RPM package:
netcdf_SYSINC = -DLMP_HAS_NETCDF -I/usr/include -I/usr/include/hdf
2) netcdf_SYSLIB
This is the setting for all required libraries that need to be linked to.
Example for a Fedora 24 machine with serial NetCDF installed as
netcdf-devel-4.4.0-3.fc24.x86_64 RPM package:
netcdf_SYSLIB = -lnetcdf
3) netcdf_SYSPATH
This is the setting for the path of directories with the NetCDF libraries.
Typically, this will be of the form -L/path/to/netcdf/lib
In the example from above, it can be left empty, because the Linux
distribution provided libraries are installed in a system library location.
diff --git a/lib/poems/Install.py b/lib/poems/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/poems/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/poems/README b/lib/poems/README
index 836595bdd..e0ded85e4 100644
--- a/lib/poems/README
+++ b/lib/poems/README
@@ -1,65 +1,70 @@
POEMS (Parallelizable Open source Efficient Multibody Software) library
Rudranarayan Mukherjee, RPI
mukher at rpi.edu
June 2006
This is version 1.0 of the POEMS library, a general purpose distributed
multibody dynamics software, which is able to simulate the dynamics of
articulated body systems.
POEMS is supported by the funding agencies listed in the Grants' List.
POEMS is an open source program distributed under the Rensselaer
Scorec License.
The authors listed in the Authors' List reserve the right to decline
requests for technical support for copies of POEMS obtained free of charge.
We are happy to hear from you about bugs, ideas for improvement, and
other suggestions. We keep improving POEMS. Check the POEMS web
site (www.rpi.edu/~anderk5/POEMS) for recent changes.
All correspondence regarding the POEMS should be sent to:
By email: (preferred)
Prof. Kurt Anderson (anderk5@rpi.edu) or
Rudranarayan Mukherjee (mukher@rpi.edu) - include "[POEMS]" in the subject
or by mail:
Prof. Kurt S. Anderson
4006 Jonsson Engineering Center
Rensselaer Polytechnic Institute
110 8th Street,
Troy, NY 12180-3510, U.S.A.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the POEMS package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-poems" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
When you are done building this library, two files should
exist in this directory:
libpoems.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
Makefile.lammps has settings for 3 variables:
user-poems_SYSINC = leave blank for this package
user-poems_SYSLIB = leave blank for this package
user-poems_SYSPATH = leave blank for this package
Because this library does not currently need the additional settings
the settings in Makefile.lammps.empty should work.
diff --git a/lib/qmmm/Install.py b/lib/qmmm/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/qmmm/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-m":
+    if iarg+2 > nargs: error()
+    machine = args[iarg+1]
+    iarg += 2
+  elif args[iarg] == "-e":
+    if iarg+2 > nargs: error()
+    extraflag = 1
+    suffix = args[iarg+1]
+    iarg += 2
+  else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+  words = line.split()
+  if len(words) == 3 and extraflag and \
+     words[0] == "EXTRAMAKE" and words[1] == '=':
+    line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+  print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/qmmm/README b/lib/qmmm/README
index b50f25ed6..2746c9e86 100644
--- a/lib/qmmm/README
+++ b/lib/qmmm/README
@@ -1,187 +1,196 @@
QM/MM support library
Axel Kohlmeyer, akohlmey@gmail.com
Temple University, Philadelphia and ICTP, Trieste
with contributions by
Carlo Cavazzoni & Mariella Ippolito
Cineca, Italy
This library provides the basic glue code to combine LAMMPS with the
Quantum ESPRESSO package plane wave density functional theory code for
performing QM/MM molecular dynamics simulations. More information on
Quantum ESPRESSO can be found at: http://www.quantum-espresso.org
The interface code itself is designed so it can also be combined with
other QM codes; however, Quantum ESPRESSO is currently the only
supported option. Adding support for a different QM code will require
writing a new version of the top-level wrapper code, pwqmmm.c, and
also an interface layer into the QM code similar to the one in QE.
+You can type "make lib-qmmm" from the src directory to see help on how
+to build this library (steps 1 and 2 below) via make commands, or you
+can do the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
+However you perform steps 1 and 2, you will need to perform steps 3
+and 4 manually, as outlined below.
+
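For example, a minimal invocation of the install script to build the
coupling library in this directory, assuming the provided gfortran
makefile (any other lib/qmmm/Makefile.<compiler> suffix works equally
well), is:
python Install.py -m gfortran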
-------------------------------------------------
WARNING: This is experimental code under development and is provided
at this early stage to encourage others to write interfaces to other
QM codes. Please test *very* carefully before using this software for
production calculations. At the time of the last update of this README
(July 2016) you have to download a QE snapshot (revision 12611) from
the QE subversion repository.
At this point, both mechanical and multipole based electrostatic
coupling have been successfully tested on a cluster of water
molecules as included in the two example folders.
-------------------------------------------------
Building the QM/MM executable has to be done in multiple stages.
Step 1)
Build the qmmm coupling library in this directory using one of the
provided Makefile.<compiler> files or create your own, specific to
your compiler and system. For example with:
make -f Makefile.gfortran
When you are done building this library, two new files should
exist in this directory:
libqmmm.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command by simply copying the
Makefile.lammps.empty file. Currently no additional dependencies for
this library exist.
Step 2)
Build a standalone LAMMPS executable as described in the LAMMPS
documentation and include the USER-QMMM package. This executable
is not functional for QM/MM, but it will usually be needed to
run all MM calculations for equilibration and testing and also
to confirm that the classical part of the code is set up correctly.
Step 3)
Build a standalone pw.x executable in the Quantum ESPRESSO directory
and also make the "couple" target. At the time of this writing
(July 2016) you have to download a QE snapshot (revision 12611)
from the SVN repository, since no official release with the
completed QM/MM support code has been made available yet. The current
plan is to have a usable QM/MM interface released with the next
Quantum ESPRESSO release version 6.0. Building the standalone pw.x
binary is also needed to confirm that corresponding QM input is
working correctly and to run test calculations on QM atoms only.
Step 4)
To compile and link the final QM/MM executable, which combines the
compiled sources from both packages, you have to return to the lib/qmmm
directory and edit the Makefile.<compiler> matching the Makefile
configuration used to compile LAMMPS, and also update the directory
and library settings for the Quantum ESPRESSO installation.
The makefile variable MPILIBS needs to be set to include all linker
flags that will need to be used in addition to the various libraries
from _both_ packages. Please see the provided example(s).
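As a purely illustrative sketch (the path and library names below are
placeholders that depend on your MPI installation; see the provided
Makefile.<compiler> files for real examples), the edited line might look
like:
MPILIBS = -L/usr/lib64/openmpi/lib -lmpi_mpifh -lmpi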
"make -f Makefile.<compiler> all" will now recurse through both the
Quantum ESPRESSO and LAMMPS directories to compile all files that
require recompilation and then link the combined QM/MM executable.
If you want to only update the local objects and the QM/MM executable,
you can use "make -f Makefile.<compiler> pwqmmm.x"
Please refer to the specific LAMMPS and Quantum ESPRESSO documentation
for details on how to set up compilation for each package and make
sure you have a set of settings and flags that allow you to build
each package successfully, so that it can run on its own.
-------------------------------------------------
How it works.
This directory has the source files for an interface layer and a
toplevel code that combines objects/libraries from the QM code and
LAMMPS to build a QM/MM executable. LAMMPS will act as the MD "driver"
and will delegate the computation of forces for the QM subset to the QM
code, i.e. currently Quantum ESPRESSO. While the code is combined into
a single executable, this executable can only act as either "QM slave",
"MM slave" or "MM master", and information is exchanged between those
roles solely via MPI. Thus MPI is required to make it work, and both
codes have to be configured to use the same MPI library.
The toplevel code provided here will split the total number of cpus
into three partitions: the first for running a DFT calculation, the
second for running the "master" classical MD calculation, and the
third for a "slave" classical MD calculation. Each calculation will
have to be run in its own subdirectory with its own specific input
data and will write its output there as well. This and other settings
are provided in the QM/MM input file that is the mandatory argument to the
QM/MM executable. The number of MM cpus is provided as the optional
second argument. The MM "slave" partition is always run with only 1
cpu; thus the minimum required number of MM cpus is 2, which is also
the default. Therefore a QM/MM calculation with this code requires at
least 3 processes.
Thus the overall calling sequence is like this:
mpirun -np <total #cpus> ./pwqmmm.x <QM/MM input> [<#cpus for MM>]
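For example, a hypothetical run on 6 processes, where 2 of them are used
for the MM master and MM slave partitions (the minimum) and the remaining
4 run the QM (DFT) calculation, and where qmmm.inp is a placeholder name
for the QM/MM input file, would be started as:
mpirun -np 6 ./pwqmmm.x qmmm.inp 2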
A commented example QM/MM input file is given below.
-------------------------------------------------
To run a QM/MM calculation, you need to set up 4 inputs, each of which is
best placed in a separate subdirectory:
1: the total system as classical MD input. This becomes the MM master
and in addition to the regular MD setup it needs to define a group,
e.g. "wat" for the atoms that are treated as QM atoms and then add
the QM/MM fix like this:
fix 1 wat qmmm
2: the QM system as classical MD input
This system must only contain the atoms (and bonds, angles, etc.) for
the subsystem that is supposed to be treated with the QM code. This
will become the MM slave run and here the QM/MM fix needs to be
applied to all atoms:
fix 1 all qmmm
3: the QM system as QM input
This needs to be a cluster calculation for the QM subset, i.e. the
same atoms as in the MM slave configuration. For Quantum ESPRESSO
this is a regular input which in addition contains the line
tqmmm = .true.
in the &CONTROL namelist. This will make the QE code
connect to the LAMMPS code and receive updated positions while
it sends QM forces back to the MM code.
4: the fourth input is the QM/MM configuration file which tells the
QM/MM wrapper code where to find the other 3 inputs, where to place
the corresponding output of the partitions and how many MD steps are
to run with this setup.
-------------------------------------------------
# configuration file for QMMM wrapper
mode mech # coupling choices: o(ff), m(echanical), e(lectrostatic)
steps 20 # number of QM/MM (MD) steps
verbose 1 # verbosity level (0=no QM/MM output during run)
restart water.restart # checkpoint/restart file to write out at end
# QM system config
qmdir qm-pw # directory to run QM system in
qminp water.in # input file for QM code
qmout NULL # output file for QM code (or NULL to print to screen)
# MM master config
madir mm-master # directory to run MM master in
mainp water.in # input file for MM master
maout water.out # output file for MM master (or NULL to print to screen)
# MM slave config
sldir mm-slave # directory to run MM slave in
slinp water_single.in # input file for MM slave
slout water_single.out # output file for MM slave (or NULL to print to screen)
diff --git a/lib/reax/Install.py b/lib/reax/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/reax/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# Install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-m":
+    if iarg+2 > nargs: error()
+    machine = args[iarg+1]
+    iarg += 2
+  elif args[iarg] == "-e":
+    if iarg+2 > nargs: error()
+    extraflag = 1
+    suffix = args[iarg+1]
+    iarg += 2
+  else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+  words = line.split()
+  if len(words) == 3 and extraflag and \
+     words[0] == "EXTRAMAKE" and words[1] == '=':
+    line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+  print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/reax/README b/lib/reax/README
index 2840a242a..f21a47061 100644
--- a/lib/reax/README
+++ b/lib/reax/README
@@ -1,73 +1,78 @@
ReaxFF library
Aidan Thompson, Sandia National Labs
athomps at sandia.gov
Jan 2008
This library is an implementation of the ReaxFF potential,
specifically designed to work with LAMMPS. It is derived from Adri van
Duin's original serial code, with intervening incarnations in CMDF and
GRASP.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the REAX package.
This library must be built with a F90 compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-reax" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
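For example, a minimal invocation of the install script, assuming you want
the provided gfortran makefile (the same one used in the manual
instructions below; any other lib/reax/Makefile.* suffix works equally
well), is:
python Install.py -m gfortran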
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, two files should
exist in this directory:
libreax.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-reax_SYSINC = leave blank for this package
user-reax_SYSLIB = auxiliary F90 libs needed to link a F90 lib with
a C++ program (LAMMPS) via a C++ compiler
user-reax_SYSPATH = path(s) to where those libraries are
Because you have a F90 compiler on your system, you should have these
libraries. But you will have to figure out which ones are needed and
where they are. Examples of common configurations are in the
Makefile.lammps.* files.
-------------------------------------------------
Additional build notes:
The include file reax_defs.h is used by both the ReaxFF library source
files and the LAMMPS pair_reax.cpp source file (in package src/REAX).
It contains dimensions of statically-allocated arrays created by the
ReaxFF library. The size of these arrays must be set small enough to
avoid exceeding the available machine memory, and large enough to fit
the actual data generated by ReaxFF. If you change the values in
reax_defs.h, you must first rebuild the library and then rebuild
LAMMPS.
This library is called by functions in pair_reax.cpp. The C++ to
FORTRAN function calls in pair_reax.cpp assume that FORTRAN object
names are converted to C object names by appending an underscore
character. This is generally the case, but on machines that do not
conform to this convention, you will need to modify either the C++
code or your compiler settings. The name conversion is handled by the
preprocessor macro called FORTRAN in the file pair_reax_fortran.h,
which is included by pair_reax.cpp. Different definitions of this
macro can be obtained by adding a machine-specific macro definition to
the CCFLAGS variable in your LAMMPS Makefile, e.g. -D_IBM. See
pair_reax_fortran.h for more info.
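As a purely illustrative sketch (the makefile name and the optimization
flags are placeholders; only -D_IBM is taken from the note above), the
corresponding line in a src/MAKE machine makefile on such a system might
look like:
CCFLAGS = -g -O3 -D_IBM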
diff --git a/lib/smd/Install.py b/lib/smd/Install.py
new file mode 100644
index 000000000..dc0a3187c
--- /dev/null
+++ b/lib/smd/Install.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python
+
+# Install.py tool to download, unpack, and point to the Eigen library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,glob,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -h hpath hdir -g -l
+ specify one or more options, order does not matter
+ -h = set home dir of Eigen to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/smd
+ default hdir = "ee" = what tarball unpacks to (eigen-eigen-*)
+ -g = grab (download) tarball from http://eigen.tuxfamily.org website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -l = create softlink (includelink) in lib/smd to Eigen src dir
+"""
+
+# settings
+
+url = "http://bitbucket.org/eigen/eigen/get/3.3.3.tar.gz"
+tarball = "eigen.tar.gz"
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# expand to full path name
+# process leading '~' or relative path
+
+def fullpath(path):
+ return os.path.abspath(os.path.expanduser(path))
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+homepath = "."
+homedir = "ee"
+
+grabflag = 0
+linkflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-h":
+    if iarg+3 > nargs: error()
+    homepath = args[iarg+1]
+    homedir = args[iarg+2]
+    iarg += 3
+  elif args[iarg] == "-g":
+    grabflag = 1
+    iarg += 1
+  elif args[iarg] == "-l":
+    linkflag = 1
+    iarg += 1
+  else: error()
+
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("Eigen path does not exist")
+
+# download and unpack Eigen tarball
+# glob to find name of dir it unpacks to
+
+if grabflag:
+ print "Downloading Eigen ..."
+ cmd = "curl -L %s > %s/%s" % (url,homepath,tarball)
+ print cmd
+ print commands.getoutput(cmd)
+
+ print "Unpacking Eigen tarball ..."
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ for one in edir:
+ if os.path.isdir(one): commands.getoutput("rm -rf %s" % one)
+ cmd = "cd %s; tar zxvf %s" % (homepath,tarball)
+ commands.getoutput(cmd)
+ if homedir != "ee":
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ os.rename(edir[0],"%s/%s" % (homepath,homedir))
+
+# create link in lib/smd to Eigen src dir
+
+if linkflag:
+ print "Creating link to Eigen files"
+ if os.path.isfile("includelink") or os.path.islink("includelink"):
+ os.remove("includelink")
+ if homedir == "ee":
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ linkdir = edir[0]
+ else: linkdir = "%s/%s" % (homepath,homedir)
+ cmd = "ln -s %s includelink" % linkdir
+ commands.getoutput(cmd)
diff --git a/lib/smd/README b/lib/smd/README
index 846c440da..1bd5902a1 100644
--- a/lib/smd/README
+++ b/lib/smd/README
@@ -1,41 +1,44 @@
This directory contains links to the Eigen library which is required
to use the USER-SMD package in a LAMMPS input script.
The Eigen library is available at http://eigen.tuxfamily.org. It's
a general C++ template library for linear algebra.
-You must perform the following steps yourself, or you can use the
-install.py Python script to automate any or all steps of the process.
-Type "python install.py" for instructions.
+You can type "make lib-smd" from the src directory to see help on how
+to download and build this library via make commands, or you can do the
+same thing by typing "python Install.py" from within this directory,
+or you can do it manually by following the instructions below.
+
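For example, to download Eigen into this directory and create the
"includelink" soft link in one step (a minimal sketch using the flags
documented by Install.py), you could type:
% python Install.py -g -l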
+Instructions:
1. Download the Eigen tarball at http://eigen.tuxfamily.org and
unpack the tarball either in this /lib/smd directory or somewhere
else on your system. It should unpack into a directory with
a name similar to eigen-eigen-bdd17ee3b1b3. You can rename
the directory to just "eigen" if you wish. Note that Eigen is a
template library, so you do not have to build it.
2. Create a soft link in this dir (lib/smd)
to the eigen directory. E.g. if you unpacked Eigen in this dir:
% ln -s eigen-eigen-bdd17ee3b1b3 includelink
If you unpacked Eigen somewhere else and renamed
the resulting directory to just eigen, then do something like this:
% ln -s /home/sjplimp/tools/eigen includelink
When these steps are complete you can build LAMMPS
with the USER-SMD package installed:
% cd lammps/src
% make yes-user-smd
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 2). If you unpacked the Eigen library in this
directory (as opposed to somewhere else on your system), you will also
need to repeat step 1.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked to by
LAMMPS. However, the Eigen library requires no auxiliary files or
settings, so its variables are blank.
diff --git a/lib/voronoi/Install.py b/lib/voronoi/Install.py
index 8ae08917e..7d847183b 100644
--- a/lib/voronoi/Install.py
+++ b/lib/voronoi/Install.py
@@ -1,116 +1,118 @@
#!/usr/bin/env python
-# install.py tool to download, unpack, build, and link to the Voro++ library
+# Install.py tool to download, unpack, build, and link to the Voro++ library
# used to automate the steps described in the README file in this dir
import sys,os,re,urllib,commands
# help message
help = """
-Syntax: install.py -v version -g gdir [gname] -b bdir -l ldir
+Syntax: python Install.py -v version -h hpath hdir -g -b -l
specify one or more options, order does not matter
- gdir,bdir,ldir can be paths relative to lib/latte, full paths, or contain ~
-v = version of Voro++ to download and build
- default = voro++-0.4.6 (current as of Jan 2015)
- -g = grab (download) from math.lbl.gov/voro++ website
- unpack tarfile in gdir to produce version dir (e.g. voro++-0.4.6)
- if optional gname specified, rename version dir to gname within gdir
- -b = build Voro++, bdir = Voro++ home directory
- note that bdir must include the version suffix unless renamed
- -l = create 2 softlinks (includelink,liblink)
- in lib/voronoi to src dir of ldir = Voro++ home directory
- note that ldir must include the version suffix unless renamed
+ default version = voro++-0.4.6 (current as of Jan 2015)
+ -h = set home dir of Voro++ to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/voronoi
+ default hdir = voro++-0.4.6 = what tarball unpacks to
+ -g = grab (download) tarball from math.lbl.gov/voro++ website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -b = build Voro++ library in its src dir
+ -l = create 2 softlinks (includelink,liblink) in lib/voronoi to Voro++ src dir
"""
# settings
version = "voro++-0.4.6"
url = "http://math.lbl.gov/voro++/download/dir/%s.tar.gz" % version
# print error message or help
def error(str=None):
if not str: print help
else: print "ERROR",str
sys.exit()
# expand to full path name
# process leading '~' or relative path
def fullpath(path):
return os.path.abspath(os.path.expanduser(path))
# parse args
args = sys.argv[1:]
nargs = len(args)
if nargs == 0: error()
+homepath = "."
+homedir = version
+
grabflag = 0
buildflag = 0
linkflag = 0
iarg = 0
while iarg < nargs:
if args[iarg] == "-v":
if iarg+2 > nargs: error()
version = args[iarg+1]
- iarg += 2
+ iarg += 2
+ elif args[iarg] == "-h":
+ if iarg+3 > nargs: error()
+ homepath = args[iarg+1]
+ homedir = args[iarg+2]
+ iarg += 3
elif args[iarg] == "-g":
- if iarg+2 > nargs: error()
grabflag = 1
- grabdir = args[iarg+1]
- grabname = None
- if iarg+2 < nargs and args[iarg+2][0] != '-':
- grabname = args[iarg+2]
- iarg += 1
- iarg += 2
+ iarg += 1
elif args[iarg] == "-b":
- if iarg+2 > nargs: error()
buildflag = 1
- builddir = args[iarg+1]
- iarg += 2
+ iarg += 1
elif args[iarg] == "-l":
- if iarg+2 > nargs: error()
linkflag = 1
- linkdir = args[iarg+1]
- iarg += 2
+ iarg += 1
else: error()
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("Voro++ path does not exist")
+homedir = "%s/%s" % (homepath,homedir)
+
# download and unpack Voro++ tarball
if grabflag:
print "Downloading Voro++ ..."
- grabdir = fullpath(grabdir)
- if not os.path.isdir(grabdir): error("Grab directory does not exist")
- urllib.urlretrieve(url,"%s/%s.tar.gz" % (grabdir,version))
+ urllib.urlretrieve(url,"%s/%s.tar.gz" % (homepath,version))
print "Unpacking Voro++ tarball ..."
- tardir = "%s/%s" % (grabdir,version)
- if os.path.exists(tardir): commands.getoutput("rm -rf %s" % tardir)
- cmd = "cd %s; tar zxvf %s.tar.gz" % (grabdir,version)
- txt = commands.getoutput(cmd)
- print tardir,grabdir,grabname
- if grabname: os.rename(tardir,"%s/%s" % (grabdir,grabname))
+ if os.path.exists("%s/%s" % (homepath,version)):
+ commands.getoutput("rm -rf %s/%s" % (homepath,version))
+ cmd = "cd %s; tar zxvf %s.tar.gz" % (homepath,version)
+ commands.getoutput(cmd)
+ if os.path.basename(homedir) != version:
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ os.rename("%s/%s" % (homepath,version),homedir)
# build Voro++
if buildflag:
print "Building Voro++ ..."
- cmd = "cd %s; make" % builddir
+ cmd = "cd %s; make" % homedir
txt = commands.getoutput(cmd)
print txt
# create 2 links in lib/voronoi to Voro++ src dir
if linkflag:
print "Creating links to Voro++ include and lib files"
if os.path.isfile("includelink") or os.path.islink("includelink"):
os.remove("includelink")
if os.path.isfile("liblink") or os.path.islink("liblink"):
os.remove("liblink")
- cmd = "ln -s %s/src includelink" % linkdir
+ cmd = "ln -s %s/src includelink" % homedir
commands.getoutput(cmd)
- cmd = "ln -s %s/src liblink" % linkdir
+ cmd = "ln -s %s/src liblink" % homedir
commands.getoutput(cmd)
diff --git a/lib/voronoi/README b/lib/voronoi/README
index 2507a9bae..9863632be 100644
--- a/lib/voronoi/README
+++ b/lib/voronoi/README
@@ -1,59 +1,63 @@
This directory contains links to the Voro++ library which is required
to use the VORONOI package and its compute voronoi/atom command in a
LAMMPS input script.
The Voro++ library is available at http://math.lbl.gov/voro++ and was
developed by Chris H. Rycroft while at UC Berkeley / Lawrence Berkeley
Laboratory.
+You can type "make lib-voronoi" from the src directory to see help on
+how to download and build this library via make commands, or you can
+do the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
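For example, to download, build, and link to Voro++ in one step (a
minimal sketch using the flags documented by Install.py), you could type
the following from within this directory:
% python Install.py -g -b -l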
-----------------
-You must perform the following steps yourself, or you can use the
-Install.py Python script to automate any or all steps of the process.
-Type "python Install.py" for instructions.
+Instructions:
1. Download Voro++ at http://math.lbl.gov/voro++/download
either as a tarball or via SVN, and unpack the
tarball either in this /lib/voronoi directory
or somewhere else on your system.
2. Compile Voro++ from within its home directory
% make
3. There is no need to install Voro++ if you only wish
to use it from LAMMPS. You can install it if you
wish to use it stand-alone or from other codes:
a) install under the default /usr/local
% sudo make install
b) install under a user-writeable location by first
changing the PREFIX variable in the config.mk file, then
% make install
4. Create two soft links in this dir (lib/voronoi)
to where the Voro++ src directory is. E.g. if you built Voro++ in this dir:
% ln -s voro++-0.4.6/src includelink
% ln -s voro++-0.4.6/src liblink
These links could instead be set to the include and lib
directories created by a Voro++ install, e.g.
% ln -s /usr/local/include includelink
% ln -s /usr/local/lib liblink
-----------------
When these steps are complete you can build LAMMPS
with the VORONOI package installed:
% cd lammps/src
% make yes-voronoi
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 4). If you built Voro++ in this directory (as
opposed to somewhere else on your system) and did not install it
somewhere else, you will also need to repeat steps 1,2,3.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked to by
LAMMPS. However, Voro++ requires no auxiliary files or settings, so
its variables are blank.
diff --git a/lib/vtk/Makefile.lammps b/lib/vtk/Makefile.lammps
index e3b28ed92..b86856a9c 100644
--- a/lib/vtk/Makefile.lammps
+++ b/lib/vtk/Makefile.lammps
@@ -1,13 +1,12 @@
# Settings that the LAMMPS build will import when this package library is used
-#
+
# settings for VTK-5.8.0 on RHEL/CentOS 6.x
vtk_SYSINC = -I/usr/include/vtk
vtk_SYSLIB = -lvtkCommon -lvtkIO
vtk_SYSPATH = -L/usr/lib64/vtk
-#
+
# settings for VTK 6.2.0 on Fedora 23
#vtk_SYSINC = -I/usr/include/vtk
#vtk_SYSLIB = -lvtkCommonCore -lvtkIOCore -lvtkCommonDataModel -lvtkIOXML -lvtkIOLegacy -lvtkIOParallelXML
#vtk_SYSPATH = -L/usr/lib64/vtk
-#
diff --git a/lib/vtk/README b/lib/vtk/README
index 11add94f5..61e2a40c2 100644
--- a/lib/vtk/README
+++ b/lib/vtk/README
@@ -1,28 +1,30 @@
-The Makefile.lammps file in this directory is used when building LAMMPS with
-its USER-VTK package installed. The file has several settings needed to
-compile and link LAMMPS with the VTK library. You should choose a
-Makefile.lammps.* file compatible with your system and your version of VTK, and
-copy it to Makefile.lammps before building LAMMPS itself. You may need to edit
-one of the provided files to match your system.
+The Makefile.lammps file in this directory is used when building
+LAMMPS with its USER-VTK package installed. The file has several
+settings needed to compile and link LAMMPS with the VTK library. You
+should choose a Makefile.lammps.* file compatible with your system and
+your version of VTK, and copy it to Makefile.lammps before building
+LAMMPS itself. You may need to edit one of the provided files to
+match your system.
-If you create a new Makefile.lammps file suitable for some version of VTK on
-some system, that is not a match to one of the provided Makefile.lammps.*
-files, you can send it to the developers, and we can include it in the
-distribution for others to use.
+If you create a new Makefile.lammps file suitable for some version of
+VTK on some system, that is not a match to one of the provided
+Makefile.lammps.* files, you can send it to the developers, and we can
+include it in the distribution for others to use.
To illustrate, these are example settings from the
Makefile.lammps.ubuntu14.04_vtk6 file:
vtk_SYSINC = -I/usr/include/vtk-6.0
vtk_SYSLIB = -lvtkCommonCore-6.0 -lvtkIOCore-6.0 -lvtkIOXML-6.0 -lvtkIOLegacy-6.0 -lvtkCommonDataModel-6.0
vtk_SYSPATH =
vtk_SYSINC refers to the include directory of the installed VTK library
-vtk_SYSLIB refers to the libraries needed to link to from an application
-(LAMMPS in this case) to "embed" VTK in the application. VTK consists of
-multiple shared libraries which are needed when using the USER-VTK package.
+vtk_SYSLIB refers to the libraries that an application (LAMMPS in this
+case) needs to link against in order to "embed" VTK in the
+application. VTK consists of multiple shared libraries which are
+needed when using the USER-VTK package.
-vtk_SYSPATH = refers to the path (e.g. -L/usr/local/lib) where the VTK library
-can be found. You may not need this setting if the path is already included in
-your LD_LIBRARY_PATH environment variable.
+vtk_SYSPATH refers to the path (e.g. -L/usr/local/lib) where the VTK
+library can be found. You may not need this setting if the path is
+already included in your LD_LIBRARY_PATH environment variable.
diff --git a/src/.gitignore b/src/.gitignore
index bb6f0a392..1327704e4 100644
--- a/src/.gitignore
+++ b/src/.gitignore
@@ -1,1066 +1,1068 @@
/Makefile.package
/Makefile.package.settings
/MAKE/MINE
/Make.py.last
/lmp_*
/style_*.h
/*_gpu.h
/*_gpu.cpp
/*_intel.h
/*_intel.cpp
/*_kokkos.h
/*_kokkos.cpp
/*_omp.h
/*_omp.cpp
/*_tally.h
/*_tally.cpp
/*_rx.h
/*_rx.cpp
/*_ssa.h
/*_ssa.cpp
/kokkos.cpp
/kokkos.h
/kokkos_type.h
/kokkos_few.h
/manifold*.cpp
/manifold*.h
/fix_*manifold*.cpp
/fix_*manifold*.h
/fix_qeq*.cpp
/fix_qeq*.h
/compute_test_nbl.cpp
/compute_test_nbl.h
/pair_multi_lucy.cpp
/pair_multi_lucy.h
/colvarproxy_lammps.cpp
/colvarproxy_lammps.h
/fix_colvars.cpp
/fix_colvars.h
/dump_molfile.cpp
/dump_molfile.h
/molfile_interface.cpp
/molfile_interface.h
-/molfile_plugin.h
-/vmdplugin.h
/type_detector.h
/intel_buffers.cpp
/intel_buffers.h
/intel_intrinsics.h
/intel_preprocess.h
/intel_simd.h
/compute_sna_atom.cpp
/compute_sna_atom.h
/compute_snad_atom.cpp
/compute_snad_atom.h
/compute_snav_atom.cpp
/compute_snav_atom.h
/openmp_snap.h
/pair_snap.cpp
/pair_snap.h
/sna.cpp
/sna.h
/atom_vec_wavepacket.cpp
/atom_vec_wavepacket.h
/fix_nve_awpmd.cpp
/fix_nve_awpmd.h
/pair_awpmd_cut.cpp
/pair_awpmd_cut.h
-/dihedral_charmmfsh.cpp
-/dihedral_charmmfsh.h
+/dihedral_charmmfsw.cpp
+/dihedral_charmmfsw.h
/pair_lj_charmmfsw_coul_charmmfsh.cpp
/pair_lj_charmmfsw_coul_charmmfsh.h
/pair_lj_charmmfsw_coul_long.cpp
/pair_lj_charmmfsw_coul_long.h
/angle_cg_cmm.cpp
/angle_cg_cmm.h
/angle_charmm.cpp
/angle_charmm.h
/angle_class2.cpp
/angle_class2.h
/angle_cosine.cpp
/angle_cosine.h
/angle_cosine_delta.cpp
/angle_cosine_delta.h
/angle_cosine_periodic.cpp
/angle_cosine_periodic.h
/angle_cosine_shift.cpp
/angle_cosine_shift.h
/angle_cosine_shift_exp.cpp
/angle_cosine_shift_exp.h
/angle_cosine_squared.cpp
/angle_cosine_squared.h
/angle_dipole.cpp
/angle_dipole.h
/angle_fourier.cpp
/angle_fourier.h
/angle_fourier_simple.cpp
/angle_fourier_simple.h
/angle_harmonic.cpp
/angle_harmonic.h
/angle_quartic.cpp
/angle_quartic.h
/angle_sdk.cpp
/angle_sdk.h
/angle_table.cpp
/angle_table.h
/atom_vec_angle.cpp
/atom_vec_angle.h
/atom_vec_bond.cpp
/atom_vec_bond.h
/atom_vec_colloid.cpp
/atom_vec_colloid.h
/atom_vec_dipole.cpp
/atom_vec_dipole.h
/atom_vec_dpd.cpp
/atom_vec_dpd.h
/atom_vec_electron.cpp
/atom_vec_electron.h
/atom_vec_ellipsoid.cpp
/atom_vec_ellipsoid.h
/atom_vec_full.cpp
/atom_vec_full.h
/atom_vec_full_hars.cpp
/atom_vec_full_hars.h
/atom_vec_granular.cpp
/atom_vec_granular.h
/atom_vec_meso.cpp
/atom_vec_meso.h
/atom_vec_molecular.cpp
/atom_vec_molecular.h
/atom_vec_peri.cpp
/atom_vec_peri.h
/atom_vec_template.cpp
/atom_vec_template.h
/body_nparticle.cpp
/body_nparticle.h
/bond_class2.cpp
/bond_class2.h
/bond_fene.cpp
/bond_fene.h
/bond_fene_expand.cpp
/bond_fene_expand.h
/bond_harmonic.cpp
/bond_harmonic.h
/bond_harmonic_shift.cpp
/bond_harmonic_shift.h
/bond_harmonic_shift_cut.cpp
/bond_harmonic_shift_cut.h
/bond_morse.cpp
/bond_morse.h
/bond_nonlinear.cpp
/bond_nonlinear.h
/bond_oxdna_fene.cpp
/bond_oxdna_fene.h
+/bond_oxdna2_fene.cpp
+/bond_oxdna2_fene.h
/bond_quartic.cpp
/bond_quartic.h
/bond_table.cpp
/bond_table.h
/cg_cmm_parms.cpp
/cg_cmm_parms.h
/commgrid.cpp
/commgrid.h
/compute_ackland_atom.cpp
/compute_ackland_atom.h
/compute_basal_atom.cpp
/compute_basal_atom.h
/compute_body_local.cpp
/compute_body_local.h
/compute_cna_atom2.cpp
/compute_cna_atom2.h
/compute_damage_atom.cpp
/compute_damage_atom.h
/compute_dilatation_atom.cpp
/compute_dilatation_atom.h
/compute_dpd.cpp
/compute_dpd.h
/compute_dpd_atom.cpp
/compute_dpd_atom.h
/compute_erotate_asphere.cpp
/compute_erotate_asphere.h
/compute_erotate_rigid.cpp
/compute_erotate_rigid.h
/compute_event_displace.cpp
/compute_event_displace.h
/compute_fep.cpp
/compute_fep.h
/compute_force_tally.cpp
/compute_force_tally.h
/compute_heat_flux_tally.cpp
/compute_heat_flux_tally.h
/compute_ke_atom_eff.cpp
/compute_ke_atom_eff.h
/compute_ke_eff.cpp
/compute_ke_eff.h
/compute_ke_rigid.cpp
/compute_ke_rigid.h
/compute_meso_e_atom.cpp
/compute_meso_e_atom.h
/compute_meso_rho_atom.cpp
/compute_meso_rho_atom.h
/compute_meso_t_atom.cpp
/compute_meso_t_atom.h
/compute_msd_nongauss.cpp
/compute_msd_nongauss.h
/compute_pe_tally.cpp
/compute_pe_tally.h
/compute_plasticity_atom.cpp
/compute_plasticity_atom.h
/compute_pressure_grem.cpp
/compute_pressure_grem.h
/compute_rigid_local.cpp
/compute_rigid_local.h
/compute_spec_atom.cpp
/compute_spec_atom.h
/compute_stress_tally.cpp
/compute_stress_tally.h
/compute_temp_asphere.cpp
/compute_temp_asphere.h
/compute_temp_body.cpp
/compute_temp_body.h
/compute_temp_deform_eff.cpp
/compute_temp_deform_eff.h
/compute_temp_eff.cpp
/compute_temp_eff.h
/compute_temp_region_eff.cpp
/compute_temp_region_eff.h
/compute_temp_rotate.cpp
/compute_temp_rotate.h
/compute_ti.cpp
/compute_ti.h
/compute_voronoi_atom.cpp
/compute_voronoi_atom.h
/dihedral_charmm.cpp
/dihedral_charmm.h
/dihedral_class2.cpp
/dihedral_class2.h
/dihedral_cosine_shift_exp.cpp
/dihedral_cosine_shift_exp.h
/dihedral_fourier.cpp
/dihedral_fourier.h
/dihedral_harmonic.cpp
/dihedral_harmonic.h
/dihedral_helix.cpp
/dihedral_helix.h
/dihedral_hybrid.cpp
/dihedral_hybrid.h
/dihedral_multi_harmonic.cpp
/dihedral_multi_harmonic.h
/dihedral_nharmonic.cpp
/dihedral_nharmonic.h
/dihedral_opls.cpp
/dihedral_opls.h
/dihedral_quadratic.cpp
/dihedral_quadratic.h
/dihedral_spherical.cpp
/dihedral_spherical.h
/dihedral_table.cpp
/dihedral_table.h
/dump_atom_gz.cpp
/dump_atom_gz.h
/dump_xyz_gz.cpp
/dump_xyz_gz.h
/dump_atom_mpiio.cpp
/dump_atom_mpiio.h
/dump_cfg_gz.cpp
/dump_cfg_gz.h
/dump_cfg_mpiio.cpp
/dump_cfg_mpiio.h
/dump_custom_gz.cpp
/dump_custom_gz.h
/dump_custom_mpiio.cpp
/dump_custom_mpiio.h
/dump_custom_vtk.cpp
/dump_custom_vtk.h
/dump_h5md.cpp
/dump_h5md.h
/dump_nc.cpp
/dump_nc.h
/dump_nc_mpiio.cpp
/dump_nc_mpiio.h
/dump_xtc.cpp
/dump_xtc.h
/dump_xyz_mpiio.cpp
/dump_xyz_mpiio.h
/ewald.cpp
/ewald.h
/ewald_cg.cpp
/ewald_cg.h
/ewald_disp.cpp
/ewald_disp.h
/ewald_n.cpp
/ewald_n.h
/fft3d.cpp
/fft3d.h
/fft3d_wrap.cpp
/fft3d_wrap.h
/fix_adapt_fep.cpp
/fix_adapt_fep.h
/fix_addtorque.cpp
/fix_addtorque.h
/fix_append_atoms.cpp
/fix_append_atoms.h
/fix_atc.cpp
/fix_atc.h
/fix_ave_correlate_long.cpp
/fix_ave_correlate_long.h
/fix_bond_break.cpp
/fix_bond_break.h
/fix_bond_create.cpp
/fix_bond_create.h
/fix_bond_swap.cpp
/fix_bond_swap.h
/fix_cmap.cpp
/fix_cmap.h
/fix_deposit.cpp
/fix_deposit.h
/fix_dpd_energy.cpp
/fix_dpd_energy.h
/fix_efield.cpp
/fix_efield.h
/fix_eos_cv.cpp
/fix_eos_cv.h
/fix_eos_table.cpp
/fix_eos_table.h
/fix_evaporate.cpp
/fix_evaporate.h
/fix_filter_corotate.cpp
/fix_filter_corotate.h
/fix_viscosity.cpp
/fix_viscosity.h
/fix_ehex.cpp
/fix_ehex.h
/fix_event.cpp
/fix_event.h
/fix_event_prd.cpp
/fix_event_prd.h
/fix_event_tad.cpp
/fix_event_tad.h
/fix_flow_gauss.cpp
/fix_flow_gauss.h
/fix_freeze.cpp
/fix_freeze.h
/fix_gcmc.cpp
/fix_gcmc.h
/fix_gld.cpp
/fix_gld.h
/fix_gle.cpp
/fix_gle.h
/fix_gpu.cpp
/fix_gpu.h
/fix_grem.cpp
/fix_grem.h
/fix_imd.cpp
/fix_imd.h
/fix_ipi.cpp
/fix_ipi.h
/fix_lambdah_calc.cpp
/fix_lambdah_calc.h
/fix_langevin_eff.cpp
/fix_langevin_eff.h
/fix_lb_fluid.cpp
/fix_lb_fluid.h
/fix_lb_momentum.cpp
/fix_lb_momentum.h
/fix_lb_pc.cpp
/fix_lb_pc.h
/fix_lb_rigid_pc_sphere.cpp
/fix_lb_rigid_pc_sphere.h
/fix_lb_viscous.cpp
/fix_lb_viscous.h
/fix_load_report.cpp
/fix_load_report.h
/fix_meso.cpp
/fix_meso.h
/fix_meso_stationary.cpp
/fix_meso_stationary.h
/fix_mscg.cpp
/fix_mscg.h
/fix_msst.cpp
/fix_msst.h
/fix_neb.cpp
/fix_neb.h
/fix_nh_asphere.cpp
/fix_nh_asphere.h
/fix_nph_asphere.cpp
/fix_nph_asphere.h
/fix_npt_asphere.cpp
/fix_npt_asphere.h
/fix_nve_asphere.cpp
/fix_nve_asphere.h
/fix_nve_asphere_noforce.cpp
/fix_nve_asphere_noforce.h
/fix_nve_dot.cpp
/fix_nve_dot.h
/fix_nve_dotc_langevin.cpp
/fix_nve_dotc_langevin.h
/fix_nh_body.cpp
/fix_nh_body.h
/fix_nph_body.cpp
/fix_nph_body.h
/fix_npt_body.cpp
/fix_npt_body.h
/fix_nvk.cpp
/fix_nvk.h
/fix_nvt_body.cpp
/fix_nvt_body.h
/fix_nve_body.cpp
/fix_nve_body.h
/fix_nvt_asphere.cpp
/fix_nvt_asphere.h
/fix_nh_eff.cpp
/fix_nh_eff.h
/fix_nph_eff.cpp
/fix_nph_eff.h
/fix_nphug.cpp
/fix_nphug.h
/fix_npt_eff.cpp
/fix_npt_eff.h
/fix_nve_eff.cpp
/fix_nve_eff.h
/fix_nve_line.cpp
/fix_nve_line.h
/fix_nvt_eff.cpp
/fix_nvt_eff.h
/fix_nvt_sllod_eff.cpp
/fix_nvt_sllod_eff.h
/fix_nve_tri.cpp
/fix_nve_tri.h
/fix_oneway.cpp
/fix_oneway.h
/fix_orient_bcc.cpp
/fix_orient_bcc.h
/fix_orient_fcc.cpp
/fix_orient_fcc.h
/fix_peri_neigh.cpp
/fix_peri_neigh.h
/fix_phonon.cpp
/fix_phonon.h
/fix_poems.cpp
/fix_poems.h
/fix_pour.cpp
/fix_pour.h
/fix_qeq_comb.cpp
/fix_qeq_comb.h
/fix_qeq_reax.cpp
/fix_qeq_fire.cpp
/fix_qeq_fire.h
/fix_qeq_reax.h
/fix_qmmm.cpp
/fix_qmmm.h
/fix_reax_bonds.cpp
/fix_reax_bonds.h
/fix_reax_c.cpp
/fix_reax_c.h
/fix_reaxc_bonds.cpp
/fix_reaxc_bonds.h
/fix_reaxc_species.cpp
/fix_reaxc_species.h
/fix_rigid.cpp
/fix_rigid.h
/fix_rigid_nh.cpp
/fix_rigid_nh.h
/fix_rigid_nph.cpp
/fix_rigid_nph.h
/fix_rigid_npt.cpp
/fix_rigid_npt.h
/fix_rigid_nve.cpp
/fix_rigid_nve.h
/fix_rigid_nvt.cpp
/fix_rigid_nvt.h
/fix_rigid_nh_small.cpp
/fix_rigid_nh_small.h
/fix_rigid_nph_small.cpp
/fix_rigid_nph_small.h
/fix_rigid_npt_small.cpp
/fix_rigid_npt_small.h
/fix_rigid_nve_small.cpp
/fix_rigid_nve_small.h
/fix_rigid_nvt_small.cpp
/fix_rigid_nvt_small.h
/fix_rigid_small.cpp
/fix_rigid_small.h
/fix_shake.cpp
/fix_shake.h
/fix_shardlow.cpp
/fix_shardlow.h
/fix_smd.cpp
/fix_smd.h
/fix_species.cpp
/fix_species.h
/fix_spring_pull.cpp
/fix_spring_pull.h
/fix_srd.cpp
/fix_srd.h
/fix_temp_rescale_eff.cpp
/fix_temp_rescale_eff.h
/fix_thermal_conductivity.cpp
/fix_thermal_conductivity.h
/fix_ti_rs.cpp
/fix_ti_rs.h
/fix_ti_spring.cpp
/fix_ti_spring.h
/fix_ttm.cpp
/fix_ttm.h
/fix_tune_kspace.cpp
/fix_tune_kspace.h
/fix_wall_colloid.cpp
/fix_wall_colloid.h
/fix_wall_gran.cpp
/fix_wall_gran.h
/fix_wall_gran_region.cpp
/fix_wall_gran_region.h
/fix_wall_piston.cpp
/fix_wall_piston.h
/fix_wall_srd.cpp
/fix_wall_srd.h
/gpu_extra.h
/gridcomm.cpp
/gridcomm.h
/group_ndx.cpp
/group_ndx.h
/ndx_group.cpp
/ndx_group.h
/improper_class2.cpp
/improper_class2.h
/improper_cossq.cpp
/improper_cossq.h
/improper_cvff.cpp
/improper_cvff.h
/improper_distance.cpp
/improper_distance.h
/improper_fourier.cpp
/improper_fourier.h
/improper_harmonic.cpp
/improper_harmonic.h
/improper_hybrid.cpp
/improper_hybrid.h
/improper_ring.cpp
/improper_ring.h
/improper_umbrella.cpp
/improper_umbrella.h
/kissfft.h
/lj_sdk_common.h
/math_complex.h
/math_vector.h
/mgpt_*.cpp
/mgpt_*.h
/msm.cpp
/msm.h
/msm_cg.cpp
/msm_cg.h
/neb.cpp
/neb.h
/pair_adp.cpp
/pair_adp.h
/pair_agni.cpp
/pair_agni.h
/pair_airebo.cpp
/pair_airebo.h
/pair_airebo_morse.cpp
/pair_airebo_morse.h
/pair_body.cpp
/pair_body.h
/pair_bop.cpp
/pair_bop.h
/pair_born_coul_long.cpp
/pair_born_coul_long.h
/pair_born_coul_msm.cpp
/pair_born_coul_msm.h
/pair_brownian.cpp
/pair_brownian.h
/pair_brownian_poly.cpp
/pair_brownian_poly.h
/pair_buck_coul_long.cpp
/pair_buck_coul_long.h
/pair_buck_coul_msm.cpp
/pair_buck_coul_msm.h
/pair_buck_coul.cpp
/pair_buck_coul.h
/pair_buck_long_coul_long.cpp
/pair_buck_long_coul_long.h
/pair_cdeam.cpp
/pair_cdeam.h
/pair_cg_cmm.cpp
/pair_cg_cmm.h
/pair_cg_cmm_coul_cut.cpp
/pair_cg_cmm_coul_cut.h
/pair_cg_cmm_coul_long.cpp
/pair_cg_cmm_coul_long.h
/pair_cmm_common.cpp
/pair_cmm_common.h
/pair_cg_cmm_coul_msm.cpp
/pair_cg_cmm_coul_msm.h
/pair_comb.cpp
/pair_comb.h
/pair_comb3.cpp
/pair_comb3.h
/pair_colloid.cpp
/pair_colloid.h
/pair_coul_diel.cpp
/pair_coul_diel.h
/pair_coul_long.cpp
/pair_coul_long.h
/pair_coul_msm.cpp
/pair_coul_msm.h
/pair_dipole_cut.cpp
/pair_dipole_cut.h
/pair_dipole_sf.cpp
/pair_dipole_sf.h
/pair_dpd_mt.cpp
/pair_dpd_mt.h
/pair_dsmc.cpp
/pair_dsmc.h
/pair_eam.cpp
/pair_eam.h
/pair_eam_opt.cpp
/pair_eam_opt.h
/pair_eam_alloy.cpp
/pair_eam_alloy.h
/pair_eam_alloy_opt.cpp
/pair_eam_alloy_opt.h
/pair_eam_fs.cpp
/pair_eam_fs.h
/pair_eam_fs_opt.cpp
/pair_eam_fs_opt.h
/pair_edip.cpp
/pair_edip.h
/pair_eff_cut.cpp
/pair_eff_cut.h
/pair_eff_inline.h
/pair_eim.cpp
/pair_eim.h
/pair_gauss_cut.cpp
/pair_gauss_cut.h
/pair_gayberne.cpp
/pair_gayberne.h
/pair_gran_easy.cpp
/pair_gran_easy.h
/pair_gran_hertz_history.cpp
/pair_gran_hertz_history.h
/pair_gran_hooke.cpp
/pair_gran_hooke.h
/pair_gran_hooke_history.cpp
/pair_gran_hooke_history.h
/pair_gw.cpp
/pair_gw.h
/pair_gw_zbl.cpp
/pair_gw_zbl.h
/pair_hbond_dreiding_lj.cpp
/pair_hbond_dreiding_lj.h
/pair_hbond_dreiding_morse.cpp
/pair_hbond_dreiding_morse.h
/pair_kolmogorov_crespi_z.cpp
/pair_kolmogorov_crespi_z.h
/pair_lcbop.cpp
/pair_lcbop.h
/pair_line_lj.cpp
/pair_line_lj.h
/pair_list.cpp
/pair_list.h
/pair_lj_charmm_coul_charmm.cpp
/pair_lj_charmm_coul_charmm.h
/pair_lj_charmm_coul_charmm_implicit.cpp
/pair_lj_charmm_coul_charmm_implicit.h
/pair_lj_charmm_coul_long.cpp
/pair_lj_charmm_coul_long.h
/pair_lj_charmm_coul_long_opt.cpp
/pair_lj_charmm_coul_long_opt.h
/pair_lj_charmm_coul_long_soft.cpp
/pair_lj_charmm_coul_long_soft.h
/pair_lj_charmm_coul_msm.cpp
/pair_lj_charmm_coul_msm.h
/pair_lj_class2.cpp
/pair_lj_class2.h
/pair_lj_class2_coul_cut.cpp
/pair_lj_class2_coul_cut.h
/pair_lj_class2_coul_long.cpp
/pair_lj_class2_coul_long.h
/pair_lj_coul.cpp
/pair_lj_coul.h
/pair_coul_cut_soft.cpp
/pair_coul_cut_soft.h
/pair_coul_long_soft.cpp
/pair_coul_long_soft.h
/pair_lj_cut_coul_cut_soft.cpp
/pair_lj_cut_coul_cut_soft.h
/pair_lj_cut_tip4p_cut.cpp
/pair_lj_cut_tip4p_cut.h
/pair_lj_cut_coul_long.cpp
/pair_lj_cut_coul_long.h
/pair_lj_cut_coul_long_opt.cpp
/pair_lj_cut_coul_long_opt.h
/pair_lj_cut_coul_long_soft.cpp
/pair_lj_cut_coul_long_soft.h
/pair_lj_cut_coul_msm.cpp
/pair_lj_cut_coul_msm.h
/pair_lj_cut_dipole_cut.cpp
/pair_lj_cut_dipole_cut.h
/pair_lj_cut_dipole_long.cpp
/pair_lj_cut_dipole_long.h
/pair_lj_cut_*hars_*.cpp
/pair_lj_cut_*hars_*.h
/pair_lj_cut_soft.cpp
/pair_lj_cut_soft.h
/pair_lj_cut_tip4p_long.cpp
/pair_lj_cut_tip4p_long.h
/pair_lj_cut_tip4p_long_opt.cpp
/pair_lj_cut_tip4p_long_opt.h
/pair_lj_cut_tip4p_long_soft.cpp
/pair_lj_cut_tip4p_long_soft.h
/pair_lj_long_coul_long.cpp
/pair_lj_long_coul_long.h
/pair_lj_long_coul_long_opt.cpp
/pair_lj_long_coul_long_opt.h
/pair_lj_long_dipole_long.cpp
/pair_lj_long_dipole_long.h
/pair_lj_long_tip4p_long.cpp
/pair_lj_long_tip4p_long.h
/pair_lj_cut_opt.cpp
/pair_lj_cut_opt.h
/pair_lj_cut_tgpu.cpp
/pair_lj_cut_tgpu.h
/pair_lj_sdk.cpp
/pair_lj_sdk.h
/pair_lj_sdk_coul_long.cpp
/pair_lj_sdk_coul_long.h
/pair_lj_sdk_coul_msm.cpp
/pair_lj_sdk_coul_msm.h
/pair_lj_sf.cpp
/pair_lj_sf.h
/pair_lj_sf_dipole_sf.cpp
/pair_lj_sf_dipole_sf.h
/pair_lubricateU.cpp
/pair_lubricateU.h
/pair_lubricateU_poly.cpp
/pair_lubricateU_poly.h
/pair_lubricate_poly.cpp
/pair_lubricate_poly.h
/pair_lubricate.cpp
/pair_lubricate.h
/pair_meam.cpp
/pair_meam.h
/pair_meam_spline.cpp
/pair_meam_spline.h
/pair_meam_sw_spline.cpp
/pair_meam_sw_spline.h
/pair_morse_opt.cpp
/pair_morse_opt.h
/pair_morse_soft.cpp
/pair_morse_soft.h
/pair_nb3b_harmonic.cpp
/pair_nb3b_harmonic.h
/pair_nm_cut.cpp
/pair_nm_cut.h
/pair_nm_cut_coul_cut.cpp
/pair_nm_cut_coul_cut.h
/pair_nm_cut_coul_long.cpp
/pair_nm_cut_coul_long.h
/pair_oxdna_*.cpp
/pair_oxdna_*.h
+/pair_oxdna2_*.cpp
+/pair_oxdna2_*.h
/mf_oxdna.h
/pair_peri_eps.cpp
/pair_peri_eps.h
/pair_peri_lps.cpp
/pair_peri_lps.h
/pair_peri_pmb.cpp
/pair_peri_pmb.h
/pair_peri_ves.cpp
/pair_peri_ves.h
/pair_reax.cpp
/pair_reax.h
/pair_reax_fortran.h
/pair_reax_c.cpp
/pair_reax_c.h
/pair_rebo.cpp
/pair_rebo.h
/pair_resquared.cpp
/pair_resquared.h
/pair_sph_heatconduction.cpp
/pair_sph_heatconduction.h
/pair_sph_idealgas.cpp
/pair_sph_idealgas.h
/pair_sph_lj.cpp
/pair_sph_lj.h
/pair_sph_rhosum.cpp
/pair_sph_rhosum.h
/pair_sph_taitwater.cpp
/pair_sph_taitwater.h
/pair_sph_taitwater_morris.cpp
/pair_sph_taitwater_morris.h
/pair_sw.cpp
/pair_sw.h
/pair_tersoff.cpp
/pair_tersoff.h
/pair_tersoff_mod.cpp
/pair_tersoff_mod.h
/pair_tersoff_mod_c.cpp
/pair_tersoff_mod_c.h
/pair_tersoff_table.cpp
/pair_tersoff_table.h
/pair_tersoff_zbl.cpp
/pair_tersoff_zbl.h
/pair_tip4p_cut.cpp
/pair_tip4p_cut.h
/pair_tip4p_long.cpp
/pair_tip4p_long.h
/pair_tip4p_long_soft.cpp
/pair_tip4p_long_soft.h
/pair_tri_lj.cpp
/pair_tri_lj.h
/pair_yukawa_colloid.cpp
/pair_yukawa_colloid.h
/pair_momb.cpp
/pair_momb.h
/pppm.cpp
/pppm.h
/pppm_cg.cpp
/pppm_cg.h
/pppm_disp.cpp
/pppm_disp.h
/pppm_disp_tip4p.cpp
/pppm_disp_tip4p.h
/pppm_old.cpp
/pppm_old.h
/pppm_proxy.cpp
/pppm_proxy.h
/pppm_stagger.cpp
/pppm_stagger.h
/pppm_tip4p.cpp
/pppm_tip4p.h
/pppm_tip4p_proxy.cpp
/pppm_tip4p_proxy.h
/pppm_tip4p_cg.cpp
/pppm_tip4p_cg.h
/prd.cpp
/prd.h
/python_impl.cpp
/python_impl.h
/reader_molfile.cpp
/reader_molfile.h
/reaxc_allocate.cpp
/reaxc_allocate.h
/reaxc_basic_comm.cpp
/reaxc_basic_comm.h
/reaxc_bond_orders.cpp
/reaxc_bond_orders.h
/reaxc_bonds.cpp
/reaxc_bonds.h
/reaxc_control.cpp
/reaxc_control.h
/reaxc_defs.h
/reaxc_ffield.cpp
/reaxc_ffield.h
/reaxc_forces.cpp
/reaxc_forces.h
/reaxc_hydrogen_bonds.cpp
/reaxc_hydrogen_bonds.h
/reaxc_init_md.cpp
/reaxc_init_md.h
/reaxc_io_tools.cpp
/reaxc_io_tools.h
/reaxc_list.cpp
/reaxc_list.h
/reaxc_lookup.cpp
/reaxc_lookup.h
/reaxc_multi_body.cpp
/reaxc_multi_body.h
/reaxc_nonbonded.cpp
/reaxc_nonbonded.h
/reaxc_reset_tools.cpp
/reaxc_reset_tools.h
/reaxc_system_props.cpp
/reaxc_system_props.h
/reaxc_tool_box.cpp
/reaxc_tool_box.h
/reaxc_torsion_angles.cpp
/reaxc_torsion_angles.h
/reaxc_traj.cpp
/reaxc_traj.h
/reaxc_types.h
/reaxc_valence_angles.cpp
/reaxc_valence_angles.h
/reaxc_vector.cpp
/reaxc_vector.h
/remap.cpp
/remap.h
/remap_wrap.cpp
/remap_wrap.h
/restart_mpiio.cpp
/restart_mpiio.h
/smd_kernels.h
/smd_material_models.cpp
/smd_material_models.h
/smd_math.h
/tad.cpp
/tad.h
/temper.cpp
/temper.h
/temper_grem.cpp
/temper_grem.h
/thr_data.cpp
/thr_data.h
/verlet_split.cpp
/verlet_split.h
/write_dump.cpp
/write_dump.h
/xdr_compat.cpp
/xdr_compat.h
/atom_vec_smd.cpp
/atom_vec_smd.h
/compute_saed.cpp
/compute_saed.h
/compute_saed_consts.h
/compute_smd_contact_radius.cpp
/compute_smd_contact_radius.h
/compute_smd_damage.cpp
/compute_smd_damage.h
/compute_smd_hourglass_error.cpp
/compute_smd_hourglass_error.h
/compute_smd_internal_energy.cpp
/compute_smd_internal_energy.h
/compute_smd_plastic_strain.cpp
/compute_smd_plastic_strain.h
/compute_smd_plastic_strain_rate.cpp
/compute_smd_plastic_strain_rate.h
/compute_smd_rho.cpp
/compute_smd_rho.h
/compute_smd_tlsph_defgrad.cpp
/compute_smd_tlsph_defgrad.h
/compute_smd_tlsph_dt.cpp
/compute_smd_tlsph_dt.h
/compute_smd_tlsph_num_neighs.cpp
/compute_smd_tlsph_num_neighs.h
/compute_smd_tlsph_shape.cpp
/compute_smd_tlsph_shape.h
/compute_smd_tlsph_strain.cpp
/compute_smd_tlsph_strain.h
/compute_smd_tlsph_strain_rate.cpp
/compute_smd_tlsph_strain_rate.h
/compute_smd_tlsph_stress.cpp
/compute_smd_tlsph_stress.h
/compute_smd_triangle_mesh_vertices.cpp
/compute_smd_triangle_mesh_vertices.h
/compute_smd_ulsph_effm.cpp
/compute_smd_ulsph_effm.h
/compute_smd_ulsph_num_neighs.cpp
/compute_smd_ulsph_num_neighs.h
/compute_smd_ulsph_strain.cpp
/compute_smd_ulsph_strain.h
/compute_smd_ulsph_strain_rate.cpp
/compute_smd_ulsph_strain_rate.h
/compute_smd_ulsph_stress.cpp
/compute_smd_ulsph_stress.h
/compute_smd_vol.cpp
/compute_smd_vol.h
/compute_temp_cs.cpp
/compute_temp_cs.h
/compute_temp_drude.cpp
/compute_temp_drude.h
/compute_xrd.cpp
/compute_xrd.h
/compute_xrd_consts.h
/fix_atom_swap.cpp
/fix_atom_swap.h
/fix_ave_spatial_sphere.cpp
/fix_ave_spatial_sphere.h
/fix_drude.cpp
/fix_drude.h
/fix_drude_transform.cpp
/fix_drude_transform.h
/fix_langevin_drude.cpp
/fix_langevin_drude.h
/fix_pimd.cpp
/fix_pimd.h
/fix_qbmsst.cpp
/fix_qbmsst.h
/fix_qtb.cpp
/fix_qtb.h
/fix_rattle.cpp
/fix_rattle.h
/fix_saed_vtk.cpp
/fix_saed_vtk.h
/fix_smd_adjust_dt.cpp
/fix_smd_adjust_dt.h
/fix_smd_integrate_tlsph.cpp
/fix_smd_integrate_tlsph.h
/fix_smd_integrate_ulsph.cpp
/fix_smd_integrate_ulsph.h
/fix_smd_move_triangulated_surface.cpp
/fix_smd_move_triangulated_surface.h
/fix_smd_setvel.cpp
/fix_smd_setvel.h
/fix_smd_tlsph_reference_configuration.cpp
/fix_smd_tlsph_reference_configuration.h
/fix_smd_wall_surface.cpp
/fix_smd_wall_surface.h
/fix_srp.cpp
/fix_srp.h
/fix_tfmc.cpp
/fix_tfmc.h
/fix_ttm_mod.cpp
/fix_ttm_mod.h
/pair_born_coul_long_cs.cpp
/pair_born_coul_long_cs.h
/pair_born_coul_dsf_cs.cpp
/pair_born_coul_dsf_cs.h
/pair_buck_coul_long_cs.cpp
/pair_buck_coul_long_cs.h
/pair_coul_long_cs.cpp
/pair_coul_long_cs.h
/pair_lj_cut_thole_long.cpp
/pair_lj_cut_thole_long.h
/pair_plum_hb.cpp
/pair_plum_hb.h
/pair_plum_hp.cpp
/pair_plum_hp.h
/pair_polymorphic.cpp
/pair_polymorphic.h
/pair_smd_hertz.cpp
/pair_smd_hertz.h
/pair_smd_tlsph.cpp
/pair_smd_tlsph.h
/pair_smd_triangulated_surface.cpp
/pair_smd_triangulated_surface.h
/pair_smd_ulsph.cpp
/pair_smd_ulsph.h
/pair_srp.cpp
/pair_srp.h
/pair_thole.cpp
/pair_thole.h
/pair_buck_mdf.cpp
/pair_buck_mdf.h
/pair_dpd_conservative.cpp
/pair_dpd_conservative.h
/pair_dpd_fdt.cpp
/pair_dpd_fdt.h
/pair_dpd_fdt_energy.cpp
/pair_dpd_fdt_energy.h
/pair_lennard_mdf.cpp
/pair_lennard_mdf.h
/pair_lj_cut_coul_long_cs.cpp
/pair_lj_cut_coul_long_cs.h
/pair_lj_mdf.cpp
/pair_lj_mdf.h
/pair_mgpt.cpp
/pair_mgpt.h
/pair_morse_smooth_linear.cpp
/pair_morse_smooth_linear.h
/pair_smtbq.cpp
/pair_smtbq.h
/pair_vashishta*.cpp
/pair_vashishta*.h
diff --git a/src/Depend.sh b/src/Depend.sh
index 5a48a7c16..520d9ae2b 100644
--- a/src/Depend.sh
+++ b/src/Depend.sh
@@ -1,129 +1,129 @@
# Depend.sh = Install/unInstall files due to package dependencies
# this script is invoked after any package is installed/uninstalled
# enforce using portable C locale
LC_ALL=C
export LC_ALL
# all parent/child package dependencies should be listed below
# parent package = has files that files in another package derive from
# child package = has files that derive from files in another package
# update child packages that depend on the parent,
# but only if the child package is already installed
# this is necessary to ensure the child package installs
# only child files whose parent package files are now installed
# decisions on (un)installing individual child files are made by
# the Install.sh script in the child package
# depend function: arg = child-package
# checks if child-package is installed, if not just return
# otherwise invoke update of child package via its Install.sh
depend () {
cd $1
installed=0
for file in *.cpp *.h; do
if (test -e ../$file) then
installed=1
fi
done
cd ..
if (test $installed = 0) then
return
fi
echo " updating package $1"
if (test -e $1/Install.sh) then
cd $1; /bin/sh Install.sh 2; cd ..
else
cd $1; /bin/sh ../Install.sh 2; cd ..
fi
}
# add one if statement per parent package
# add one depend() call per child package that depends on that parent
if (test $1 = "ASPHERE") then
depend GPU
depend USER-OMP
depend USER-CGDNA
depend USER-INTEL
fi
if (test $1 = "CLASS2") then
depend GPU
depend KOKKOS
depend USER-OMP
fi
if (test $1 = "COLLOID") then
depend GPU
depend USER-OMP
fi
if (test $1 = "DIPOLE") then
depend USER-MISC
depend USER-OMP
fi
if (test $1 = "GRANULAR") then
depend USER-OMP
fi
if (test $1 = "KSPACE") then
depend CORESHELL
depend GPU
depend KOKKOS
depend OPT
depend USER-OMP
depend USER-INTEL
depend USER-PHONON
depend USER-FEP
fi
if (test $1 = "MANYBODY") then
depend GPU
depend KOKKOS
depend OPT
depend USER-MISC
depend USER-OMP
fi
if (test $1 = "MOLECULE") then
depend GPU
depend KOKKOS
depend USER-MISC
depend USER-OMP
depend USER-FEP
depend USER-CGDNA
depend USER-INTEL
fi
if (test $1 = "PERI") then
depend USER-OMP
fi
if (test $1 = "RIGID") then
depend USER-OMP
fi
-if (test $1 = "USER-CG-CMM") then
+if (test $1 = "USER-CGSDK") then
depend GPU
depend KOKKOS
depend USER-OMP
fi
if (test $1 = "USER-FEP") then
depend USER-OMP
fi
if (test $1 = "USER-MISC") then
depend GPU
depend USER-OMP
fi
if (test $1 = "USER-REAXC") then
depend KOKKOS
fi
diff --git a/src/GPU/pair_lj_sdk_coul_long_gpu.cpp b/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
index 0b8d0f3b3..77c0dc066 100644
--- a/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
+++ b/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
@@ -1,352 +1,352 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "pair_lj_sdk_coul_long_gpu.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "integrate.h"
#include "memory.h"
#include "error.h"
#include "neigh_request.h"
#include "universe.h"
#include "update.h"
#include "domain.h"
#include <string.h>
#include "kspace.h"
#include "gpu_extra.h"
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
using namespace LAMMPS_NS;
// External functions from cuda library for atom decomposition
-int cmml_gpu_init(const int ntypes, double **cutsq, int **lj_type,
+int sdkl_gpu_init(const int ntypes, double **cutsq, int **lj_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen, double **host_cut_ljsq, double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald);
-void cmml_gpu_clear();
-int ** cmml_gpu_compute_n(const int ago, const int inum, const int nall,
+void sdkl_gpu_clear();
+int ** sdkl_gpu_compute_n(const int ago, const int inum, const int nall,
double **host_x, int *host_type, double *sublo,
double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success, double *host_q, double *boxlo,
double *prd);
-void cmml_gpu_compute(const int ago, const int inum, const int nall,
+void sdkl_gpu_compute(const int ago, const int inum, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success, double *host_q,
const int nlocal, double *boxlo, double *prd);
-double cmml_gpu_bytes();
+double sdkl_gpu_bytes();
#include "lj_sdk_common.h"
using namespace LJSDKParms;
/* ---------------------------------------------------------------------- */
PairLJSDKCoulLongGPU::PairLJSDKCoulLongGPU(LAMMPS *lmp) :
PairLJSDKCoulLong(lmp), gpu_mode(GPU_FORCE)
{
respa_enable = 0;
reinitflag = 0;
cpu_time = 0.0;
GPU_EXTRA::gpu_ready(lmp->modify, lmp->error);
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairLJSDKCoulLongGPU::~PairLJSDKCoulLongGPU()
{
- cmml_gpu_clear();
+ sdkl_gpu_clear();
}
/* ---------------------------------------------------------------------- */
void PairLJSDKCoulLongGPU::compute(int eflag, int vflag)
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
int nall = atom->nlocal + atom->nghost;
int inum, host_start;
bool success = true;
int *ilist, *numneigh, **firstneigh;
if (gpu_mode != GPU_FORCE) {
inum = atom->nlocal;
- firstneigh = cmml_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
+ firstneigh = sdkl_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
atom->type, domain->sublo, domain->subhi,
atom->tag, atom->nspecial, atom->special,
eflag, vflag, eflag_atom, vflag_atom,
host_start, &ilist, &numneigh, cpu_time,
success, atom->q, domain->boxlo,
domain->prd);
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
- cmml_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
+ sdkl_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
ilist, numneigh, firstneigh, eflag, vflag, eflag_atom,
vflag_atom, host_start, cpu_time, success, atom->q,
atom->nlocal, domain->boxlo, domain->prd);
}
if (!success)
error->one(FLERR,"Insufficient memory on accelerator");
if (host_start<inum) {
cpu_time = MPI_Wtime();
if (evflag) {
if (eflag) cpu_compute<1,1>(host_start, inum, ilist, numneigh, firstneigh);
else cpu_compute<1,0>(host_start, inum, ilist, numneigh, firstneigh);
} else cpu_compute<0,0>(host_start, inum, ilist, numneigh, firstneigh);
cpu_time = MPI_Wtime() - cpu_time;
}
}
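// The cpu_compute<EVFLAG,EFLAG>() dispatch above selects a template
// instantiation at compile time, so the per-pair energy/virial accumulation
// branches are removed entirely from the instantiations that do not need them.
// A minimal sketch of the same pattern (hypothetical kernel(), not LAMMPS API):
//
//   template <int EFLAG> static void kernel(double &e, double de) {
//     if (EFLAG) e += de;   // dead code when EFLAG == 0
//   }
//   // runtime choice made once per call: eflag ? kernel<1>(e,de) : kernel<0>(e,de);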
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJSDKCoulLongGPU::init_style()
{
if (!atom->q_flag)
error->all(FLERR,"Pair style lj/sdk/coul/long/gpu requires atom attribute q");
if (force->newton_pair)
error->all(FLERR,"Cannot use newton pair with lj/sdk/coul/long/gpu pair style");
// Repeat the cutsq calculation here because it is done after the call to init_style
double maxcut = -1.0;
double cut;
for (int i = 1; i <= atom->ntypes; i++) {
for (int j = i; j <= atom->ntypes; j++) {
if (setflag[i][j] != 0 || (setflag[i][i] != 0 && setflag[j][j] != 0)) {
cut = init_one(i,j);
cut *= cut;
if (cut > maxcut)
maxcut = cut;
cutsq[i][j] = cutsq[j][i] = cut;
} else
cutsq[i][j] = cutsq[j][i] = 0.0;
}
}
double cell_size = sqrt(maxcut) + neighbor->skin;
cut_coulsq = cut_coul * cut_coul;
// ensure use of a KSpace long-range solver, set g_ewald
if (force->kspace == NULL)
error->all(FLERR,"Pair style is incompatible with KSpace style");
g_ewald = force->kspace->g_ewald;
// setup force tables
if (ncoultablebits) init_tables(cut_coul,NULL);
int maxspecial=0;
if (atom->molecular)
maxspecial=atom->maxspecial;
- int success = cmml_gpu_init(atom->ntypes+1, cutsq, lj_type, lj1, lj2, lj3,
+ int success = sdkl_gpu_init(atom->ntypes+1, cutsq, lj_type, lj1, lj2, lj3,
lj4, offset, force->special_lj, atom->nlocal,
atom->nlocal+atom->nghost, 300, maxspecial,
cell_size, gpu_mode, screen, cut_ljsq,
cut_coulsq, force->special_coul,
force->qqrd2e, g_ewald);
GPU_EXTRA::check_flag(success,error,world);
if (gpu_mode == GPU_FORCE) {
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
}
}
/* ---------------------------------------------------------------------- */
double PairLJSDKCoulLongGPU::memory_usage()
{
double bytes = Pair::memory_usage();
- return bytes + cmml_gpu_bytes();
+ return bytes + sdkl_gpu_bytes();
}
/* ---------------------------------------------------------------------- */
template <int EVFLAG, int EFLAG>
void PairLJSDKCoulLongGPU::cpu_compute(int start, int inum, int *ilist,
int *numneigh, int **firstneigh)
{
int i,j,ii,jj;
double qtmp,xtmp,ytmp,ztmp;
double r2inv,forcecoul,forcelj,factor_coul,factor_lj;
const double * const * const x = atom->x;
double * const * const f = atom->f;
const double * const q = atom->q;
const int * const type = atom->type;
const double * const special_coul = force->special_coul;
const double * const special_lj = force->special_lj;
const double qqrd2e = force->qqrd2e;
double fxtmp,fytmp,fztmp;
// loop over neighbors of my atoms
for (ii = start; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
fxtmp=fytmp=fztmp=0.0;
const int itype = type[i];
const int * const jlist = firstneigh[i];
const int jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
const double delx = xtmp - x[j][0];
const double dely = ytmp - x[j][1];
const double delz = ztmp - x[j][2];
const double rsq = delx*delx + dely*dely + delz*delz;
const int jtype = type[j];
double evdwl = 0.0;
double ecoul = 0.0;
double fpair = 0.0;
if (rsq < cutsq[itype][jtype]) {
r2inv = 1.0/rsq;
const int ljt = lj_type[itype][jtype];
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
const double r = sqrt(rsq);
const double grij = g_ewald * r;
const double expm2 = exp(-grij*grij);
const double t = 1.0 / (1.0 + EWALD_P*grij);
const double erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
const double prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (EFLAG) ecoul = prefactor*erfc;
if (factor_coul < 1.0) {
forcecoul -= (1.0-factor_coul)*prefactor;
if (EFLAG) ecoul -= (1.0-factor_coul)*prefactor;
}
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
int itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
const double fraction = (rsq_lookup.f - rtable[itable]) *
drtable[itable];
const double table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (EFLAG) {
const double table2 = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table2;
}
if (factor_coul < 1.0) {
const double table2 = ctable[itable] + fraction*dctable[itable];
const double prefactor = qtmp*q[j] * table2;
forcecoul -= (1.0-factor_coul)*prefactor;
if (EFLAG) ecoul -= (1.0-factor_coul)*prefactor;
}
}
} else {
forcecoul = 0.0;
ecoul = 0.0;
}
if (rsq < cut_ljsq[itype][jtype]) {
if (ljt == LJ12_4) {
const double r4inv=r2inv*r2inv;
forcelj = r4inv*(lj1[itype][jtype]*r4inv*r4inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r4inv*(lj3[itype][jtype]*r4inv*r4inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ9_6) {
const double r3inv = r2inv*sqrt(r2inv);
const double r6inv = r3inv*r3inv;
forcelj = r6inv*(lj1[itype][jtype]*r3inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r3inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ12_6) {
const double r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv*(lj1[itype][jtype]*r6inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r6inv
- lj4[itype][jtype]) - offset[itype][jtype];
}
if (EFLAG) evdwl *= factor_lj;
} else {
forcelj=0.0;
evdwl = 0.0;
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
fxtmp += delx*fpair;
fytmp += dely*fpair;
fztmp += delz*fpair;
if (EVFLAG) ev_tally_full(i,evdwl,ecoul,fpair,delx,dely,delz);
}
}
f[i][0] += fxtmp;
f[i][1] += fytmp;
f[i][2] += fztmp;
}
}
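// Note on the tabulated-Coulomb branch in cpu_compute() above: when
// ncoultablebits is set, rsq is reinterpreted through union_int_float_t so the
// leading bits of its float representation (masked with ncoulmask and shifted
// by ncoulshiftbits) become the table index, and "fraction" then interpolates
// linearly inside that bin via rtable/drtable. A minimal sketch of the idea,
// assuming a plain union rather than the LAMMPS types:
//
//   union { float f; int i; } u;
//   u.f = (float) rsq;
//   int itable   = (u.i & mask) >> shiftbits;   // bin index from the float bit pattern
//   double frac  = (u.f - rtable[itable]) * drtable[itable];
//   double ftbl  = ftable[itable] + frac*dftable[itable];   // interpolated force factor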
diff --git a/src/GPU/pair_lj_sdk_coul_long_gpu.h b/src/GPU/pair_lj_sdk_coul_long_gpu.h
index 61de27297..3248e9497 100644
--- a/src/GPU/pair_lj_sdk_coul_long_gpu.h
+++ b/src/GPU/pair_lj_sdk_coul_long_gpu.h
@@ -1,69 +1,68 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long/gpu,PairLJSDKCoulLongGPU)
-PairStyle(cg/cmm/coul/long/gpu,PairLJSDKCoulLongGPU)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_GPU_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_GPU_H
#include "pair_lj_sdk_coul_long.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLongGPU : public PairLJSDKCoulLong {
public:
PairLJSDKCoulLongGPU(LAMMPS *lmp);
~PairLJSDKCoulLongGPU();
template <int, int>
void cpu_compute(int, int, int *, int *, int **);
void compute(int, int);
void init_style();
double memory_usage();
enum { GPU_FORCE, GPU_NEIGH, GPU_HYB_NEIGH };
private:
int gpu_mode;
double cpu_time;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Insufficient memory on accelerator
There is insufficient memory on one of the devices specified for the gpu
package
E: Pair style lj/sdk/coul/long/gpu requires atom attribute q
The atom style defined does not have this attribute.
E: Cannot use newton pair with lj/sdk/coul/long/gpu pair style
Self-explanatory.
E: Pair style is incompatible with KSpace style
If a pair style with a long-range Coulombic component is selected,
then a kspace style must also be used.
*/
diff --git a/src/GPU/pair_lj_sdk_gpu.cpp b/src/GPU/pair_lj_sdk_gpu.cpp
index e7e9b690f..67103181d 100644
--- a/src/GPU/pair_lj_sdk_gpu.cpp
+++ b/src/GPU/pair_lj_sdk_gpu.cpp
@@ -1,262 +1,262 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "pair_lj_sdk_gpu.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "integrate.h"
#include "memory.h"
#include "error.h"
#include "neigh_request.h"
#include "universe.h"
#include "update.h"
#include "domain.h"
#include <string.h>
#include "gpu_extra.h"
using namespace LAMMPS_NS;
// External functions from cuda library for atom decomposition
-int cmm_gpu_init(const int ntypes, double **cutsq, int **cg_types,
+int sdk_gpu_init(const int ntypes, double **cutsq, int **cg_types,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen);
-void cmm_gpu_clear();
-int ** cmm_gpu_compute_n(const int ago, const int inum, const int nall,
+void sdk_gpu_clear();
+int ** sdk_gpu_compute_n(const int ago, const int inum, const int nall,
double **host_x, int *host_type, double *sublo,
double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum,
const double cpu_time, bool &success);
-void cmm_gpu_compute(const int ago, const int inum, const int nall,
+void sdk_gpu_compute(const int ago, const int inum, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success);
-double cmm_gpu_bytes();
+double sdk_gpu_bytes();
#include "lj_sdk_common.h"
using namespace LJSDKParms;
/* ---------------------------------------------------------------------- */
PairLJSDKGPU::PairLJSDKGPU(LAMMPS *lmp) : PairLJSDK(lmp), gpu_mode(GPU_FORCE)
{
respa_enable = 0;
reinitflag = 0;
cpu_time = 0.0;
GPU_EXTRA::gpu_ready(lmp->modify, lmp->error);
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairLJSDKGPU::~PairLJSDKGPU()
{
- cmm_gpu_clear();
+ sdk_gpu_clear();
}
/* ---------------------------------------------------------------------- */
void PairLJSDKGPU::compute(int eflag, int vflag)
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
int nall = atom->nlocal + atom->nghost;
int inum, host_start;
bool success = true;
int *ilist, *numneigh, **firstneigh;
if (gpu_mode != GPU_FORCE) {
inum = atom->nlocal;
- firstneigh = cmm_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
+ firstneigh = sdk_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
atom->type, domain->sublo, domain->subhi,
atom->tag, atom->nspecial, atom->special,
eflag, vflag, eflag_atom, vflag_atom,
host_start, &ilist, &numneigh, cpu_time,
success);
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
- cmm_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
+ sdk_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
ilist, numneigh, firstneigh, eflag, vflag, eflag_atom,
vflag_atom, host_start, cpu_time, success);
}
if (!success)
error->one(FLERR,"Insufficient memory on accelerator");
if (host_start<inum) {
cpu_time = MPI_Wtime();
if (evflag) {
if (eflag) cpu_compute<1,1>(host_start, inum, ilist, numneigh, firstneigh);
else cpu_compute<1,0>(host_start, inum, ilist, numneigh, firstneigh);
} else cpu_compute<0,0>(host_start, inum, ilist, numneigh, firstneigh);
cpu_time = MPI_Wtime() - cpu_time;
}
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJSDKGPU::init_style()
{
if (force->newton_pair)
error->all(FLERR,"Cannot use newton pair with lj/sdk/gpu pair style");
// Repeat the cutsq calculation here because it is done after the call to init_style
double maxcut = -1.0;
double cut;
for (int i = 1; i <= atom->ntypes; i++) {
for (int j = i; j <= atom->ntypes; j++) {
if (setflag[i][j] != 0 || (setflag[i][i] != 0 && setflag[j][j] != 0)) {
cut = init_one(i,j);
cut *= cut;
if (cut > maxcut)
maxcut = cut;
cutsq[i][j] = cutsq[j][i] = cut;
} else
cutsq[i][j] = cutsq[j][i] = 0.0;
}
}
double cell_size = sqrt(maxcut) + neighbor->skin;
int maxspecial=0;
if (atom->molecular)
maxspecial=atom->maxspecial;
- int success = cmm_gpu_init(atom->ntypes+1,cutsq,lj_type,lj1,lj2,lj3,lj4,
+ int success = sdk_gpu_init(atom->ntypes+1,cutsq,lj_type,lj1,lj2,lj3,lj4,
offset, force->special_lj, atom->nlocal,
atom->nlocal+atom->nghost, 300, maxspecial,
cell_size, gpu_mode, screen);
GPU_EXTRA::check_flag(success,error,world);
if (gpu_mode == GPU_FORCE) {
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
}
}
/* ---------------------------------------------------------------------- */
double PairLJSDKGPU::memory_usage()
{
double bytes = Pair::memory_usage();
- return bytes + cmm_gpu_bytes();
+ return bytes + sdk_gpu_bytes();
}
/* ---------------------------------------------------------------------- */
template <int EVFLAG, int EFLAG>
void PairLJSDKGPU::cpu_compute(int start, int inum, int *ilist,
int *numneigh, int **firstneigh)
{
int i,j,ii,jj,jtype;
double xtmp,ytmp,ztmp,delx,dely,delz,evdwl,fpair;
double rsq,r2inv,forcelj,factor_lj;
const double * const * const x = atom->x;
double * const * const f = atom->f;
const int * const type = atom->type;
const double * const special_lj = force->special_lj;
double fxtmp,fytmp,fztmp;
evdwl=0.0;
// loop over neighbors of my atoms
for (ii = start; ii < inum; ii++) {
i = ilist[ii];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
fxtmp=fytmp=fztmp=0.0;
const int itype = type[i];
const int * const jlist = firstneigh[i];
const int jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
if (rsq < cutsq[itype][jtype]) {
r2inv = 1.0/rsq;
const int ljt = lj_type[itype][jtype];
if (ljt == LJ12_4) {
const double r4inv=r2inv*r2inv;
forcelj = r4inv*(lj1[itype][jtype]*r4inv*r4inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r4inv*(lj3[itype][jtype]*r4inv*r4inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ9_6) {
const double r3inv = r2inv*sqrt(r2inv);
const double r6inv = r3inv*r3inv;
forcelj = r6inv*(lj1[itype][jtype]*r3inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r3inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ12_6) {
const double r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv*(lj1[itype][jtype]*r6inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r6inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else continue;
fpair = factor_lj*forcelj*r2inv;
fxtmp += delx*fpair;
fytmp += dely*fpair;
fztmp += delz*fpair;
if (EVFLAG) ev_tally_full(i,evdwl,0.0,fpair,delx,dely,delz);
}
}
f[i][0] += fxtmp;
f[i][1] += fytmp;
f[i][2] += fztmp;
}
}
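// For reference, the three lj_type branches above evaluate the SDK (CG-CMM)
// pair forms with their prefactors folded into lj1..lj4 by init_one(); assuming
// the standard SDK parameterization these correspond to
//
//   LJ12-4 : E(r) = (3*sqrt(3)/2) * eps * ( (sig/r)^12 - (sig/r)^4 )
//   LJ9-6  : E(r) = (27/4)        * eps * ( (sig/r)^9  - (sig/r)^6 )
//   LJ12-6 : E(r) = 4             * eps * ( (sig/r)^12 - (sig/r)^6 )
//
// and the pair force enters as fpair = factor_lj*forcelj*r2inv = F/r, so that
// delx*fpair is the x-component of the force on atom i.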
diff --git a/src/GPU/pair_lj_sdk_gpu.h b/src/GPU/pair_lj_sdk_gpu.h
index 610fb8b0e..3865b3404 100644
--- a/src/GPU/pair_lj_sdk_gpu.h
+++ b/src/GPU/pair_lj_sdk_gpu.h
@@ -1,60 +1,59 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/gpu,PairLJSDKGPU)
-PairStyle(cg/cmm/gpu,PairLJSDKGPU)
#else
#ifndef LMP_PAIR_LJ_SDK_GPU_H
#define LMP_PAIR_LJ_SDK_GPU_H
#include "pair_lj_sdk.h"
namespace LAMMPS_NS {
class PairLJSDKGPU : public PairLJSDK {
public:
PairLJSDKGPU(LAMMPS *lmp);
~PairLJSDKGPU();
template <int, int>
void cpu_compute(int, int, int *, int *, int **);
void compute(int, int);
void init_style();
double memory_usage();
enum { GPU_FORCE, GPU_NEIGH, GPU_HYB_NEIGH };
private:
int gpu_mode;
double cpu_time;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Insufficient memory on accelerator
There is insufficient memory on one of the devices specified for the gpu
package
E: Cannot use newton pair with lj/sdk/gpu pair style
Self-explanatory.
*/
diff --git a/src/KOKKOS/Install.sh b/src/KOKKOS/Install.sh
index bbebc36c1..790b9224c 100644
--- a/src/KOKKOS/Install.sh
+++ b/src/KOKKOS/Install.sh
@@ -1,256 +1,256 @@
# Install/unInstall package files in LAMMPS
# mode = 0/1/2 for uninstall/install/update
mode=$1
# enforce using portable C locale
LC_ALL=C
export LC_ALL
# arg1 = file, arg2 = file it depends on
action () {
if (test $mode = 0) then
rm -f ../$1
elif (! cmp -s $1 ../$1) then
if (test -z "$2" || test -e ../$2) then
cp $1 ..
if (test $mode = 2) then
echo " updating src/$1"
fi
fi
elif (test -n "$2") then
if (test ! -e ../$2) then
rm -f ../$1
fi
fi
}
# force rebuild of files with LMP_KOKKOS switch
touch ../accelerator_kokkos.h
touch ../memory.h
# list of files with optional dependencies
action angle_charmm_kokkos.cpp angle_charmm.cpp
action angle_charmm_kokkos.h angle_charmm.h
action angle_class2_kokkos.cpp angle_class2.cpp
action angle_class2_kokkos.h angle_class2.h
action angle_harmonic_kokkos.cpp angle_harmonic.cpp
action angle_harmonic_kokkos.h angle_harmonic.h
action atom_kokkos.cpp
action atom_kokkos.h
action atom_vec_angle_kokkos.cpp atom_vec_angle.cpp
action atom_vec_angle_kokkos.h atom_vec_angle.h
action atom_vec_atomic_kokkos.cpp
action atom_vec_atomic_kokkos.h
action atom_vec_bond_kokkos.cpp atom_vec_bond.cpp
action atom_vec_bond_kokkos.h atom_vec_bond.h
action atom_vec_charge_kokkos.cpp
action atom_vec_charge_kokkos.h
action atom_vec_full_kokkos.cpp atom_vec_full.cpp
action atom_vec_full_kokkos.h atom_vec_full.h
action atom_vec_kokkos.cpp
action atom_vec_kokkos.h
action atom_vec_molecular_kokkos.cpp atom_vec_molecular.cpp
action atom_vec_molecular_kokkos.h atom_vec_molecular.h
action bond_class2_kokkos.cpp bond_class2.cpp
action bond_class2_kokkos.h bond_class2.h
action bond_fene_kokkos.cpp bond_fene.cpp
action bond_fene_kokkos.h bond_fene.h
action bond_harmonic_kokkos.cpp bond_harmonic.cpp
action bond_harmonic_kokkos.h bond_harmonic.h
action comm_kokkos.cpp
action comm_kokkos.h
action comm_tiled_kokkos.cpp
action comm_tiled_kokkos.h
action compute_temp_kokkos.cpp
action compute_temp_kokkos.h
action dihedral_charmm_kokkos.cpp dihedral_charmm.cpp
action dihedral_charmm_kokkos.h dihedral_charmm.h
action dihedral_class2_kokkos.cpp dihedral_class2.cpp
action dihedral_class2_kokkos.h dihedral_class2.h
action dihedral_opls_kokkos.cpp dihedral_opls.cpp
action dihedral_opls_kokkos.h dihedral_opls.h
action domain_kokkos.cpp
action domain_kokkos.h
action fix_deform_kokkos.cpp
action fix_deform_kokkos.h
action fix_langevin_kokkos.cpp
action fix_langevin_kokkos.h
action fix_nh_kokkos.cpp
action fix_nh_kokkos.h
action fix_nph_kokkos.cpp
action fix_nph_kokkos.h
action fix_npt_kokkos.cpp
action fix_npt_kokkos.h
action fix_nve_kokkos.cpp
action fix_nve_kokkos.h
action fix_nvt_kokkos.cpp
action fix_nvt_kokkos.h
action fix_qeq_reax_kokkos.cpp fix_qeq_reax.cpp
action fix_qeq_reax_kokkos.h fix_qeq_reax.h
action fix_reaxc_bonds_kokkos.cpp fix_reaxc_bonds.cpp
action fix_reaxc_bonds_kokkos.h fix_reaxc_bonds.h
action fix_reaxc_species_kokkos.cpp fix_reaxc_species.cpp
action fix_reaxc_species_kokkos.h fix_reaxc_species.h
action fix_setforce_kokkos.cpp
action fix_setforce_kokkos.h
action fix_momentum_kokkos.cpp
action fix_momentum_kokkos.h
action fix_wall_reflect_kokkos.cpp
action fix_wall_reflect_kokkos.h
action gridcomm_kokkos.cpp gridcomm.cpp
action gridcomm_kokkos.h gridcomm.h
action improper_class2_kokkos.cpp improper_class2.cpp
action improper_class2_kokkos.h improper_class2.h
action improper_harmonic_kokkos.cpp improper_harmonic.cpp
action improper_harmonic_kokkos.h improper_harmonic.h
action kokkos.cpp
action kokkos.h
action kokkos_type.h
action kokkos_few.h
action memory_kokkos.h
action modify_kokkos.cpp
action modify_kokkos.h
action neigh_bond_kokkos.cpp
action neigh_bond_kokkos.h
action neigh_list_kokkos.cpp
action neigh_list_kokkos.h
action neighbor_kokkos.cpp
action neighbor_kokkos.h
action npair_copy_kokkos.cpp
action npair_copy_kokkos.h
action npair_kokkos.cpp
action npair_kokkos.h
action nbin_kokkos.cpp
action nbin_kokkos.h
action math_special_kokkos.cpp
action math_special_kokkos.h
action pair_buck_coul_cut_kokkos.cpp
action pair_buck_coul_cut_kokkos.h
action pair_buck_coul_long_kokkos.cpp pair_buck_coul_long.cpp
action pair_buck_coul_long_kokkos.h pair_buck_coul_long.h
action pair_buck_kokkos.cpp
action pair_buck_kokkos.h
action pair_coul_cut_kokkos.cpp
action pair_coul_cut_kokkos.h
action pair_coul_debye_kokkos.cpp
action pair_coul_debye_kokkos.h
action pair_coul_dsf_kokkos.cpp
action pair_coul_dsf_kokkos.h
action pair_coul_long_kokkos.cpp pair_coul_long.cpp
action pair_coul_long_kokkos.h pair_coul_long.h
action pair_coul_wolf_kokkos.cpp
action pair_coul_wolf_kokkos.h
action pair_eam_kokkos.cpp pair_eam.cpp
action pair_eam_kokkos.h pair_eam.h
action pair_eam_alloy_kokkos.cpp pair_eam_alloy.cpp
action pair_eam_alloy_kokkos.h pair_eam_alloy.h
action pair_eam_fs_kokkos.cpp pair_eam_fs.cpp
action pair_eam_fs_kokkos.h pair_eam_fs.h
action pair_kokkos.h
action pair_lj_charmm_coul_charmm_implicit_kokkos.cpp pair_lj_charmm_coul_charmm_implicit.cpp
action pair_lj_charmm_coul_charmm_implicit_kokkos.h pair_lj_charmm_coul_charmm_implicit.h
action pair_lj_charmm_coul_charmm_kokkos.cpp pair_lj_charmm_coul_charmm.cpp
action pair_lj_charmm_coul_charmm_kokkos.h pair_lj_charmm_coul_charmm.h
action pair_lj_charmm_coul_long_kokkos.cpp pair_lj_charmm_coul_long.cpp
action pair_lj_charmm_coul_long_kokkos.h pair_lj_charmm_coul_long.h
action pair_lj_class2_coul_cut_kokkos.cpp pair_lj_class2_coul_cut.cpp
action pair_lj_class2_coul_cut_kokkos.h pair_lj_class2_coul_cut.h
action pair_lj_class2_coul_long_kokkos.cpp pair_lj_class2_coul_long.cpp
action pair_lj_class2_coul_long_kokkos.h pair_lj_class2_coul_long.h
action pair_lj_class2_kokkos.cpp pair_lj_class2.cpp
action pair_lj_class2_kokkos.h pair_lj_class2.h
action pair_lj_cut_coul_cut_kokkos.cpp
action pair_lj_cut_coul_cut_kokkos.h
action pair_lj_cut_coul_debye_kokkos.cpp
action pair_lj_cut_coul_debye_kokkos.h
action pair_lj_cut_coul_dsf_kokkos.cpp
action pair_lj_cut_coul_dsf_kokkos.h
action pair_lj_cut_coul_long_kokkos.cpp pair_lj_cut_coul_long.cpp
action pair_lj_cut_coul_long_kokkos.h pair_lj_cut_coul_long.h
action pair_lj_cut_kokkos.cpp
action pair_lj_cut_kokkos.h
action pair_lj_expand_kokkos.cpp
action pair_lj_expand_kokkos.h
action pair_lj_gromacs_coul_gromacs_kokkos.cpp
action pair_lj_gromacs_coul_gromacs_kokkos.h
action pair_lj_gromacs_kokkos.cpp
action pair_lj_gromacs_kokkos.h
action pair_lj_sdk_kokkos.cpp pair_lj_sdk.cpp
action pair_lj_sdk_kokkos.h pair_lj_sdk.h
action pair_morse_kokkos.cpp
action pair_morse_kokkos.h
-action pair_reax_c_kokkos.cpp pair_reax_c.cpp
-action pair_reax_c_kokkos.h pair_reax_c.h
+action pair_reaxc_kokkos.cpp pair_reaxc.cpp
+action pair_reaxc_kokkos.h pair_reaxc.h
action pair_sw_kokkos.cpp pair_sw.cpp
action pair_sw_kokkos.h pair_sw.h
action pair_vashishta_kokkos.cpp pair_vashishta.cpp
action pair_vashishta_kokkos.h pair_vashishta.h
action pair_table_kokkos.cpp
action pair_table_kokkos.h
action pair_tersoff_kokkos.cpp pair_tersoff.cpp
action pair_tersoff_kokkos.h pair_tersoff.h
action pair_tersoff_mod_kokkos.cpp pair_tersoff_mod.cpp
action pair_tersoff_mod_kokkos.h pair_tersoff_mod.h
action pair_tersoff_zbl_kokkos.cpp pair_tersoff_zbl.cpp
action pair_tersoff_zbl_kokkos.h pair_tersoff_zbl.h
action pppm_kokkos.cpp pppm.cpp
action pppm_kokkos.h pppm.h
action region_block_kokkos.cpp
action region_block_kokkos.h
action verlet_kokkos.cpp
action verlet_kokkos.h
# edit 2 Makefile.package files to include/exclude package info
if (test $1 = 1) then
if (test -e ../Makefile.package) then
sed -i -e 's/[^ \t]*kokkos[^ \t]* //g' ../Makefile.package
sed -i -e 's/[^ \t]*KOKKOS[^ \t]* //g' ../Makefile.package
sed -i -e 's|^PKG_INC =[ \t]*|&-DLMP_KOKKOS |' ../Makefile.package
# sed -i -e 's|^PKG_PATH =[ \t]*|&-L..\/..\/lib\/kokkos\/core\/src |' ../Makefile.package
sed -i -e 's|^PKG_CPP_DEPENDS =[ \t]*|&$(KOKKOS_CPP_DEPENDS) |' ../Makefile.package
sed -i -e 's|^PKG_LIB =[ \t]*|&$(KOKKOS_LIBS) |' ../Makefile.package
sed -i -e 's|^PKG_LINK_DEPENDS =[ \t]*|&$(KOKKOS_LINK_DEPENDS) |' ../Makefile.package
sed -i -e 's|^PKG_SYSINC =[ \t]*|&$(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) |' ../Makefile.package
sed -i -e 's|^PKG_SYSLIB =[ \t]*|&$(KOKKOS_LDFLAGS) |' ../Makefile.package
# sed -i -e 's|^PKG_SYSPATH =[ \t]*|&$(kokkos_SYSPATH) |' ../Makefile.package
fi
if (test -e ../Makefile.package.settings) then
sed -i -e '/CXX\ =\ \$(CC)/d' ../Makefile.package.settings
sed -i -e '/^include.*kokkos.*$/d' ../Makefile.package.settings
# multiline form needed for BSD sed on Macs
sed -i -e '4 i \
CXX = $(CC)
' ../Makefile.package.settings
sed -i -e '5 i \
include ..\/..\/lib\/kokkos\/Makefile.kokkos
' ../Makefile.package.settings
fi
# comb/omp triggers a persistent bug in nvcc. deleting it.
rm -f ../*_comb_omp.*
elif (test $1 = 2) then
# comb/omp triggers a persistent bug in nvcc. deleting it.
rm -f ../*_comb_omp.*
elif (test $1 = 0) then
if (test -e ../Makefile.package) then
sed -i -e 's/[^ \t]*kokkos[^ \t]* //g' ../Makefile.package
sed -i -e 's/[^ \t]*KOKKOS[^ \t]* //g' ../Makefile.package
fi
if (test -e ../Makefile.package.settings) then
sed -i -e '/CXX\ =\ \$(CC)/d' ../Makefile.package.settings
sed -i -e '/^include.*kokkos.*$/d' ../Makefile.package.settings
fi
fi
diff --git a/src/KOKKOS/atom_vec_angle_kokkos.cpp b/src/KOKKOS/atom_vec_angle_kokkos.cpp
index 48fc3a352..34b868aad 100644
--- a/src/KOKKOS/atom_vec_angle_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_angle_kokkos.cpp
@@ -1,1982 +1,1982 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_angle_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecAngleKokkos::AtomVecAngleKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->molecule_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
buffer = NULL;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
- "atom:special");
+ "atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
- "atom:bond_type");
+ "atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
- "atom:bond_atom");
+ "atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
- "atom:angle_type");
+ "atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
- "atom:angle_atom1");
+ "atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
- "atom:angle_atom2");
+ "atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
- "atom:angle_atom3");
+ "atom:angle_atom3");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
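// Usage pattern implied by the doc comment above: callers that may run out of
// space call grow(0) repeatedly, each call adding DELTA (10000) slots, while a
// caller that already knows the required size calls grow(n) once. A sketch,
// mirroring the loop in unpack_border_kokkos() further below:
//
//   while (first + n >= nmax) grow(0);   // chunked growth until the data fits
//   // or: grow(n_expected);             // exact allocation when the count is known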
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAngleKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAngleKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
- *buf.view<DeviceType>().dimension_1())/3;
+ *buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_kokkos(const int &n,
- const DAT::tdual_int_2d &list,
- const int & iswap,
- const DAT::tdual_xfloat_2d &buf,
- const int &pbc_flag,
- const int* const pbc)
+ const DAT::tdual_int_2d &list,
+ const int & iswap,
+ const DAT::tdual_xfloat_2d &buf,
+ const int &pbc_flag,
+ const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
- return n*size_forward;
+ return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAngleKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAngleKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecAngleKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecAngleKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecAngleKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecAngleKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecAngleKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
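// The d_ubuf(...) wrappers above (and in the unpack functor below) carry the
// integer tag/type/mask/molecule values through the double-typed border buffer
// bit-exactly, instead of round-tripping them through a double conversion that
// could lose precision for large 64-bit tagints. A minimal sketch of the idea,
// assuming a plain union rather than the LAMMPS ubuf/d_ubuf types:
//
//   union bits { double d; int64_t i; };
//   bits pack;    pack.i = (int64_t) tag;   double slot = pack.d;     // pack
//   bits unpack;  unpack.d = slot;          int64_t back = unpack.i;  // unpack, bit-exact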
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecAngleKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAngleKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecAngleKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAngleKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_molecule(j);
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecAngleKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
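// Note on the d_ubuf change above (hedged sketch): ubuf/d_ubuf are assumed to
// be LAMMPS' union helpers that alias a double with a 64-bit integer, roughly
//
//   union ubuf {
//     double d;
//     int64_t i;
//     ubuf(double arg) : d(arg) {}
//     ubuf(int64_t arg) : i(arg) {}
//   };
//
// so an integer packed as ubuf(tag).d and unpacked as (tagint) d_ubuf(buf).i
// travels through the double-typed comm buffer bit-for-bit. The replaced
// static_cast<> lines converted the value numerically instead, which can lose
// precision once tagint/imageint are 64-bit and exceed 2^53.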
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecAngleKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecAngleKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecAngleKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 num_angle, angle_per_atom angle_type, angle_per_atom angle_atom1, angle_atom2,
// and angle_atom3
// 1 to store buffer length
elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
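// Worked count for the constant 17 (derived here, not in the original
// comment): 1 length slot + 3 x + 3 v + tag + type + mask + image + molecule
// + num_bond + num_angle + 3 nspecial = 17 doubles per atom, with the
// variable-length bond/angle/special data appended on top of that.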
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _num_angle(i);
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = _angle_type(i,k);
- _buf(mysend,m++) = _angle_atom1(i,k);
- _buf(mysend,m++) = _angle_atom2(i,k);
- _buf(mysend,m++) = _angle_atom3(i,k);
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
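// Capacity check (sketch of the intent): the 2-D buffer is addressed as a
// flat run of doubles, so it needs room for nsend*elements values; when it
// is too small, its first dimension is grown so that
// newsize*dimension_1() >= nsend*elements.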
if(space == Host) {
AtomVecAngleKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecAngleKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecAngleKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
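// Slot 0 of each record stores the record length, so x,y,z sit in slots
// 1..3 and _buf(myrecv,_dim+1) is the coordinate along the exchange
// dimension; only atoms landing inside this proc's [lo,hi) slab are kept.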
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = _buf(myrecv,m++);
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = _buf(myrecv,m++);
- _angle_atom1(i,k) = _buf(myrecv,m++);
- _angle_atom2(i,k) = _buf(myrecv,m++);
- _angle_atom3(i,k) = _buf(myrecv,m++);
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecAngleKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecAngleKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
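/* worked breakdown of the fixed count used below (derived from pack_restart,
   not part of the original comment): 1 length slot + 3 x + tag + type + mask
   + image + 3 v + molecule + num_bond + num_angle = 14 values per atom, plus
   2 per bond and 4 per angle */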
int AtomVecAngleKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 14 + 2*h_num_bond(i) + 4*h_num_angle(i);
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
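// MAX(t,-t) below is simply abs(t): bond/angle types may be stored as
// negative values (e.g. bonds turned off by delete_bonds) but, per the
// comment above, are written to the restart file as positive.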
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
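// The three image flags are packed into one imageint, each stored as an
// offset from IMGMAX in its own bit field (x in the low IMGBITS, y next,
// z on top), so the value above encodes image flags (0,0,0); pack_data()
// decodes them again with (image & IMGMASK) - IMGMAX and the shifted forms.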
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecAngleKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
}
}
void AtomVecAngleKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_atomic_kokkos.cpp b/src/KOKKOS/atom_vec_atomic_kokkos.cpp
index dc254e6a7..d040bd355 100644
--- a/src/KOKKOS/atom_vec_atomic_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_atomic_kokkos.cpp
@@ -1,1438 +1,1438 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_atomic_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecAtomicKokkos::AtomVecAtomicKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 6;
size_velocity = 3;
size_data_atom = 5;
size_data_vel = 4;
xcol_data = 3;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::copy(int i, int j, int delflag)
{
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAtomicKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAtomicKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
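// The nested branches below select one of the PackComm<DeviceType,PBC_FLAG,
// TRICLINIC> template instantiations, so the pbc and triclinic decisions are
// made once here rather than per atom inside the kernel.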
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAtomicKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAtomicKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecAtomicKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecAtomicKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecAtomicKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0) {
sync(Host,F_MASK);
modified(Host,F_MASK);
}
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecAtomicKokkos_PackBorder {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_xfloat_2d _buf;
const typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
const typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
const typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
const typename ArrayTypes<DeviceType>::t_int_1d _type;
const typename ArrayTypes<DeviceType>::t_int_1d _mask;
X_FLOAT _dx,_dy,_dz;
AtomVecAtomicKokkos_PackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d &buf,
const typename ArrayTypes<DeviceType>::t_int_2d_const &list,
const int & iswap,
const typename ArrayTypes<DeviceType>::t_x_array &x,
const typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
const typename ArrayTypes<DeviceType>::t_int_1d &type,
const typename ArrayTypes<DeviceType>::t_int_1d &mask,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist, DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecAtomicKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAtomicKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecAtomicKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAtomicKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*6;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
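// In pack_border_vel() above, when box deformation remaps velocities
// (deform_vremap set), ghost atoms sent across a moving periodic boundary also
// get their velocities shifted by the box deformation rate h_rate, but only
// for atoms in the deform group.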
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackBorder {
typedef DeviceType device_type;
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
typename ArrayTypes<DeviceType>::t_int_1d _type;
typename ArrayTypes<DeviceType>::t_int_1d _mask;
int _first;
AtomVecAtomicKokkos_UnpackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const &buf,
typename ArrayTypes<DeviceType>::t_x_array &x,
typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
typename ArrayTypes<DeviceType>::t_int_1d &type,
typename ArrayTypes<DeviceType>::t_int_1d &mask,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
// printf("%i %i %lf %lf %lf %i BORDER\n",_tag(i+_first),i+_first,_x(i+_first,0),_x(i+_first,1),_x(i+_first,2),_type(i+_first));
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
if(space==Host) {
struct AtomVecAtomicKokkos_UnpackBorder<LMPHostType> f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecAtomicKokkos_UnpackBorder<LMPDeviceType> f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
AtomVecAtomicKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 11;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
const int i = _sendlist(mysend);
_buf(mysend,0) = 11;
_buf(mysend,1) = _x(i,0);
_buf(mysend,2) = _x(i,1);
_buf(mysend,3) = _x(i,2);
_buf(mysend,4) = _v(i,0);
_buf(mysend,5) = _v(i,1);
_buf(mysend,6) = _v(i,2);
- _buf(mysend,7) = _tag[i];
- _buf(mysend,8) = _type[i];
- _buf(mysend,9) = _mask[i];
- _buf(mysend,10) = _image[i];
+ _buf(mysend,7) = d_ubuf(_tag[i]).d;
+ _buf(mysend,8) = d_ubuf(_type[i]).d;
+ _buf(mysend,9) = d_ubuf(_mask[i]).d;
+ _buf(mysend,10) = d_ubuf(_image[i]).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw[i] = _tag(j);
_typew[i] = _type(j);
_maskw[i] = _mask(j);
_imagew[i] = _image(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf, DAT::tdual_int_1d k_sendlist,DAT::tdual_int_1d k_copylist,ExecutionSpace space,int dim,X_FLOAT lo,X_FLOAT hi )
{
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*k_buf.view<LMPHostType>().dimension_1())/11) {
int newsize = nsend*11/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecAtomicKokkos_PackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*11;
} else {
AtomVecAtomicKokkos_PackExchangeFunctor<LMPDeviceType> f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*11;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
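// pack_exchange() above starts writing at buf[1] and records the final length
// in buf[0], so the receiver (and any fixes appending extra data) can tell
// where one atom's message ends and the next begins.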
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
AtomVecAtomicKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 11;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
_x(i,0) = _buf(myrecv,1);
_x(i,1) = _buf(myrecv,2);
_x(i,2) = _buf(myrecv,3);
_v(i,0) = _buf(myrecv,4);
_v(i,1) = _buf(myrecv,5);
_v(i,2) = _buf(myrecv,6);
- _tag[i] = _buf(myrecv,7);
- _type[i] = _buf(myrecv,8);
- _mask[i] = _buf(myrecv,9);
- _image[i] = _buf(myrecv,10);
+ _tag[i] = (tagint) d_ubuf(_buf(myrecv,7)).i;
+ _type[i] = (int) d_ubuf(_buf(myrecv,8)).i;
+ _mask[i] = (int) d_ubuf(_buf(myrecv,9)).i;
+ _image[i] = (imageint) d_ubuf(_buf(myrecv,10)).i;
}
}
};
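// Each received atom whose coordinate along _dim falls inside [_lo,_hi) belongs
// to this sub-domain; Kokkos::atomic_fetch_add reserves a unique local index
// for it so threads can unpack atoms concurrently without overwriting each other.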
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,ExecutionSpace space) {
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecAtomicKokkos_UnpackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/11,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecAtomicKokkos_UnpackExchangeFunctor<LMPDeviceType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/11,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
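// k_count is a one-element dual view holding the running nlocal counter; on the
// device path it is copied to the device before the kernel and back to the host
// afterwards so the updated atom count can be returned.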
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 11 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK );
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK );
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
//if(nlocal>2) printf("typeA: %i %i\n",type[0],type[1]);
atomKK->modified(Host,ALL_MASK);
grow(0);
//if(nlocal>2) printf("typeB: %i %i\n",type[0],type[1]);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask[nlocal] = 1;
h_image[nlocal] = ((tagint) IMGMAX << IMG2BITS) |
((tagint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atom->nlocal++;
}
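// The image word packs the three periodic image counters into one integer; each
// counter is stored offset by IMGMAX, so the expression above encodes image
// flags of (0,0,0) for a newly created atom.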
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::data_atom(double *coord, tagint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
h_tag[nlocal] = atoi(values[0]);
h_type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image[nlocal] = imagetmp;
h_mask[nlocal] = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atomKK->modified(Host,ALL_MASK);
atom->nlocal++;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag[i];
buf[i][1] = h_type[i];
buf[i][2] = h_x(i,0);
buf[i][3] = h_x(i,1);
buf[i][4] = h_x(i,2);
buf[i][5] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][6] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
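// The last three columns recover the x, y and z image flags by masking and
// shifting the packed image word and subtracting the IMGMAX offset again.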
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1],buf[i][2],buf[i][3],buf[i][4],
(int) buf[i][5],(int) buf[i][6],(int) buf[i][7]);
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecAtomicKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
}
}
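// sync() and modified() above implement the Kokkos dual-view bookkeeping for
// the per-atom arrays: modified() flags a view as changed in the given
// execution space, and a later sync() copies it to the other space only if
// that flag is set.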
diff --git a/src/KOKKOS/atom_vec_bond_kokkos.cpp b/src/KOKKOS/atom_vec_bond_kokkos.cpp
index f10decac2..c46c49cb2 100644
--- a/src/KOKKOS/atom_vec_bond_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_bond_kokkos.cpp
@@ -1,1786 +1,1786 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_bond_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecBondKokkos::AtomVecBondKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecBondKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,"atom:bond_atom");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atomKK->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecBondKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecBondKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++) h_special(j,k) = h_special(i,k);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecBondKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecBondKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
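// PBC_FLAG and TRICLINIC are template parameters, so the periodic-shift branch
// is resolved at compile time; pack_comm_kokkos() below instantiates one of the
// four specializations and the inner loop runs without per-atom branching on
// the box style.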
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecBondKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecBondKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
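// pack_comm_self() above handles swaps where sender and receiver are the same
// process: coordinates are written directly into the ghost-atom slots starting
// at nfirst instead of passing through a communication buffer.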
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecBondKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecBondKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecBondKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecBondKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecBondKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecBondKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecBondKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecBondKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecBondKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = ubuf(h_molecule(j)).d;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecBondKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecBondKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecBondKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecBondKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 to store buffer length
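// i.e. elements = 3 + 3 + 1 + 1 + 1 + 1 + 1 + 1 + 3 + 1 = 16 fixed slots,
// plus maxspecial + 2*bond_per_atom variable-length slots per atom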
elements = 16+atom->maxspecial+atom->bond_per_atom+atom->bond_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 16+atomKK->maxspecial+atomKK->bond_per_atom+atomKK->bond_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecBondKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecBondKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecBondKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 16+atom->maxspecial+atom->bond_per_atom+atom->bond_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
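// _buf(myrecv,_dim+1) is the position component along the exchange dimension;
// only atoms falling inside this proc's [lo,hi) slab are kept, and the atomic
// fetch-add reserves a unique local index while rows are unpacked in parallel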
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 16+atomKK->maxspecial+atomKK->bond_per_atom+atomKK->bond_per_atom;
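// each exchanged atom occupies one buffer row of width 'elements' (the pack side
// returns nsend*elements), so nrecv/elements below is the number of rows to scan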
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecBondKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
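// device path: the running atom count is seeded on the host, synced to the
// device where the functor increments it atomically, then synced back to
// return the new nlocal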
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecBondKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecBondKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
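// 13 values per atom: 1 (record length) + 3 (x) + 1 (tag) + 1 (type) + 1 (mask)
// + 1 (image) + 3 (v) + 1 (molecule) + 1 (num_bond), matching pack_restart()
// below, plus 2 values (type and partner tag) per bond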
for (i = 0; i < nlocal; i++)
n += 13 + 2*h_num_bond[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (int k = 0; k < h_num_bond(i); k++) {
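// MAX(b,-b) = |b|: negative bond types are written as positive,
// per the note in the header comment above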
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecBondKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
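// IMGMAX is the zero offset of each packed bit field, so this encodes
// image flags (0,0,0) for a newly created atom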
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecBondKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atomKK->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
atomKK->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecBondKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBondKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBondKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecBondKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecBondKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_charge_kokkos.cpp b/src/KOKKOS/atom_vec_charge_kokkos.cpp
index f6952f127..856660d1e 100644
--- a/src/KOKKOS/atom_vec_charge_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_charge_kokkos.cpp
@@ -1,1562 +1,1562 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_charge_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecChargeKokkos::AtomVecChargeKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->q_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_q,atomKK->q,nmax,"atom:q");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
q = atomKK->q;
d_q = atomKK->k_q.d_view;
h_q = atomKK->k_q.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::copy(int i, int j, int delflag)
{
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_q[j] = h_q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
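// PBC_FLAG and TRICLINIC are compile-time template parameters; pack_comm_kokkos()
// below instantiates one of four kernel variants, so the periodic-shift logic adds
// no per-atom branching inside the parallel loop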
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecChargeKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecChargeKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
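// 'self' variant: when a swap stays on the same processor, positions are copied
// directly into the ghost slots of x (starting at index nfirst) instead of being
// staged through a communication buffer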
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecChargeKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecChargeKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
- const int nfirst, const int &pbc_flag, const int* const pbc) {
+ const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecChargeKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecChargeKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecChargeKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecChargeKokkos_PackBorder {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_xfloat_2d _buf;
const typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
const typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
const typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
const typename ArrayTypes<DeviceType>::t_int_1d _type;
const typename ArrayTypes<DeviceType>::t_int_1d _mask;
const typename ArrayTypes<DeviceType>::t_float_1d _q;
X_FLOAT _dx,_dy,_dz;
AtomVecChargeKokkos_PackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d &buf,
const typename ArrayTypes<DeviceType>::t_int_2d_const &list,
const int & iswap,
const typename ArrayTypes<DeviceType>::t_x_array &x,
const typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
const typename ArrayTypes<DeviceType>::t_int_1d &type,
const typename ArrayTypes<DeviceType>::t_int_1d &mask,
const typename ArrayTypes<DeviceType>::t_float_1d &q,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_q(q),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist, DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecChargeKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecChargeKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecChargeKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecChargeKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_q[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackBorder {
typedef DeviceType device_type;
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
typename ArrayTypes<DeviceType>::t_int_1d _type;
typename ArrayTypes<DeviceType>::t_int_1d _mask;
typename ArrayTypes<DeviceType>::t_float_1d _q;
int _first;
AtomVecChargeKokkos_UnpackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const &buf,
typename ArrayTypes<DeviceType>::t_x_array &x,
typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
typename ArrayTypes<DeviceType>::t_int_1d &type,
typename ArrayTypes<DeviceType>::t_int_1d &mask,
typename ArrayTypes<DeviceType>::t_float_1d &q,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
_q(i+_first) = _buf(i,6);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,ExecutionSpace space) {
if (first+n >= nmax) {
grow(first+n+100);
}
if(space==Host) {
struct AtomVecChargeKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_q,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecChargeKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_q,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) {
grow(0);
}
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q[i] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q[i] = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_q[i] = buf[m++];
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_float_1d_randomread _q;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_float_1d _qw;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
AtomVecChargeKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_qw(atom->k_q.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
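// 12 values per atom: 1 (record length) + 3 (x) + 3 (v) + 1 (tag) + 1 (type)
// + 1 (mask) + 1 (image) + 1 (q)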
const size_t elements = 12;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
const int i = _sendlist(mysend);
_buf(mysend,0) = 12;
_buf(mysend,1) = _x(i,0);
_buf(mysend,2) = _x(i,1);
_buf(mysend,3) = _x(i,2);
_buf(mysend,4) = _v(i,0);
_buf(mysend,5) = _v(i,1);
_buf(mysend,6) = _v(i,2);
- _buf(mysend,7) = _tag[i];
- _buf(mysend,8) = _type[i];
- _buf(mysend,9) = _mask[i];
- _buf(mysend,10) = _image[i];
+ _buf(mysend,7) = d_ubuf(_tag[i]).d;
+ _buf(mysend,8) = d_ubuf(_type[i]).d;
+ _buf(mysend,9) = d_ubuf(_mask[i]).d;
+ _buf(mysend,10) = d_ubuf(_image[i]).d;
_buf(mysend,11) = _q[i];
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_qw(i) = _q(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,
X_FLOAT lo,X_FLOAT hi )
{
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*k_buf.view<LMPHostType>().dimension_1())/12) {
int newsize = nsend*12/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecChargeKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*12;
} else {
AtomVecChargeKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*12;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_float_1d _q;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
AtomVecChargeKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 12;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
_x(i,0) = _buf(myrecv,1);
_x(i,1) = _buf(myrecv,2);
_x(i,2) = _buf(myrecv,3);
_v(i,0) = _buf(myrecv,4);
_v(i,1) = _buf(myrecv,5);
_v(i,2) = _buf(myrecv,6);
- _tag[i] = _buf(myrecv,7);
- _type[i] = _buf(myrecv,8);
- _mask[i] = _buf(myrecv,9);
- _image[i] = _buf(myrecv,10);
+ _tag[i] = (tagint) d_ubuf(_buf(myrecv,7)).i;
+ _type[i] = (int) d_ubuf(_buf(myrecv,8)).i;
+ _mask[i] = (int) d_ubuf(_buf(myrecv,9)).i;
+ _image[i] = (imageint) d_ubuf(_buf(myrecv,10)).i;
_q[i] = _buf(myrecv,11);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecChargeKokkos_UnpackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/12,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecChargeKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/12,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_q[nlocal] = buf[m++];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
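// 12 values per atom: 1 (record length) + 3 (x) + 1 (tag) + 1 (type) + 1 (mask)
// + 1 (image) + 3 (v) + 1 (q), matching pack_restart() below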
int n = 12 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = h_q[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_q[nlocal] = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask[nlocal] = 1;
h_image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_q[nlocal] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
h_tag[nlocal] = atoi(values[0]);
h_type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_q[nlocal] = atof(values[2]);
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image[nlocal] = imagetmp;
h_mask[nlocal] = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atomKK->modified(Host,ALL_MASK);
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_q[nlocal] = atof(values[0]);
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag[i];
buf[i][1] = h_type[i];
buf[i][2] = h_q[i];
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
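/* Illustrative sketch: an imageint packs three periodic image counters into
   one integer: bits [0,IMGBITS) hold ix, [IMGBITS,IMG2BITS) hold iy, and the
   remaining high bits hold iz, each stored with an offset of IMGMAX so that
   negative counters fit in unsigned bit fields.  Decoding mirrors the three
   lines above:

     int ix = (image & IMGMASK) - IMGMAX;
     int iy = (image >> IMGBITS & IMGMASK) - IMGMAX;
     int iz = (image >> IMG2BITS) - IMGMAX;

   and create_atom() earlier builds the all-zero value as
   ((imageint) IMGMAX << IMG2BITS) | ((imageint) IMGMAX << IMGBITS) | IMGMAX.
*/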
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_q[i];
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1],buf[i][2],buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e",buf[0]);
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecChargeKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("q")) bytes += memory->usage(q,nmax);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPHostType>();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPHostType>();
}
}
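/* Illustrative sketch: sync() and modified() implement the usual DualView
   discipline for every per-atom array selected by the bit mask: call
   modified(space,mask) after writing data in one memory space and
   sync(space,mask) before reading it in the other, e.g.

     atomKK->modified(Host, X_MASK | V_MASK);   // host code wrote x and v
     atomKK->sync(Device, X_MASK | V_MASK);     // device kernel reads them

   sync_overlapping_device() below is a variant that uses
   perform_async_copy() so the host-device transfers can overlap with other
   work instead of blocking. */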
void AtomVecChargeKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
}
}
diff --git a/src/KOKKOS/atom_vec_full_kokkos.cpp b/src/KOKKOS/atom_vec_full_kokkos.cpp
index 731168b6e..fa4cf18ae 100644
--- a/src/KOKKOS/atom_vec_full_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_full_kokkos.cpp
@@ -1,2490 +1,2444 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_full_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecFullKokkos::AtomVecFullKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = dihedrals_allow = impropers_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 8;
size_velocity = 3;
size_data_atom = 7;
size_data_vel = 4;
xcol_data = 5;
atom->molecule_flag = atom->q_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecFullKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_q,atomKK->q,nmax,"atom:q");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
"atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
"atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
"atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
"atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
"atom:angle_atom3");
memory->grow_kokkos(atomKK->k_num_dihedral,atomKK->num_dihedral,nmax,"atom:num_dihedral");
memory->grow_kokkos(atomKK->k_dihedral_type,atomKK->dihedral_type,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_type");
memory->grow_kokkos(atomKK->k_dihedral_atom1,atomKK->dihedral_atom1,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom1");
memory->grow_kokkos(atomKK->k_dihedral_atom2,atomKK->dihedral_atom2,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom2");
memory->grow_kokkos(atomKK->k_dihedral_atom3,atomKK->dihedral_atom3,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom3");
memory->grow_kokkos(atomKK->k_dihedral_atom4,atomKK->dihedral_atom4,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom4");
memory->grow_kokkos(atomKK->k_num_improper,atomKK->num_improper,nmax,"atom:num_improper");
memory->grow_kokkos(atomKK->k_improper_type,atomKK->improper_type,nmax,
atomKK->improper_per_atom,"atom:improper_type");
memory->grow_kokkos(atomKK->k_improper_atom1,atomKK->improper_atom1,nmax,
atomKK->improper_per_atom,"atom:improper_atom1");
memory->grow_kokkos(atomKK->k_improper_atom2,atomKK->improper_atom2,nmax,
atomKK->improper_per_atom,"atom:improper_atom2");
memory->grow_kokkos(atomKK->k_improper_atom3,atomKK->improper_atom3,nmax,
atomKK->improper_per_atom,"atom:improper_atom3");
memory->grow_kokkos(atomKK->k_improper_atom4,atomKK->improper_atom4,nmax,
atomKK->improper_per_atom,"atom:improper_atom4");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
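/* Worked example: with DELTA = 10000, repeated grow(0) calls enlarge every
   per-atom array in 10000-atom increments (nmax = 10000, 20000, ...),
   while grow(n) with n > 0 jumps directly to n entries, e.g. when the atom
   count is already known from a data or restart file. */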
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecFullKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
q = atomKK->q;
d_q = atomKK->k_q.d_view;
h_q = atomKK->k_q.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
num_dihedral = atomKK->num_dihedral;
d_num_dihedral = atomKK->k_num_dihedral.d_view;
h_num_dihedral = atomKK->k_num_dihedral.h_view;
dihedral_type = atomKK->dihedral_type;
d_dihedral_type = atomKK->k_dihedral_type.d_view;
h_dihedral_type = atomKK->k_dihedral_type.h_view;
dihedral_atom1 = atomKK->dihedral_atom1;
d_dihedral_atom1 = atomKK->k_dihedral_atom1.d_view;
h_dihedral_atom1 = atomKK->k_dihedral_atom1.h_view;
dihedral_atom2 = atomKK->dihedral_atom2;
d_dihedral_atom2 = atomKK->k_dihedral_atom2.d_view;
h_dihedral_atom2 = atomKK->k_dihedral_atom2.h_view;
dihedral_atom3 = atomKK->dihedral_atom3;
d_dihedral_atom3 = atomKK->k_dihedral_atom3.d_view;
h_dihedral_atom3 = atomKK->k_dihedral_atom3.h_view;
dihedral_atom4 = atomKK->dihedral_atom4;
d_dihedral_atom4 = atomKK->k_dihedral_atom4.d_view;
h_dihedral_atom4 = atomKK->k_dihedral_atom4.h_view;
num_improper = atomKK->num_improper;
d_num_improper = atomKK->k_num_improper.d_view;
h_num_improper = atomKK->k_num_improper.h_view;
improper_type = atomKK->improper_type;
d_improper_type = atomKK->k_improper_type.d_view;
h_improper_type = atomKK->k_improper_type.h_view;
improper_atom1 = atomKK->improper_atom1;
d_improper_atom1 = atomKK->k_improper_atom1.d_view;
h_improper_atom1 = atomKK->k_improper_atom1.h_view;
improper_atom2 = atomKK->improper_atom2;
d_improper_atom2 = atomKK->k_improper_atom2.d_view;
h_improper_atom2 = atomKK->k_improper_atom2.h_view;
improper_atom3 = atomKK->improper_atom3;
d_improper_atom3 = atomKK->k_improper_atom3.d_view;
h_improper_atom3 = atomKK->k_improper_atom3.h_view;
improper_atom4 = atomKK->improper_atom4;
d_improper_atom4 = atomKK->k_improper_atom4.d_view;
h_improper_atom4 = atomKK->k_improper_atom4.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecFullKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_q[j] = h_q[i];
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
h_num_dihedral(j) = h_num_dihedral(i);
for (k = 0; k < h_num_dihedral(j); k++) {
h_dihedral_type(j,k) = h_dihedral_type(i,k);
h_dihedral_atom1(j,k) = h_dihedral_atom1(i,k);
h_dihedral_atom2(j,k) = h_dihedral_atom2(i,k);
h_dihedral_atom3(j,k) = h_dihedral_atom3(i,k);
h_dihedral_atom4(j,k) = h_dihedral_atom4(i,k);
}
h_num_improper(j) = h_num_improper(i);
for (k = 0; k < h_num_improper(j); k++) {
h_improper_type(j,k) = h_improper_type(i,k);
h_improper_atom1(j,k) = h_improper_atom1(i,k);
h_improper_atom2(j,k) = h_improper_atom2(i,k);
h_improper_atom3(j,k) = h_improper_atom3(i,k);
h_improper_atom4(j,k) = h_improper_atom4(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecFullKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecFullKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPHostType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPHostType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPHostType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPHostType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
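/* Illustrative sketch: pack_comm_kokkos() dispatches one of eight
   instantiations of the PackComm functor.  The execution space (host or
   device) is chosen at run time from commKK->forward_comm_on_host, while
   PBC_FLAG and TRICLINIC are compile-time template flags so the periodic
   shift arithmetic is compiled away when it is not needed.  For a triclinic
   box the shift applied to a packed coordinate is

     dx = pbc[0]*xprd + pbc[5]*xy + pbc[4]*xz;
     dy = pbc[1]*yprd + pbc[3]*yz;
     dz = pbc[2]*zprd;

   which reduces to dx = pbc[0]*xprd, etc., for an orthogonal box. */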
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecFullKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecFullKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),
_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecFullKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecFullKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecFullKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecFullKokkos_PackBorder {
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_float_1d _q;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecFullKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_float_1d &q,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = ubuf(_tag(j)).d;
- _buf(i,4) = ubuf(_type(j)).d;
- _buf(i,5) = ubuf(_mask(j)).d;
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
- _buf(i,7) = ubuf(_molecule(j)).d;
+ _buf(i,7) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = ubuf(_tag(j)).d;
- _buf(i,4) = ubuf(_type(j)).d;
- _buf(i,5) = ubuf(_mask(j)).d;
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
- _buf(i,7) = ubuf(_molecule(j)).d;
+ _buf(i,7) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecFullKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecFullKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecFullKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecFullKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
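/* Worked example: each border atom occupies size_border = 8 buffer entries
   here: 3 coordinates, tag, type, mask, charge and molecule ID, matching
   the indices 0-7 written by the PackBorder functor above, so the routine
   returns n*8 doubles.  The velocity variant additionally appends the
   3 velocity components per atom. */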
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackBorder {
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_float_1d _q;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecFullKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_float_1d &q,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = (tagint) ubuf(_buf(i,3)).i;
- _type(i+_first) = (int) ubuf(_buf(i,4)).i;
- _mask(i+_first) = (int) ubuf(_buf(i,5)).i;
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
_q(i+_first) = _buf(i,6);
- _molecule(i+_first) = (tagint) ubuf(_buf(i,7)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,7)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecFullKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_q,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecFullKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_q,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_PackExchangeFunctor {
-
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_float_1d_randomread _q;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d_randomread _num_dihedral;
typename AT::t_int_2d_randomread _dihedral_type;
typename AT::t_tagint_2d_randomread _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d_randomread _num_improper;
typename AT::t_int_2d_randomread _improper_type;
typename AT::t_tagint_2d_randomread _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_float_1d _qw;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_int_1d _num_dihedralw;
typename AT::t_int_2d _dihedral_typew;
typename AT::t_tagint_2d _dihedral_atom1w,_dihedral_atom2w,
_dihedral_atom3w,_dihedral_atom4w;
typename AT::t_int_1d _num_improperw;
typename AT::t_int_2d _improper_typew;
typename AT::t_tagint_2d _improper_atom1w,_improper_atom2w,
_improper_atom3w,_improper_atom4w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecFullKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_qw(atom->k_q.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedralw(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_typew(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1w(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2w(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3w(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4w(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improperw(atom->k_num_improper.view<DeviceType>()),
_improper_typew(atom->k_improper_type.view<DeviceType>()),
_improper_atom1w(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2w(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3w(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4w(atom->k_improper_atom4.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// per-atom buffer layout:
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 charge, 1 molecule,
// 3 nspecial, maxspecial special,
// 1 num_bond, bond_per_atom each of bond_type and bond_atom,
// 1 num_angle, angle_per_atom each of angle_type, angle_atom1, angle_atom2, angle_atom3,
// 1 num_dihedral, dihedral_per_atom each of dihedral_type, dihedral_atom1/2/3/4,
// 1 num_improper, improper_per_atom each of improper_type, improper_atom1/2/3/4,
// 1 leading entry to store the buffer length
elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
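// Worked example: the 20 fixed entries are 3 x + 3 v + tag + type + mask
// + image + q + molecule + 3 nspecial + num_bond + num_angle + num_dihedral
// + num_improper + 1 leading length word.  For a hypothetical system with
// maxspecial = 16, bond_per_atom = 4, angle_per_atom = 6,
// dihedral_per_atom = 8 and improper_per_atom = 2 this gives
// elements = 20 + 16 + 8 + 24 + 40 + 10 = 118 doubles per atom.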
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = ubuf(_tag(i)).d;
- _buf(mysend,m++) = ubuf(_type(i)).d;
- _buf(mysend,m++) = ubuf(_mask(i)).d;
- _buf(mysend,m++) = ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
_buf(mysend,m++) = _q(i);
- _buf(mysend,m++) = ubuf(_molecule(i)).d;
- _buf(mysend,m++) = ubuf(_num_bond(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = ubuf(_bond_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_bond_atom(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_angle(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = ubuf(_angle_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_dihedral(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_dihedral(i)).d;
for (k = 0; k < _num_dihedral(i); k++) {
- _buf(mysend,m++) = ubuf(_dihedral_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom3(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom4(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom4(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_improper(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_improper(i)).d;
for (k = 0; k < _num_improper(i); k++) {
- _buf(mysend,m++) = ubuf(_improper_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom3(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom4(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom4(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_nspecial(i,0)).d;
- _buf(mysend,m++) = ubuf(_nspecial(i,1)).d;
- _buf(mysend,m++) = ubuf(_nspecial(i,2)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = ubuf(_special(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_qw(i) = _q(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_num_dihedralw(i) = _num_dihedral(j);
for (k = 0; k < _num_dihedral(j); k++) {
_dihedral_typew(i,k) = _dihedral_type(j,k);
_dihedral_atom1w(i,k) = _dihedral_atom1(j,k);
_dihedral_atom2w(i,k) = _dihedral_atom2(j,k);
_dihedral_atom3w(i,k) = _dihedral_atom3(j,k);
_dihedral_atom4w(i,k) = _dihedral_atom4(j,k);
}
_num_improperw(i) = _num_improper(j);
for (k = 0; k < _num_improper(j); k++) {
_improper_typew(i,k) = _improper_type(j,k);
_improper_atom1w(i,k) = _improper_atom1(j,k);
_improper_atom2w(i,k) = _improper_atom2(j,k);
_improper_atom3w(i,k) = _improper_atom3(j,k);
_improper_atom4w(i,k) = _improper_atom4(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecFullKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecFullKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
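  // slot 0 is filled at the end with the total length of this atom's record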
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_q(i);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(h_dihedral_type(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(h_improper_type(i,k)).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackExchangeFunctor {
-
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_float_1d _q;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d _num_dihedral;
typename AT::t_int_2d _dihedral_type;
typename AT::t_tagint_2d _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d _num_improper;
typename AT::t_int_2d _improper_type;
typename AT::t_tagint_2d _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecFullKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
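      // each incoming atom that falls inside this proc's slab along the exchange
      // dimension atomically claims the next local index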
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _type(i) = (int) ubuf(_buf(myrecv,m++)).i;
- _mask(i) = (int) ubuf(_buf(myrecv,m++)).i;
- _image(i) = (imageint) ubuf(_buf(myrecv,m++)).i;
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
_q(i) = _buf(myrecv,m++);
- _molecule(i) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _num_bond(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _bond_atom(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _angle_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _angle_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _angle_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_dihedral(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_dihedral(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_dihedral(i); k++) {
- _dihedral_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom4(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _dihedral_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_improper(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_improper(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_improper(i); k++) {
- _improper_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _improper_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom4(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _improper_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = (int) ubuf(_buf(myrecv,m++)).i;
- _nspecial(i,1) = (int) ubuf(_buf(myrecv,m++)).i;
- _nspecial(i,2) = (int) ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
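  // each packed atom occupies a fixed stride of 'elements' doubles, so
  // nrecv/elements candidate entries are examined by the unpack functor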
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecFullKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
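    // device path: push the current nlocal to the device, let the functor
    // atomically advance it, then pull the final count back to the host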
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecFullKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_q(nlocal) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecFullKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 17 + 2*num_bond[i] + 4*num_angle[i] +
5*num_dihedral[i] + 5*num_improper[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = h_q(i);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
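  // MAX(t,-t) = |t|: topology types may be stored as negative (see note above)
  // but are always written to the restart file as positive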
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (int k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(MAX(h_dihedral_type(i,k),-h_dihedral_type(i,k))).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (int k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(MAX(h_improper_type(i,k),-h_improper_type(i,k))).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_q(nlocal) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
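    // buf[0] holds the total record length; whatever follows the per-atom data
    // is stashed in atom->extra for fixes to unpack later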
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecFullKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
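  // default image flags are (0,0,0); IMGMAX is the zero offset of each packed field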
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_q(nlocal) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecFullKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_q(nlocal) = atof(values[3]);
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecFullKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_q(nlocal) = atof(values[1]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
return 2;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecFullKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_q(i);
buf[i][4] = h_x(i,0);
buf[i][5] = h_x(i,1);
buf[i][6] = h_x(i,2);
buf[i][7] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][9] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
buf[1] = h_q(i);
return 2;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecFullKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2], buf[i][3],
buf[i][4],buf[i][5],buf[i][6],
(int) buf[i][7],(int) buf[i][8],(int) buf[i][9]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecFullKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT " %-1.16e",(tagint) ubuf(buf[0]).i,buf[1]);
return 2;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecFullKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("q")) bytes += memory->usage(q,nmax);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
if (atom->memcheck("num_dihedral")) bytes += memory->usage(num_dihedral,nmax);
if (atom->memcheck("dihedral_type"))
bytes += memory->usage(dihedral_type,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom1"))
bytes += memory->usage(dihedral_atom1,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom2"))
bytes += memory->usage(dihedral_atom2,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom3"))
bytes += memory->usage(dihedral_atom3,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom4"))
bytes += memory->usage(dihedral_atom4,nmax,atom->dihedral_per_atom);
if (atom->memcheck("num_improper")) bytes += memory->usage(num_improper,nmax);
if (atom->memcheck("improper_type"))
bytes += memory->usage(improper_type,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom1"))
bytes += memory->usage(improper_atom1,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom2"))
bytes += memory->usage(improper_atom2,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom3"))
bytes += memory->usage(improper_atom3,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom4"))
bytes += memory->usage(improper_atom4,nmax,atom->improper_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPDeviceType>();
atomKK->k_dihedral_type.sync<LMPDeviceType>();
atomKK->k_dihedral_atom1.sync<LMPDeviceType>();
atomKK->k_dihedral_atom2.sync<LMPDeviceType>();
atomKK->k_dihedral_atom3.sync<LMPDeviceType>();
atomKK->k_dihedral_atom4.sync<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPDeviceType>();
atomKK->k_improper_type.sync<LMPDeviceType>();
atomKK->k_improper_atom1.sync<LMPDeviceType>();
atomKK->k_improper_atom2.sync<LMPDeviceType>();
atomKK->k_improper_atom3.sync<LMPDeviceType>();
atomKK->k_improper_atom4.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPHostType>();
atomKK->k_dihedral_type.sync<LMPHostType>();
atomKK->k_dihedral_atom1.sync<LMPHostType>();
atomKK->k_dihedral_atom2.sync<LMPHostType>();
atomKK->k_dihedral_atom3.sync<LMPHostType>();
atomKK->k_dihedral_atom4.sync<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPHostType>();
atomKK->k_improper_type.sync<LMPHostType>();
atomKK->k_improper_atom1.sync<LMPHostType>();
atomKK->k_improper_atom2.sync<LMPHostType>();
atomKK->k_improper_atom3.sync<LMPHostType>();
atomKK->k_improper_atom4.sync<LMPHostType>();
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
if (atomKK->k_dihedral_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom4,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPDeviceType>();
atomKK->k_dihedral_type.modify<LMPDeviceType>();
atomKK->k_dihedral_atom1.modify<LMPDeviceType>();
atomKK->k_dihedral_atom2.modify<LMPDeviceType>();
atomKK->k_dihedral_atom3.modify<LMPDeviceType>();
atomKK->k_dihedral_atom4.modify<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPDeviceType>();
atomKK->k_improper_type.modify<LMPDeviceType>();
atomKK->k_improper_atom1.modify<LMPDeviceType>();
atomKK->k_improper_atom2.modify<LMPDeviceType>();
atomKK->k_improper_atom3.modify<LMPDeviceType>();
atomKK->k_improper_atom4.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPHostType>();
atomKK->k_dihedral_type.modify<LMPHostType>();
atomKK->k_dihedral_atom1.modify<LMPHostType>();
atomKK->k_dihedral_atom2.modify<LMPHostType>();
atomKK->k_dihedral_atom3.modify<LMPHostType>();
atomKK->k_dihedral_atom4.modify<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPHostType>();
atomKK->k_improper_type.modify<LMPHostType>();
atomKK->k_improper_atom1.modify<LMPHostType>();
atomKK->k_improper_atom2.modify<LMPHostType>();
atomKK->k_improper_atom3.modify<LMPHostType>();
atomKK->k_improper_atom4.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_kokkos.h b/src/KOKKOS/atom_vec_kokkos.h
index 7ac66f162..7f593f235 100644
--- a/src/KOKKOS/atom_vec_kokkos.h
+++ b/src/KOKKOS/atom_vec_kokkos.h
@@ -1,155 +1,166 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_ATOM_VEC_KOKKOS_H
#define LMP_ATOM_VEC_KOKKOS_H
#include "atom_vec.h"
#include "kokkos_type.h"
#include <type_traits>
namespace LAMMPS_NS {
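+// bit-preserving double <-> integer converter for packing atom IDs and counts
+// into the double-valued communication buffers; its constructors are marked
+// KOKKOS_INLINE_FUNCTION so the same definition works in host code and inside
+// device kernels, replacing the per-functor ubuf copies in the atom_vec functors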
+union d_ubuf {
+ double d;
+ int64_t i;
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(double arg) : d(arg) {}
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(int64_t arg) : i(arg) {}
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(int arg) : i(arg) {}
+};
+
class AtomVecKokkos : public AtomVec {
public:
AtomVecKokkos(class LAMMPS *);
virtual ~AtomVecKokkos() {}
virtual void sync(ExecutionSpace space, unsigned int mask) = 0;
virtual void modified(ExecutionSpace space, unsigned int mask) = 0;
virtual void sync_overlapping_device(ExecutionSpace space, unsigned int mask) {};
virtual int
pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap, const int nfirst,
const int &pbc_flag, const int pbc[]) = 0;
//{return 0;}
virtual int
pack_comm_kokkos(const int &n, const DAT::tdual_int_2d &list,
const int & iswap, const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag, const int pbc[]) = 0;
//{return 0;}
virtual void
unpack_comm_kokkos(const int &n, const int &nfirst,
const DAT::tdual_xfloat_2d &buf) = 0;
virtual int
pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space) = 0;
//{return 0;};
virtual void
unpack_border_kokkos(const int &n, const int &nfirst,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) = 0;
virtual int
pack_exchange_kokkos(const int &nsend, DAT::tdual_xfloat_2d &buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space, int dim, X_FLOAT lo, X_FLOAT hi) = 0;
//{return 0;};
virtual int
unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf, int nrecv,
int nlocal, int dim, X_FLOAT lo, X_FLOAT hi,
ExecutionSpace space) = 0;
//{return 0;};
protected:
class CommKokkos *commKK;
size_t buffer_size;
void* buffer;
#ifdef KOKKOS_HAVE_CUDA
template<class ViewType>
Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
Kokkos::CudaHostPinnedSpace,
Kokkos::MemoryTraits<Kokkos::Unmanaged> >
create_async_copy(const ViewType& src) {
typedef Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
typename std::conditional<
std::is_same<typename ViewType::execution_space,LMPDeviceType>::value,
Kokkos::CudaHostPinnedSpace,typename ViewType::memory_space>::type,
Kokkos::MemoryTraits<Kokkos::Unmanaged> > mirror_type;
if (buffer_size == 0) {
buffer = Kokkos::kokkos_malloc<Kokkos::CudaHostPinnedSpace>(src.capacity());
buffer_size = src.capacity();
} else if (buffer_size < src.capacity()) {
buffer = Kokkos::kokkos_realloc<Kokkos::CudaHostPinnedSpace>(buffer,src.capacity());
buffer_size = src.capacity();
}
return mirror_type( buffer ,
src.dimension_0() ,
src.dimension_1() ,
src.dimension_2() ,
src.dimension_3() ,
src.dimension_4() ,
src.dimension_5() ,
src.dimension_6() ,
src.dimension_7() );
}
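  // stage a host<->device transfer of a dual view through the shared
  // CudaHostPinnedSpace scratch buffer, growing that buffer on demand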
template<class ViewType>
void perform_async_copy(const ViewType& src, unsigned int space) {
typedef Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
typename std::conditional<
std::is_same<typename ViewType::execution_space,LMPDeviceType>::value,
Kokkos::CudaHostPinnedSpace,typename ViewType::memory_space>::type,
Kokkos::MemoryTraits<Kokkos::Unmanaged> > mirror_type;
if (buffer_size == 0) {
buffer = Kokkos::kokkos_malloc<Kokkos::CudaHostPinnedSpace>(src.capacity()*sizeof(typename ViewType::value_type));
buffer_size = src.capacity();
} else if (buffer_size < src.capacity()) {
buffer = Kokkos::kokkos_realloc<Kokkos::CudaHostPinnedSpace>(buffer,src.capacity()*sizeof(typename ViewType::value_type));
buffer_size = src.capacity();
}
mirror_type tmp_view( (typename ViewType::value_type*)buffer ,
src.dimension_0() ,
src.dimension_1() ,
src.dimension_2() ,
src.dimension_3() ,
src.dimension_4() ,
src.dimension_5() ,
src.dimension_6() ,
src.dimension_7() );
if(space == Device) {
Kokkos::deep_copy(LMPHostType(),tmp_view,src.h_view),
Kokkos::deep_copy(LMPHostType(),src.d_view,tmp_view);
src.modified_device() = src.modified_host();
} else {
Kokkos::deep_copy(LMPHostType(),tmp_view,src.d_view),
Kokkos::deep_copy(LMPHostType(),src.h_view,tmp_view);
src.modified_device() = src.modified_host();
}
}
#else
template<class ViewType>
void perform_async_copy(ViewType& src, unsigned int space) {
if(space == Device)
src.template sync<LMPDeviceType>();
else
src.template sync<LMPHostType>();
}
#endif
};
}
#endif
/* ERROR/WARNING messages:
*/
diff --git a/src/KOKKOS/atom_vec_molecular_kokkos.cpp b/src/KOKKOS/atom_vec_molecular_kokkos.cpp
index 4fd811437..5c16ac151 100644
--- a/src/KOKKOS/atom_vec_molecular_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_molecular_kokkos.cpp
@@ -1,2386 +1,2386 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_molecular_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecMolecularKokkos::AtomVecMolecularKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = dihedrals_allow = impropers_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->molecule_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
"atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
"atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
"atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
"atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
"atom:angle_atom3");
memory->grow_kokkos(atomKK->k_num_dihedral,atomKK->num_dihedral,nmax,"atom:num_dihedral");
memory->grow_kokkos(atomKK->k_dihedral_type,atomKK->dihedral_type,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_type");
memory->grow_kokkos(atomKK->k_dihedral_atom1,atomKK->dihedral_atom1,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom1");
memory->grow_kokkos(atomKK->k_dihedral_atom2,atomKK->dihedral_atom2,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom2");
memory->grow_kokkos(atomKK->k_dihedral_atom3,atomKK->dihedral_atom3,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom3");
memory->grow_kokkos(atomKK->k_dihedral_atom4,atomKK->dihedral_atom4,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom4");
memory->grow_kokkos(atomKK->k_num_improper,atomKK->num_improper,nmax,"atom:num_improper");
memory->grow_kokkos(atomKK->k_improper_type,atomKK->improper_type,nmax,
atomKK->improper_per_atom,"atom:improper_type");
memory->grow_kokkos(atomKK->k_improper_atom1,atomKK->improper_atom1,nmax,
atomKK->improper_per_atom,"atom:improper_atom1");
memory->grow_kokkos(atomKK->k_improper_atom2,atomKK->improper_atom2,nmax,
atomKK->improper_per_atom,"atom:improper_atom2");
memory->grow_kokkos(atomKK->k_improper_atom3,atomKK->improper_atom3,nmax,
atomKK->improper_per_atom,"atom:improper_atom3");
memory->grow_kokkos(atomKK->k_improper_atom4,atomKK->improper_atom4,nmax,
atomKK->improper_per_atom,"atom:improper_atom4");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
num_dihedral = atomKK->num_dihedral;
d_num_dihedral = atomKK->k_num_dihedral.d_view;
h_num_dihedral = atomKK->k_num_dihedral.h_view;
dihedral_type = atomKK->dihedral_type;
d_dihedral_type = atomKK->k_dihedral_type.d_view;
h_dihedral_type = atomKK->k_dihedral_type.h_view;
dihedral_atom1 = atomKK->dihedral_atom1;
d_dihedral_atom1 = atomKK->k_dihedral_atom1.d_view;
h_dihedral_atom1 = atomKK->k_dihedral_atom1.h_view;
dihedral_atom2 = atomKK->dihedral_atom2;
d_dihedral_atom2 = atomKK->k_dihedral_atom2.d_view;
h_dihedral_atom2 = atomKK->k_dihedral_atom2.h_view;
dihedral_atom3 = atomKK->dihedral_atom3;
d_dihedral_atom3 = atomKK->k_dihedral_atom3.d_view;
h_dihedral_atom3 = atomKK->k_dihedral_atom3.h_view;
dihedral_atom4 = atomKK->dihedral_atom4;
d_dihedral_atom4 = atomKK->k_dihedral_atom4.d_view;
h_dihedral_atom4 = atomKK->k_dihedral_atom4.h_view;
num_improper = atomKK->num_improper;
d_num_improper = atomKK->k_num_improper.d_view;
h_num_improper = atomKK->k_num_improper.h_view;
improper_type = atomKK->improper_type;
d_improper_type = atomKK->k_improper_type.d_view;
h_improper_type = atomKK->k_improper_type.h_view;
improper_atom1 = atomKK->improper_atom1;
d_improper_atom1 = atomKK->k_improper_atom1.d_view;
h_improper_atom1 = atomKK->k_improper_atom1.h_view;
improper_atom2 = atomKK->improper_atom2;
d_improper_atom2 = atomKK->k_improper_atom2.d_view;
h_improper_atom2 = atomKK->k_improper_atom2.h_view;
improper_atom3 = atomKK->improper_atom3;
d_improper_atom3 = atomKK->k_improper_atom3.d_view;
h_improper_atom3 = atomKK->k_improper_atom3.h_view;
improper_atom4 = atomKK->improper_atom4;
d_improper_atom4 = atomKK->k_improper_atom4.d_view;
h_improper_atom4 = atomKK->k_improper_atom4.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
h_num_dihedral(j) = h_num_dihedral(i);
for (k = 0; k < h_num_dihedral(j); k++) {
h_dihedral_type(j,k) = h_dihedral_type(i,k);
h_dihedral_atom1(j,k) = h_dihedral_atom1(i,k);
h_dihedral_atom2(j,k) = h_dihedral_atom2(i,k);
h_dihedral_atom3(j,k) = h_dihedral_atom3(i,k);
h_dihedral_atom4(j,k) = h_dihedral_atom4(i,k);
}
h_num_improper(j) = h_num_improper(i);
for (k = 0; k < h_num_improper(j); k++) {
h_improper_type(j,k) = h_improper_type(i,k);
h_improper_atom1(j,k) = h_improper_atom1(i,k);
h_improper_atom2(j,k) = h_improper_atom2(i,k);
h_improper_atom3(j,k) = h_improper_atom3(i,k);
h_improper_atom4(j,k) = h_improper_atom4(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecMolecularKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecMolecularKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecMolecularKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecMolecularKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),
_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecMolecularKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecMolecularKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecMolecularKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
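// Border-communication pack functor: in addition to the (possibly shifted)
// coordinates it packs tag, type, mask and molecule ID for each border atom.
// The integer fields are routed through the d_ubuf union so their bit patterns
// are carried unchanged inside the double-precision buffer.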
template<class DeviceType,int PBC_FLAG>
struct AtomVecMolecularKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecMolecularKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecMolecularKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecMolecularKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecMolecularKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecMolecularKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_molecule(j);
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecMolecularKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecMolecularKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecMolecularKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
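// Exchange pack functor: serializes the complete per-atom state (position,
// velocity, IDs, image flag, and the full bond/angle/dihedral/improper and
// special-neighbor topology) for atoms migrating to another processor, and
// backfills each vacated local slot with the atom indexed by copylist.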
template<class DeviceType>
struct AtomVecMolecularKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d_randomread _num_dihedral;
typename AT::t_int_2d_randomread _dihedral_type;
typename AT::t_tagint_2d_randomread _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d_randomread _num_improper;
typename AT::t_int_2d_randomread _improper_type;
typename AT::t_tagint_2d_randomread _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_int_1d _num_dihedralw;
typename AT::t_int_2d _dihedral_typew;
typename AT::t_tagint_2d _dihedral_atom1w,_dihedral_atom2w,
_dihedral_atom3w,_dihedral_atom4w;
typename AT::t_int_1d _num_improperw;
typename AT::t_int_2d _improper_typew;
typename AT::t_tagint_2d _improper_atom1w,_improper_atom2w,
_improper_atom3w,_improper_atom4w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecMolecularKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedralw(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_typew(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1w(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2w(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3w(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4w(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improperw(atom->k_num_improper.view<DeviceType>()),
_improper_typew(atom->k_improper_type.view<DeviceType>()),
_improper_atom1w(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2w(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3w(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4w(atom->k_improper_atom4.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 num_angle, angle_per_atom angle_type, angle_per_atom angle_atom1, angle_atom2,
// and angle_atom3
// 1 num_dihedral, dihedral_per_atom dihedral_type, 4*dihedral_per_atom
// 1 num_improper, 5*improper_per_atom
// 1 to store buffer length
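// fixed part: 3(x) + 3(v) + 1(tag) + 1(type) + 1(mask) + 1(image) + 1(molecule)
// + 1(num_bond) + 1(num_angle) + 1(num_dihedral) + 1(num_improper) + 3(nspecial)
// + 1(length) = 19 doubles per atom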
elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _num_angle(i);
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = _angle_type(i,k);
- _buf(mysend,m++) = _angle_atom1(i,k);
- _buf(mysend,m++) = _angle_atom2(i,k);
- _buf(mysend,m++) = _angle_atom3(i,k);
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = _num_dihedral(i);
+ _buf(mysend,m++) = d_ubuf(_num_dihedral(i)).d;
for (k = 0; k < _num_dihedral(i); k++) {
- _buf(mysend,m++) = _dihedral_type(i,k);
- _buf(mysend,m++) = _dihedral_atom1(i,k);
- _buf(mysend,m++) = _dihedral_atom2(i,k);
- _buf(mysend,m++) = _dihedral_atom3(i,k);
- _buf(mysend,m++) = _dihedral_atom4(i,k);
+ _buf(mysend,m++) = d_ubuf(_dihedral_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom4(i,k)).d;
}
- _buf(mysend,m++) = _num_improper(i);
+ _buf(mysend,m++) = d_ubuf(_num_improper(i)).d;
for (k = 0; k < _num_improper(i); k++) {
- _buf(mysend,m++) = _improper_type(i,k);
- _buf(mysend,m++) = _improper_atom1(i,k);
- _buf(mysend,m++) = _improper_atom2(i,k);
- _buf(mysend,m++) = _improper_atom3(i,k);
- _buf(mysend,m++) = _improper_atom4(i,k);
+ _buf(mysend,m++) = d_ubuf(_improper_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom4(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_num_dihedralw(i) = _num_dihedral(j);
for (k = 0; k < _num_dihedral(j); k++) {
_dihedral_typew(i,k) = _dihedral_type(j,k);
_dihedral_atom1w(i,k) = _dihedral_atom1(j,k);
_dihedral_atom2w(i,k) = _dihedral_atom2(j,k);
_dihedral_atom3w(i,k) = _dihedral_atom3(j,k);
_dihedral_atom4w(i,k) = _dihedral_atom4(j,k);
}
_num_improperw(i) = _num_improper(j);
for (k = 0; k < _num_improper(j); k++) {
_improper_typew(i,k) = _improper_type(j,k);
_improper_atom1w(i,k) = _improper_atom1(j,k);
_improper_atom2w(i,k) = _improper_atom2(j,k);
_improper_atom3w(i,k) = _improper_atom3(j,k);
_improper_atom4w(i,k) = _improper_atom4(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecMolecularKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecMolecularKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
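// buf[0] is filled in last with the total number of doubles packed for this
// atom (including fix contributions), so the receiving code can advance through
// the exchange buffer one whole atom at a time.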
int AtomVecMolecularKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(h_dihedral_type(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(h_improper_type(i,k)).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
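// Exchange unpack functor: an incoming atom is kept only if its coordinate along
// the exchange dimension lies in [lo,hi); the local atom counter is advanced with
// Kokkos::atomic_fetch_add so unpacking can proceed safely in parallel.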
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d _num_dihedral;
typename AT::t_int_2d _dihedral_type;
typename AT::t_tagint_2d _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d _num_improper;
typename AT::t_int_2d _improper_type;
typename AT::t_tagint_2d _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecMolecularKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = _buf(myrecv,m++);
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = _buf(myrecv,m++);
- _angle_atom1(i,k) = _buf(myrecv,m++);
- _angle_atom2(i,k) = _buf(myrecv,m++);
- _angle_atom3(i,k) = _buf(myrecv,m++);
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_dihedral(i) = _buf(myrecv,m++);
+ _num_dihedral(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_dihedral(i); k++) {
- _dihedral_type(i,k) = _buf(myrecv,m++);
- _dihedral_atom1(i,k) = _buf(myrecv,m++);
- _dihedral_atom2(i,k) = _buf(myrecv,m++);
- _dihedral_atom3(i,k) = _buf(myrecv,m++);
- _dihedral_atom4(i,k) = _buf(myrecv,m++);
+ _dihedral_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_improper(i) = _buf(myrecv,m++);
- for (k = 0; k < _num_improper(i); k++) {
- _improper_type(i,k) = _buf(myrecv,m++);
- _improper_atom1(i,k) = _buf(myrecv,m++);
- _improper_atom2(i,k) = _buf(myrecv,m++);
- _improper_atom3(i,k) = _buf(myrecv,m++);
- _improper_atom4(i,k) = _buf(myrecv,m++);
+ _num_improper(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ for (k = 0; k < (int) _num_improper(i); k++) {
+ _improper_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecMolecularKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecMolecularKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 16 + 2*num_bond[i] + 4*num_angle[i] +
5*num_dihedral[i] + 5*num_improper[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
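// negative bond/angle/dihedral/improper types are written as MAX(t,-t), i.e. their absolute value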
int AtomVecMolecularKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (int k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(MAX(h_dihedral_type(i,k),-h_dihedral_type(i,k))).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (int k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(MAX(h_improper_type(i,k),-h_improper_type(i,k))).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecMolecularKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
if (atom->memcheck("num_dihedral")) bytes += memory->usage(num_dihedral,nmax);
if (atom->memcheck("dihedral_type"))
bytes += memory->usage(dihedral_type,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom1"))
bytes += memory->usage(dihedral_atom1,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom2"))
bytes += memory->usage(dihedral_atom2,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom3"))
bytes += memory->usage(dihedral_atom3,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom4"))
bytes += memory->usage(dihedral_atom4,nmax,atom->dihedral_per_atom);
if (atom->memcheck("num_improper")) bytes += memory->usage(num_improper,nmax);
if (atom->memcheck("improper_type"))
bytes += memory->usage(improper_type,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom1"))
bytes += memory->usage(improper_atom1,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom2"))
bytes += memory->usage(improper_atom2,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom3"))
bytes += memory->usage(improper_atom3,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom4"))
bytes += memory->usage(improper_atom4,nmax,atom->improper_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::sync(ExecutionSpace space, unsigned int mask)
{
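  // sync the per-atom arrays selected by the mask bits to the requested
  // execution space; each DualView copies its data only if it was last
  // modified in the other space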
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPDeviceType>();
atomKK->k_dihedral_type.sync<LMPDeviceType>();
atomKK->k_dihedral_atom1.sync<LMPDeviceType>();
atomKK->k_dihedral_atom2.sync<LMPDeviceType>();
atomKK->k_dihedral_atom3.sync<LMPDeviceType>();
atomKK->k_dihedral_atom4.sync<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPDeviceType>();
atomKK->k_improper_type.sync<LMPDeviceType>();
atomKK->k_improper_atom1.sync<LMPDeviceType>();
atomKK->k_improper_atom2.sync<LMPDeviceType>();
atomKK->k_improper_atom3.sync<LMPDeviceType>();
atomKK->k_improper_atom4.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPHostType>();
atomKK->k_dihedral_type.sync<LMPHostType>();
atomKK->k_dihedral_atom1.sync<LMPHostType>();
atomKK->k_dihedral_atom2.sync<LMPHostType>();
atomKK->k_dihedral_atom3.sync<LMPHostType>();
atomKK->k_dihedral_atom4.sync<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPHostType>();
atomKK->k_improper_type.sync<LMPHostType>();
atomKK->k_improper_atom1.sync<LMPHostType>();
atomKK->k_improper_atom2.sync<LMPHostType>();
atomKK->k_improper_atom3.sync<LMPHostType>();
atomKK->k_improper_atom4.sync<LMPHostType>();
}
}
}
void AtomVecMolecularKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
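  // same selection logic as sync(), but only arrays that actually need
  // updating are copied, via perform_async_copy(), so the transfers can
  // overlap with other device work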
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
if (atomKK->k_dihedral_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom4,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::modified(ExecutionSpace space, unsigned int mask)
{
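  // mark the per-atom arrays selected by the mask bits as modified in the
  // given execution space, so that later sync() calls trigger a copy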
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPDeviceType>();
atomKK->k_dihedral_type.modify<LMPDeviceType>();
atomKK->k_dihedral_atom1.modify<LMPDeviceType>();
atomKK->k_dihedral_atom2.modify<LMPDeviceType>();
atomKK->k_dihedral_atom3.modify<LMPDeviceType>();
atomKK->k_dihedral_atom4.modify<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPDeviceType>();
atomKK->k_improper_type.modify<LMPDeviceType>();
atomKK->k_improper_atom1.modify<LMPDeviceType>();
atomKK->k_improper_atom2.modify<LMPDeviceType>();
atomKK->k_improper_atom3.modify<LMPDeviceType>();
atomKK->k_improper_atom4.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPHostType>();
atomKK->k_dihedral_type.modify<LMPHostType>();
atomKK->k_dihedral_atom1.modify<LMPHostType>();
atomKK->k_dihedral_atom2.modify<LMPHostType>();
atomKK->k_dihedral_atom3.modify<LMPHostType>();
atomKK->k_dihedral_atom4.modify<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPHostType>();
atomKK->k_improper_type.modify<LMPHostType>();
atomKK->k_improper_atom1.modify<LMPHostType>();
atomKK->k_improper_atom2.modify<LMPHostType>();
atomKK->k_improper_atom3.modify<LMPHostType>();
atomKK->k_improper_atom4.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/fix_qeq_reax_kokkos.cpp b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
index 3b8d5a85e..2e46b85fd 100644
--- a/src/KOKKOS/fix_qeq_reax_kokkos.cpp
+++ b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
@@ -1,1231 +1,1232 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_reax_kokkos.h"
#include "kokkos.h"
#include "atom.h"
#include "atom_masks.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "group.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list_kokkos.h"
#include "neigh_request.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define SMALL 0.0001
#define EV_TO_KCAL_PER_MOL 14.4
#define TEAMSIZE 128
/* ---------------------------------------------------------------------- */
template<class DeviceType>
-FixQEqReaxKokkos<DeviceType>::FixQEqReaxKokkos(LAMMPS *lmp, int narg, char **arg) :
+FixQEqReaxKokkos<DeviceType>::
+FixQEqReaxKokkos(LAMMPS *lmp, int narg, char **arg) :
FixQEqReax(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | V_MASK | F_MASK | MASK_MASK | Q_MASK | TYPE_MASK;
datamask_modify = Q_MASK | X_MASK;
nmax = m_cap = 0;
allocated_flag = 0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
FixQEqReaxKokkos<DeviceType>::~FixQEqReaxKokkos()
{
if (copymode) return;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init()
{
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<LMPDeviceType>();
FixQEqReax::init();
neighflag = lmp->kokkos->neighflag_qeq;
int irequest = neighbor->nrequest - 1;
neighbor->requests[irequest]->
kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
!Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
neighbor->requests[irequest]->
kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
if (neighflag == FULL) {
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->half = 0;
} else { //if (neighflag == HALF || neighflag == HALFTHREAD)
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->full = 0;
neighbor->requests[irequest]->half = 1;
neighbor->requests[irequest]->ghost = 1;
}
int ntypes = atom->ntypes;
k_params = Kokkos::DualView<params_qeq*,Kokkos::LayoutRight,DeviceType>
("FixQEqReax::params",ntypes+1);
params = k_params.template view<DeviceType>();
for (n = 1; n <= ntypes; n++) {
k_params.h_view(n).chi = chi[n];
k_params.h_view(n).eta = eta[n];
k_params.h_view(n).gamma = gamma[n];
}
k_params.template modify<LMPHostType>();
cutsq = swb * swb;
init_shielding_k();
init_hist();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_shielding_k()
{
int i,j;
int ntypes = atom->ntypes;
k_shield = DAT::tdual_ffloat_2d("qeq/kk:shield",ntypes+1,ntypes+1);
d_shield = k_shield.template view<DeviceType>();
for( i = 1; i <= ntypes; ++i )
for( j = 1; j <= ntypes; ++j )
k_shield.h_view(i,j) = pow( gamma[i] * gamma[j], -1.5 );
k_shield.template modify<LMPHostType>();
k_shield.template sync<DeviceType>();
k_tap = DAT::tdual_ffloat_1d("qeq/kk:tap",8);
d_tap = k_tap.template view<DeviceType>();
for (i = 0; i < 8; i++)
k_tap.h_view(i) = Tap[i];
k_tap.template modify<LMPHostType>();
k_tap.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_hist()
{
int i,j;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",atom->nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",atom->nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
for( i = 0; i < atom->nmax; i++ )
for( j = 0; j < 5; j++ )
k_s_hist.h_view(i,j) = k_t_hist.h_view(i,j) = 0.0;
k_s_hist.template modify<LMPHostType>();
k_s_hist.template sync<DeviceType>();
k_t_hist.template modify<LMPHostType>();
k_t_hist.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::setup_pre_force(int vflag)
{
//neighbor->build_one(list);
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::pre_force(int vflag)
{
if (update->ntimestep % nevery) return;
atomKK->sync(execution_space,datamask_read);
atomKK->modified(execution_space,datamask_modify);
x = atomKK->k_x.view<DeviceType>();
v = atomKK->k_v.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
mask = atomKK->k_mask.view<DeviceType>();
nlocal = atomKK->nlocal;
nall = atom->nlocal + atom->nghost;
newton_pair = force->newton_pair;
k_params.template sync<DeviceType>();
k_shield.template sync<DeviceType>();
k_tap.template sync<DeviceType>();
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_numneigh = k_list->d_numneigh;
d_neighbors = k_list->d_neighbors;
d_ilist = k_list->d_ilist;
inum = list->inum;
k_list->clean_copy();
//cleanup_copy();
copymode = 1;
int teamsize = TEAMSIZE;
// allocate
allocate_array();
// get max number of neighbors
if (!allocated_flag || update->ntimestep == neighbor->lastcall)
allocate_matrix();
// compute_H
FixQEqReaxKokkosComputeHFunctor<DeviceType> computeH_functor(this);
Kokkos::parallel_scan(inum,computeH_functor);
DeviceType::fence();
// init_matvec
FixQEqReaxKokkosMatVecFunctor<DeviceType> matvec_functor(this);
Kokkos::parallel_for(inum,matvec_functor);
DeviceType::fence();
// comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 2;
k_s.template modify<DeviceType>();
k_s.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_s.template modify<LMPHostType>();
k_s.template sync<DeviceType>();
// comm->forward_comm_fix(this); //Dist_vector( t );
pack_flag = 3;
k_t.template modify<DeviceType>();
k_t.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_t.template modify<LMPHostType>();
k_t.template sync<DeviceType>();
// 1st cg solve over b_s, s
cg_solve1();
DeviceType::fence();
// 2nd cg solve over b_t, t
cg_solve2();
DeviceType::fence();
// calculate_Q();
calculate_q();
DeviceType::fence();
copymode = 0;
if (!allocated_flag)
allocated_flag = 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::num_neigh_item(int ii, int &maxneigh) const
{
const int i = d_ilist[ii];
maxneigh += d_numneigh[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_matrix()
{
int i,ii,m;
const int inum = list->inum;
nmax = atom->nmax;
// determine the total space for the H matrix
m_cap = 0;
FixQEqReaxKokkosNumNeighFunctor<DeviceType> neigh_functor(this);
Kokkos::parallel_reduce(inum,neigh_functor,m_cap);
d_firstnbr = typename AT::t_int_1d("qeq/kk:firstnbr",nmax);
d_numnbrs = typename AT::t_int_1d("qeq/kk:numnbrs",nmax);
d_jlist = typename AT::t_int_1d("qeq/kk:jlist",m_cap);
d_val = typename AT::t_ffloat_1d("qeq/kk:val",m_cap);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_array()
{
if (atom->nmax > nmax) {
nmax = atom->nmax;
k_o = DAT::tdual_ffloat_1d("qeq/kk:h_o",nmax);
d_o = k_o.template view<DeviceType>();
h_o = k_o.h_view;
d_Hdia_inv = typename AT::t_ffloat_1d("qeq/kk:h_Hdia_inv",nmax);
d_b_s = typename AT::t_ffloat_1d("qeq/kk:h_b_s",nmax);
d_b_t = typename AT::t_ffloat_1d("qeq/kk:h_b_t",nmax);
k_s = DAT::tdual_ffloat_1d("qeq/kk:h_s",nmax);
d_s = k_s.template view<DeviceType>();
h_s = k_s.h_view;
k_t = DAT::tdual_ffloat_1d("qeq/kk:h_t",nmax);
d_t = k_t.template view<DeviceType>();
h_t = k_t.h_view;
d_p = typename AT::t_ffloat_1d("qeq/kk:h_p",nmax);
d_r = typename AT::t_ffloat_1d("qeq/kk:h_r",nmax);
k_d = DAT::tdual_ffloat_1d("qeq/kk:h_d",nmax);
d_d = k_d.template view<DeviceType>();
h_d = k_d.h_view;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
}
// init_storage
const int ignum = atom->nlocal + atom->nghost;
FixQEqReaxKokkosZeroFunctor<DeviceType> zero_functor(this);
Kokkos::parallel_for(ignum,zero_functor);
DeviceType::fence();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::zero_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_s[i] = 0.0;
d_t[i] = 0.0;
d_p[i] = 0.0;
d_o[i] = 0.0;
d_r[i] = 0.0;
d_d[i] = 0.0;
//for( int j = 0; j < 5; j++ )
//d_s_hist(i,j) = d_t_hist(i,j) = 0.0;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::compute_h_item(int ii, int &m_fill, const bool &final) const
{
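  // build row i of the sparse QEq matrix H; run through Kokkos::parallel_scan:
  // non-final passes only advance m_fill to obtain per-atom offsets, the final
  // pass fills d_firstnbr, d_jlist, and d_val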
const int i = d_ilist[ii];
int j,jj,jtype,flag;
if (mask[i] & groupbit) {
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
if (final)
d_firstnbr[i] = m_fill;
for (jj = 0; jj < jnum; jj++) {
j = d_neighbors(i,jj);
j &= NEIGHMASK;
jtype = type(j);
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
if (neighflag != FULL) {
const tagint jtag = tag(j);
flag = 0;
if (j < nlocal) flag = 1;
else if (itag < jtag) flag = 1;
else if (itag == jtag) {
if (delz > SMALL) flag = 1;
else if (fabs(delz) < SMALL) {
if (dely > SMALL) flag = 1;
else if (fabs(dely) < SMALL && delx > SMALL)
flag = 1;
}
}
if (!flag) continue;
}
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cutsq) continue;
if (final) {
const F_FLOAT r = sqrt(rsq);
d_jlist(m_fill) = j;
const F_FLOAT shldij = d_shield(itype,jtype);
d_val(m_fill) = calculate_H_k(r,shldij);
}
m_fill++;
}
if (final)
d_numnbrs[i] = m_fill - d_firstnbr[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::calculate_H_k(const F_FLOAT &r, const F_FLOAT &shld) const
{
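  // Horner evaluation of the degree-7 taper polynomial, divided by the
  // shielded distance (r^3 + shld)^(1/3) and scaled by EV_TO_KCAL_PER_MOL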
F_FLOAT taper, denom;
taper = d_tap[7] * r + d_tap[6];
taper = taper * r + d_tap[5];
taper = taper * r + d_tap[4];
taper = taper * r + d_tap[3];
taper = taper * r + d_tap[2];
taper = taper * r + d_tap[1];
taper = taper * r + d_tap[0];
denom = r * r * r + shld;
denom = pow(denom,0.3333333333333);
return taper * EV_TO_KCAL_PER_MOL / denom;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::mat_vec_item(int ii) const
{
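  // set the diagonal preconditioner and right-hand sides (b_s = -chi, b_t = -1)
  // and extrapolate initial guesses for s and t from the stored history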
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_t[i] = d_t_hist(i,2) + 3*(d_t_hist(i,0) - d_t_hist(i,1));
d_s[i] = 4*(d_s_hist(i,0)+d_s_hist(i,2))-(6*d_s_hist(i,1)+d_s_hist(i,3));
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve1()
// b = b_s, x = s;
{
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse12Functor<DeviceType> sparse12_functor(this);
Kokkos::parallel_for(inum,sparse12_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALF> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
} else {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALFTHREAD> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec1> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm1Functor<DeviceType> norm1_functor(this);
Kokkos::parallel_reduce(inum,norm1_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
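  // Jacobi-preconditioned CG iteration: forward-communicate d to ghosts,
  // form q = H d, then alpha = sig_new / (d.q), s += alpha d, r -= alpha q,
  // p = Hdia_inv r, beta = sig_new/sig_old, d = p + beta d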
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( s, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon1Functor<DeviceType> precon1_functor(this);
Kokkos::parallel_for(inum,precon1_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve1 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve2()
// b = b_t, x = t;
{
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse32Functor<DeviceType> sparse32_functor(this);
Kokkos::parallel_for(inum,sparse32_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALF> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
} else {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALFTHREAD> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec3> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm2Functor<DeviceType> norm2_functor(this);
Kokkos::parallel_reduce(inum,norm2_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
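  // same Jacobi-preconditioned CG iteration as in cg_solve1, here applied
  // to the second linear system H t = b_t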
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
DeviceType::fence();
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( t, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon2Functor<DeviceType> precon2_functor(this);
Kokkos::parallel_for(inum,precon2_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve2 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::calculate_q()
{
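  // combine the two CG solutions into the final charges,
  // q_i = s_i - (s_sum / t_sum) * t_i, shift the s/t history,
  // and forward-communicate the updated charges to ghost atoms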
F_FLOAT sum, sum_all;
const int inum = list->inum;
// s_sum = parallel_vector_acc( s, nn );
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc1Functor<DeviceType> vecacc1_functor(this);
Kokkos::parallel_reduce(inum,vecacc1_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT s_sum = sum_all;
// t_sum = parallel_vector_acc( t, nn);
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc2Functor<DeviceType> vecacc2_functor(this);
Kokkos::parallel_reduce(inum,vecacc2_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT t_sum = sum_all;
// u = s_sum / t_sum;
delta = s_sum/t_sum;
// q[i] = s[i] - u * t[i];
FixQEqReaxKokkosCalculateQFunctor<DeviceType> calculateQ_functor(this);
Kokkos::parallel_for(inum,calculateQ_functor);
DeviceType::fence();
pack_flag = 4;
//comm->forward_comm_fix( this ); //Dist_vector( atom->q );
atomKK->k_q.modify<DeviceType>();
atomKK->k_q.sync<LMPHostType>();
comm->forward_comm_fix(this);
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse12_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_s[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse13_item(int ii) const
{
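  // row i of the H*s product with a half neighbor list: each stored element
  // (i,j) contributes to row i and symmetrically to row j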
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_s[j];
a_o[j] += d_val(jj) * d_s[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec1, const membertype1 &team) const
{
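  // team-parallel H*s product for a full neighbor list: the team reduces the
  // row-i contributions over its neighbor range, then a single thread adds
  // the partial sum into d_o[i]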
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_s[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse22_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_d[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse23_item(int ii) const
{
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_d[j];
a_o[j] += d_val(jj) * d_d[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec2, const membertype2 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_d[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp; });
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagZeroQGhosts, const int &i) const
{
if (mask[i] & groupbit)
d_o[i] = 0.0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse32_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit)
d_o[i] = params(itype).eta * d_t[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse33_item(int ii) const
{
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_t[j];
a_o[j] += d_val(jj) * d_t[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec3, const membertype3 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_t[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::vecsum2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit)
d_d[i] = 1.0 * d_p[i] + beta * d_d[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm1_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_s[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_s[i] * d_b_s[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm2_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_t[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_t[i] * d_b_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_r[i] * d_d[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot2_item(int ii) const
{
double tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_d[i] * d_o[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon1_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_s[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_t[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::precon_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_p[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_r[i] * d_p[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_s[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc2_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::calculate_q_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
q(i) = d_s[i] - delta * d_t[i];
for (int k = 4; k > 0; --k) {
d_s_hist(i,k) = d_s_hist(i,k-1);
d_t_hist(i,k) = d_t_hist(i,k-1);
}
d_s_hist(i,0) = d_s[i];
d_t_hist(i,0) = d_t[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int m;
if( pack_flag == 1)
for(m = 0; m < n; m++) buf[m] = h_d[list[m]];
else if( pack_flag == 2 )
for(m = 0; m < n; m++) buf[m] = h_s[list[m]];
else if( pack_flag == 3 )
for(m = 0; m < n; m++) buf[m] = h_t[list[m]];
else if( pack_flag == 4 )
for(m = 0; m < n; m++) buf[m] = atom->q[list[m]];
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *buf)
{
int i, m;
if( pack_flag == 1)
for(m = 0, i = first; m < n; m++, i++) h_d[i] = buf[m];
else if( pack_flag == 2)
for(m = 0, i = first; m < n; m++, i++) h_s[i] = buf[m];
else if( pack_flag == 3)
for(m = 0, i = first; m < n; m++, i++) h_t[i] = buf[m];
else if( pack_flag == 4)
for(m = 0, i = first; m < n; m++, i++) atom->q[i] = buf[m];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
{
int i, m;
for(m = 0, i = first; m < n; m++, i++) {
buf[m] = h_o[i];
}
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *buf)
{
for(int m = 0; m < n; m++) {
h_o[list[m]] += buf[m];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cleanup_copy()
{
id = style = NULL;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
template<class DeviceType>
double FixQEqReaxKokkos<DeviceType>::memory_usage()
{
double bytes;
bytes = atom->nmax*5*2 * sizeof(F_FLOAT); // s_hist & t_hist
bytes += atom->nmax*8 * sizeof(F_FLOAT); // storage
bytes += n_cap*2 * sizeof(int); // matrix...
bytes += m_cap * sizeof(int);
bytes += m_cap * sizeof(F_FLOAT);
return bytes;
}
/* ---------------------------------------------------------------------- */
namespace LAMMPS_NS {
template class FixQEqReaxKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class FixQEqReaxKokkos<LMPHostType>;
#endif
}
diff --git a/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp b/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
index 7688d6745..e4fb9385a 100644
--- a/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
+++ b/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
@@ -1,126 +1,126 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Stan Moore (Sandia)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_bonds_kokkos.h"
#include "atom.h"
#include "update.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
#include "atom_masks.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCBondsKokkos::FixReaxCBondsKokkos(LAMMPS *lmp, int narg, char **arg) :
FixReaxCBonds(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
datamask_read = EMPTY_MASK;
datamask_modify = EMPTY_MASK;
}
/* ---------------------------------------------------------------------- */
FixReaxCBondsKokkos::~FixReaxCBondsKokkos()
{
}
/* ---------------------------------------------------------------------- */
void FixReaxCBondsKokkos::init()
{
Pair *pair_kk = force->pair_match("reax/c/kk",1);
if (pair_kk == NULL) error->all(FLERR,"Cannot use fix reax/c/bonds without "
"pair_style reax/c/kk");
FixReaxCBonds::init();
}
/* ---------------------------------------------------------------------- */
void FixReaxCBondsKokkos::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int nbuf_local;
int nlocal_max, numbonds, numbonds_max;
double *buf;
DAT::tdual_ffloat_1d k_buf;
int nlocal = atom->nlocal;
int nlocal_tot = static_cast<int> (atom->natoms);
numbonds = 0;
if (reaxc->execution_space == Device)
((PairReaxCKokkos<LMPDeviceType>*) reaxc)->FindBond(numbonds);
else
((PairReaxCKokkos<LMPHostType>*) reaxc)->FindBond(numbonds);
// allocate a temporary buffer for the snapshot info
MPI_Allreduce(&numbonds,&numbonds_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nlocal,&nlocal_max,1,MPI_INT,MPI_MAX,world);
nbuf = 1+(numbonds_max*2+10)*nlocal_max;
memory->create_kokkos(k_buf,buf,nbuf,"reax/c/bonds:buf");
// Pass information to buffer
if (reaxc->execution_space == Device)
((PairReaxCKokkos<LMPDeviceType>*) reaxc)->PackBondBuffer(k_buf,nbuf_local);
else
((PairReaxCKokkos<LMPHostType>*) reaxc)->PackBondBuffer(k_buf,nbuf_local);
buf[0] = nlocal;
// Receive information from buffer for output
RecvBuffer(buf, nbuf, nbuf_local, nlocal_tot, numbonds_max);
memory->destroy_kokkos(k_buf,buf);
}
/* ---------------------------------------------------------------------- */
double FixReaxCBondsKokkos::memory_usage()
{
double bytes;
bytes = nbuf*sizeof(double);
// These are accounted for in PairReaxCKokkos:
//bytes += nmax*sizeof(int);
//bytes += 1.0*nmax*MAXREAXBOND*sizeof(double);
//bytes += 1.0*nmax*MAXREAXBOND*sizeof(int);
return bytes;
}
diff --git a/src/KOKKOS/fix_reaxc_species_kokkos.cpp b/src/KOKKOS/fix_reaxc_species_kokkos.cpp
index 17b42174c..ce84de30c 100644
--- a/src/KOKKOS/fix_reaxc_species_kokkos.cpp
+++ b/src/KOKKOS/fix_reaxc_species_kokkos.cpp
@@ -1,159 +1,159 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Stan Moore (Sandia)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <math.h>
#include "atom.h"
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_species_kokkos.h"
#include "domain.h"
#include "update.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "atom_masks.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCSpeciesKokkos::FixReaxCSpeciesKokkos(LAMMPS *lmp, int narg, char **arg) :
FixReaxCSpecies(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
// NOTE: Could improve performance if a Kokkos version of ComputeSpecAtom is added
datamask_read = X_MASK | V_MASK | Q_MASK | MASK_MASK;
datamask_modify = EMPTY_MASK;
}
/* ---------------------------------------------------------------------- */
FixReaxCSpeciesKokkos::~FixReaxCSpeciesKokkos()
{
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpeciesKokkos::init()
{
Pair* pair_kk = force->pair_match("reax/c/kk",1);
if (pair_kk == NULL) error->all(FLERR,"Cannot use fix reax/c/species/kk without "
"pair_style reax/c/kk");
FixReaxCSpecies::init();
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpeciesKokkos::FindMolecule()
{
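  // flood-fill molecule IDs: clusterID starts at the atom tag and is repeatedly
  // lowered to the minimum ID among atoms linked by a bond order above BOCut;
  // ghost data is forward-communicated each outer pass until no rank changes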
int i,j,ii,jj,inum,itype,jtype,loop,looptot;
int change,done,anychange;
int *mask = atom->mask;
double bo_tmp,bo_cut;
double **spec_atom = f_SPECBOND->array_atom;
inum = reaxc->list->inum;
typename ArrayTypes<LMPHostType>::t_int_1d ilist;
if (reaxc->execution_space == Host) {
NeighListKokkos<LMPHostType>* k_list = static_cast<NeighListKokkos<LMPHostType>*>(reaxc->list);
k_list->k_ilist.sync<LMPHostType>();
ilist = k_list->k_ilist.h_view;
} else {
NeighListKokkos<LMPDeviceType>* k_list = static_cast<NeighListKokkos<LMPDeviceType>*>(reaxc->list);
k_list->k_ilist.sync<LMPHostType>();
ilist = k_list->k_ilist.h_view;
}
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (mask[i] & groupbit) {
clusterID[i] = atom->tag[i];
x0[i].x = spec_atom[i][1];
x0[i].y = spec_atom[i][2];
x0[i].z = spec_atom[i][3];
}
else clusterID[i] = 0.0;
}
loop = 0;
while (1) {
comm->forward_comm_fix(this);
loop ++;
change = 0;
while (1) {
done = 1;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (!(mask[i] & groupbit)) continue;
itype = atom->type[i];
for (jj = 0; jj < MAXSPECBOND; jj++) {
j = reaxc->tmpid[i][jj];
if (j < i) continue;
if (!(mask[j] & groupbit)) continue;
if (clusterID[i] == clusterID[j] && PBCconnected[i] == PBCconnected[j]
&& x0[i].x == x0[j].x && x0[i].y == x0[j].y && x0[i].z == x0[j].z) continue;
jtype = atom->type[j];
bo_cut = BOCut[itype][jtype];
bo_tmp = spec_atom[i][jj+7];
if (bo_tmp > bo_cut) {
clusterID[i] = clusterID[j] = MIN(clusterID[i], clusterID[j]);
PBCconnected[i] = PBCconnected[j] = MAX(PBCconnected[i], PBCconnected[j]);
x0[i] = x0[j] = chAnchor(x0[i], x0[j]);
if ((fabs(spec_atom[i][1] - spec_atom[j][1]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][2] - spec_atom[j][2]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][3] - spec_atom[j][3]) > reaxc->control->bond_cut))
PBCconnected[i] = PBCconnected[j] = 1;
done = 0;
}
}
}
if (!done) change = 1;
if (done) break;
}
MPI_Allreduce(&change,&anychange,1,MPI_INT,MPI_MAX,world);
if (!anychange) break;
MPI_Allreduce(&loop,&looptot,1,MPI_INT,MPI_SUM,world);
if (looptot >= 400*nprocs) break;
}
-}
\ No newline at end of file
+}
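The FindMolecule() routine above assigns molecule membership by label propagation: each atom starts with its own tag as the cluster ID, and every bonded pair whose bond order exceeds BOCut repeatedly adopts the smaller of the two IDs (with forward_comm_fix() exchanging IDs across MPI ranks) until no ID changes anywhere. A minimal serial sketch of that idea, illustrative only and not part of LAMMPS, is:

// Minimal serial sketch (illustrative only) of the label-propagation idea in
// FindMolecule(): every atom starts with its own tag as cluster ID, and each
// bonded pair repeatedly adopts the smaller of its two IDs until nothing
// changes, so every connected set of atoms ends up sharing one ID.
#include <vector>
#include <algorithm>
#include <utility>

std::vector<double> label_propagate(int n, const std::vector<std::pair<int,int> > &bonds)
{
  std::vector<double> id(n);
  for (int i = 0; i < n; i++) id[i] = i + 1;     // analogous to atom->tag[i]
  bool changed = true;
  while (changed) {                              // analogous to the outer while (1) loop
    changed = false;
    for (size_t b = 0; b < bonds.size(); b++) {
      double m = std::min(id[bonds[b].first], id[bonds[b].second]);
      if (id[bonds[b].first] != m || id[bonds[b].second] != m) {
        id[bonds[b].first] = id[bonds[b].second] = m;
        changed = true;
      }
    }
  }
  return id;
}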
diff --git a/src/KOKKOS/modify_kokkos.cpp b/src/KOKKOS/modify_kokkos.cpp
index b4a89c8e3..c9242f211 100644
--- a/src/KOKKOS/modify_kokkos.cpp
+++ b/src/KOKKOS/modify_kokkos.cpp
@@ -1,721 +1,761 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "modify_kokkos.h"
#include "atom_kokkos.h"
#include "update.h"
#include "fix.h"
#include "compute.h"
#include "kokkos.h"
using namespace LAMMPS_NS;
#define BIG 1.0e20
/* ---------------------------------------------------------------------- */
ModifyKokkos::ModifyKokkos(LAMMPS *lmp) : Modify(lmp)
{
atomKK = (AtomKokkos *) atom;
}
/* ----------------------------------------------------------------------
setup for run, calls setup() of all fixes and computes
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup(int vflag)
{
// compute setup needs to come before fix setup
// b/c NH fixes need the DOF of temperature computes
for (int i = 0; i < ncompute; i++) compute[i]->setup();
if (update->whichflag == 1)
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,fix[i]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[i]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[i]->setup(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[i]->execution_space,fix[i]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,fix[i]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[i]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[i]->min_setup(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[i]->execution_space,fix[i]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_exchange call, only for fixes that define pre_exchange
called from Verlet, RESPA, Min, and WriteRestart with whichflag = 0
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_exchange()
{
if (update->whichflag <= 1)
for (int i = 0; i < n_pre_exchange; i++) {
atomKK->sync(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_exchange[i]]->setup_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_exchange; i++) {
atomKK->sync(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_exchange[i]]->setup_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_neighbor call, only for fixes that define pre_neighbor
called from Verlet, RESPA
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_neighbor()
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_neighbor; i++) {
atomKK->sync(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_neighbor[i]]->setup_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_neighbor; i++) {
atomKK->sync(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_neighbor[i]]->setup_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_force call, only for fixes that define pre_force
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_force(int vflag)
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->setup_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_force; i++) {
atomKK->sync(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_force[i]]->setup_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_reverse call, only for fixes that define pre_reverse
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_reverse(int eflag, int vflag)
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_reverse; i++) {
atomKK->sync(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_reverse[i]]->setup_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_reverse; i++) {
atomKK->sync(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_reverse[i]]->setup_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
1st half of integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::initial_integrate(int vflag)
{
for (int i = 0; i < n_initial_integrate; i++) {
atomKK->sync(fix[list_initial_integrate[i]]->execution_space,
fix[list_initial_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_initial_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_initial_integrate[i]]->initial_integrate(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_initial_integrate[i]]->execution_space,
fix[list_initial_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
post_integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_integrate()
{
for (int i = 0; i < n_post_integrate; i++) {
atomKK->sync(fix[list_post_integrate[i]]->execution_space,
fix[list_post_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_integrate[i]]->post_integrate();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_integrate[i]]->execution_space,
fix[list_post_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_exchange call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_exchange()
{
for (int i = 0; i < n_pre_exchange; i++) {
atomKK->sync(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_exchange[i]]->pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_neighbor call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_neighbor()
{
for (int i = 0; i < n_pre_neighbor; i++) {
atomKK->sync(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_neighbor[i]]->pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_force(int vflag)
{
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_reverse call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_reverse(int eflag, int vflag)
{
for (int i = 0; i < n_pre_reverse; i++) {
atomKK->sync(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_reverse[i]]->pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
post_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_force(int vflag)
{
for (int i = 0; i < n_post_force; i++) {
atomKK->sync(fix[list_post_force[i]]->execution_space,
fix[list_post_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_force[i]]->post_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_force[i]]->execution_space,
fix[list_post_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
2nd half of integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::final_integrate()
{
for (int i = 0; i < n_final_integrate; i++) {
atomKK->sync(fix[list_final_integrate[i]]->execution_space,
fix[list_final_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_final_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_final_integrate[i]]->final_integrate();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_final_integrate[i]]->execution_space,
fix[list_final_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
end-of-timestep call, only for relevant fixes
only call fix->end_of_step() on timesteps that are multiples of nevery
------------------------------------------------------------------------- */
void ModifyKokkos::end_of_step()
{
for (int i = 0; i < n_end_of_step; i++)
if (update->ntimestep % end_of_step_every[i] == 0) {
atomKK->sync(fix[list_end_of_step[i]]->execution_space,
fix[list_end_of_step[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_end_of_step[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_end_of_step[i]]->end_of_step();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_end_of_step[i]]->execution_space,
fix[list_end_of_step[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
thermo energy call, only for relevant fixes
called by Thermo class
compute_scalar() is fix call to return energy
------------------------------------------------------------------------- */
double ModifyKokkos::thermo_energy()
{
double energy = 0.0;
for (int i = 0; i < n_thermo_energy; i++) {
atomKK->sync(fix[list_thermo_energy[i]]->execution_space,
fix[list_thermo_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_thermo_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
energy += fix[list_thermo_energy[i]]->compute_scalar();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_thermo_energy[i]]->execution_space,
fix[list_thermo_energy[i]]->datamask_modify);
}
return energy;
}
/* ----------------------------------------------------------------------
post_run call
------------------------------------------------------------------------- */
void ModifyKokkos::post_run()
{
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,
fix[i]->datamask_read);
fix[i]->post_run();
atomKK->modified(fix[i]->execution_space,
fix[i]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup rRESPA pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_force_respa(int vflag, int ilevel)
{
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->setup_pre_force_respa(vflag,ilevel);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
1st half of rRESPA integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::initial_integrate_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_initial_integrate_respa; i++) {
atomKK->sync(fix[list_initial_integrate_respa[i]]->execution_space,
fix[list_initial_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_initial_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_initial_integrate_respa[i]]->
initial_integrate_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_initial_integrate_respa[i]]->execution_space,
fix[list_initial_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA post_integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_integrate_respa(int ilevel, int iloop)
{
for (int i = 0; i < n_post_integrate_respa; i++) {
atomKK->sync(fix[list_post_integrate_respa[i]]->execution_space,
fix[list_post_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_integrate_respa[i]]->post_integrate_respa(ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_integrate_respa[i]]->execution_space,
fix[list_post_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_force_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_pre_force_respa; i++) {
atomKK->sync(fix[list_pre_force_respa[i]]->execution_space,
fix[list_pre_force_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force_respa[i]]->pre_force_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force_respa[i]]->execution_space,
fix[list_pre_force_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA post_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_force_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_post_force_respa; i++) {
atomKK->sync(fix[list_post_force_respa[i]]->execution_space,
fix[list_post_force_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_force_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_force_respa[i]]->post_force_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_force_respa[i]]->execution_space,
fix[list_post_force_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
2nd half of rRESPA integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::final_integrate_respa(int ilevel, int iloop)
{
for (int i = 0; i < n_final_integrate_respa; i++) {
atomKK->sync(fix[list_final_integrate_respa[i]]->execution_space,
fix[list_final_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_final_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_final_integrate_respa[i]]->final_integrate_respa(ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_final_integrate_respa[i]]->execution_space,
fix[list_final_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-exchange call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_exchange()
{
for (int i = 0; i < n_min_pre_exchange; i++) {
atomKK->sync(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_exchange[i]]->min_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-neighbor call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_neighbor()
{
for (int i = 0; i < n_min_pre_neighbor; i++) {
atomKK->sync(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_neighbor[i]]->min_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_force(int vflag)
{
for (int i = 0; i < n_min_pre_force; i++) {
atomKK->sync(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_force[i]]->min_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-reverse call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_reverse(int eflag, int vflag)
{
for (int i = 0; i < n_min_pre_reverse; i++) {
atomKK->sync(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_reverse[i]]->min_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer force adjustment call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_post_force(int vflag)
{
for (int i = 0; i < n_min_post_force; i++) {
atomKK->sync(fix[list_min_post_force[i]]->execution_space,
fix[list_min_post_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_post_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_post_force[i]]->min_post_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_post_force[i]]->execution_space,
fix[list_min_post_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer energy/force evaluation, only for relevant fixes
return energy and forces on extra degrees of freedom
------------------------------------------------------------------------- */
double ModifyKokkos::min_energy(double *fextra)
{
int ifix,index;
index = 0;
double eng = 0.0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
eng += fix[ifix]->min_energy(&fextra[index]);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
return eng;
}
/* ----------------------------------------------------------------------
store current state of extra dof, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_store()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_store();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
manage state of extra dof on a stack, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_clearstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_clearstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
void ModifyKokkos::min_pushstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_pushstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
void ModifyKokkos::min_popstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_popstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
displace extra dof along vector hextra, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_step(double alpha, double *hextra)
{
int ifix,index;
index = 0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[ifix]->min_step(alpha,&hextra[index]);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
compute max allowed step size along vector hextra, only for relevant fixes
------------------------------------------------------------------------- */
double ModifyKokkos::max_alpha(double *hextra)
{
int ifix,index;
double alpha = BIG;
index = 0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
double alpha_one = fix[ifix]->max_alpha(&hextra[index]);
alpha = MIN(alpha,alpha_one);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
return alpha;
}
/* ----------------------------------------------------------------------
extract extra dof for minimization, only for relevant fixes
------------------------------------------------------------------------- */
int ModifyKokkos::min_dof()
{
int ndof = 0;
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
ndof += fix[list_min_energy[i]]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
return ndof;
}
/* ----------------------------------------------------------------------
reset reference state of fix, only for relevant fixes
------------------------------------------------------------------------- */
int ModifyKokkos::min_reset_ref()
{
int itmp,itmpall;
itmpall = 0;
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
itmp = fix[list_min_energy[i]]->min_reset_ref();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
if (itmp) itmpall = 1;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
return itmpall;
}
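Every hunk in the modify_kokkos.cpp diff above applies the same fix: instead of unconditionally resetting lmp->kokkos->auto_sync to 0 after calling a fix, the previous value is saved and restored, so a user-enabled auto_sync setting is no longer silently switched off whenever a non-Kokkos fix runs. A minimal sketch of the idiom follows; the AutoSyncGuard name is purely illustrative, and the patch itself uses an inline prev_auto_sync local rather than a guard class:

// Save the current flag, optionally force it on, and restore the saved value
// on scope exit -- the RAII equivalent of the prev_auto_sync pattern above.
struct AutoSyncGuard {
  int &flag;
  int saved;
  AutoSyncGuard(int &f, bool force_on) : flag(f), saved(f) { if (force_on) flag = 1; }
  ~AutoSyncGuard() { flag = saved; }
};

// Usage, mirroring one loop body from the diff (hypothetical rewrite):
//   AutoSyncGuard guard(lmp->kokkos->auto_sync, !fix[i]->kokkosable);
//   fix[i]->setup(vflag);
//   // guard restores the previous auto_sync value when it goes out of scope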
diff --git a/src/KOKKOS/pair_reax_c_kokkos.cpp b/src/KOKKOS/pair_reaxc_kokkos.cpp
similarity index 99%
rename from src/KOKKOS/pair_reax_c_kokkos.cpp
rename to src/KOKKOS/pair_reaxc_kokkos.cpp
index acf9c754c..59369b5e0 100644
--- a/src/KOKKOS/pair_reax_c_kokkos.cpp
+++ b/src/KOKKOS/pair_reaxc_kokkos.cpp
@@ -1,4196 +1,4196 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "kokkos.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_request.h"
#include "neigh_list_kokkos.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "math_special.h"
#include "memory.h"
#include "error.h"
#include "atom_masks.h"
#include "reaxc_defs.h"
#include "reaxc_lookup.h"
#include "reaxc_tool_box.h"
#define TEAMSIZE 128
/* ---------------------------------------------------------------------- */
namespace LAMMPS_NS {
using namespace MathConst;
using namespace MathSpecial;
template<class DeviceType>
PairReaxCKokkos<DeviceType>::PairReaxCKokkos(LAMMPS *lmp) : PairReaxC(lmp)
{
respa_enable = 0;
cut_nbsq = cut_hbsq = cut_bosq = 0.0;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | Q_MASK | F_MASK | TYPE_MASK | ENERGY_MASK | VIRIAL_MASK;
datamask_modify = F_MASK | ENERGY_MASK | VIRIAL_MASK;
k_resize_bo = DAT::tdual_int_scalar("pair:resize_bo");
d_resize_bo = k_resize_bo.view<DeviceType>();
k_resize_hb = DAT::tdual_int_scalar("pair:resize_hb");
d_resize_hb = k_resize_hb.view<DeviceType>();
nmax = 0;
maxbo = 1;
maxhb = 1;
k_error_flag = DAT::tdual_int_scalar("pair:error_flag");
k_nbuf_local = DAT::tdual_int_scalar("pair:nbuf_local");
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
PairReaxCKokkos<DeviceType>::~PairReaxCKokkos()
{
if (copymode) return;
memory->destroy_kokkos(k_eatom,eatom);
memory->destroy_kokkos(k_vatom,vatom);
memory->destroy_kokkos(k_tmpid,tmpid);
tmpid = NULL;
memory->destroy_kokkos(k_tmpbo,tmpbo);
tmpbo = NULL;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::allocate()
{
int n = atom->ntypes;
k_params_sing = Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_sing",n+1);
paramssing = k_params_sing.d_view;
k_params_twbp = Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_twbp",n+1,n+1);
paramstwbp = k_params_twbp.d_view;
k_params_thbp = Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_thbp",n+1,n+1,n+1);
paramsthbp = k_params_thbp.d_view;
k_params_fbp = Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_fbp",n+1,n+1,n+1,n+1);
paramsfbp = k_params_fbp.d_view;
k_params_hbp = Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_hbp",n+1,n+1,n+1);
paramshbp = k_params_hbp.d_view;
k_tap = DAT::tdual_ffloat_1d("pair:tap",8);
d_tap = k_tap.d_view;
h_tap = k_tap.h_view;
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::init_style()
{
PairReaxC::init_style();
// irequest = neigh request made by parent class
neighflag = lmp->kokkos->neighflag;
int irequest = neighbor->nrequest - 1;
neighbor->requests[irequest]->
kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
!Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
neighbor->requests[irequest]->
kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
if (neighflag == FULL) {
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->ghost = 1;
} else if (neighflag == HALF || neighflag == HALFTHREAD) {
neighbor->requests[irequest]->full = 0;
neighbor->requests[irequest]->half = 1;
neighbor->requests[irequest]->ghost = 1;
} else {
error->all(FLERR,"Cannot use chosen neighbor list style with reax/c/kk");
}
allocate();
setup();
init_md();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::setup()
{
int i,j,k,m;
int n = atom->ntypes;
// general parameters
for (i = 0; i < 39; i ++)
gp[i] = system->reax_param.gp.l[i];
p_boc1 = gp[0];
p_boc2 = gp[1];
// vdw parameters
vdwflag = system->reax_param.gp.vdw_type;
lgflag = control->lgflag;
// atom, bond, angle, dihedral, H-bond specific parameters
two_body_parameters *twbp;
// valence angle (3-body) parameters
three_body_header *thbh;
three_body_parameters *thbp;
// torsion angle (4-body) parameters
four_body_header *fbh;
four_body_parameters *fbp;
// hydrogen bond parameters
hbond_parameters *hbp;
for (i = 1; i <= n; i++) {
// general
k_params_sing.h_view(i).mass = system->reax_param.sbp[map[i]].mass;
// polarization
k_params_sing.h_view(i).chi = system->reax_param.sbp[map[i]].chi;
k_params_sing.h_view(i).eta = system->reax_param.sbp[map[i]].eta;
// bond order
k_params_sing.h_view(i).r_s = system->reax_param.sbp[map[i]].r_s;
k_params_sing.h_view(i).r_pi = system->reax_param.sbp[map[i]].r_pi;
k_params_sing.h_view(i).r_pi2 = system->reax_param.sbp[map[i]].r_pi_pi;
k_params_sing.h_view(i).valency = system->reax_param.sbp[map[i]].valency;
k_params_sing.h_view(i).valency_val = system->reax_param.sbp[map[i]].valency_val;
k_params_sing.h_view(i).valency_boc = system->reax_param.sbp[map[i]].valency_boc;
k_params_sing.h_view(i).valency_e = system->reax_param.sbp[map[i]].valency_e;
k_params_sing.h_view(i).nlp_opt = system->reax_param.sbp[map[i]].nlp_opt;
// multibody
k_params_sing.h_view(i).p_lp2 = system->reax_param.sbp[map[i]].p_lp2;
k_params_sing.h_view(i).p_ovun2 = system->reax_param.sbp[map[i]].p_ovun2;
k_params_sing.h_view(i).p_ovun5 = system->reax_param.sbp[map[i]].p_ovun5;
// angular
k_params_sing.h_view(i).p_val3 = system->reax_param.sbp[map[i]].p_val3;
k_params_sing.h_view(i).p_val5 = system->reax_param.sbp[map[i]].p_val5;
// hydrogen bond
k_params_sing.h_view(i).p_hbond = system->reax_param.sbp[map[i]].p_hbond;
for (j = 1; j <= n; j++) {
twbp = &(system->reax_param.tbp[map[i]][map[j]]);
// vdW
k_params_twbp.h_view(i,j).gamma = twbp->gamma;
k_params_twbp.h_view(i,j).gamma_w = twbp->gamma_w;
k_params_twbp.h_view(i,j).alpha = twbp->alpha;
k_params_twbp.h_view(i,j).r_vdw = twbp->r_vdW;
k_params_twbp.h_view(i,j).epsilon = twbp->D;
k_params_twbp.h_view(i,j).acore = twbp->acore;
k_params_twbp.h_view(i,j).ecore = twbp->ecore;
k_params_twbp.h_view(i,j).rcore = twbp->rcore;
k_params_twbp.h_view(i,j).lgre = twbp->lgre;
k_params_twbp.h_view(i,j).lgcij = twbp->lgcij;
// bond order
k_params_twbp.h_view(i,j).r_s = twbp->r_s;
k_params_twbp.h_view(i,j).r_pi = twbp->r_p;
k_params_twbp.h_view(i,j).r_pi2 = twbp->r_pp;
k_params_twbp.h_view(i,j).p_bo1 = twbp->p_bo1;
k_params_twbp.h_view(i,j).p_bo2 = twbp->p_bo2;
k_params_twbp.h_view(i,j).p_bo3 = twbp->p_bo3;
k_params_twbp.h_view(i,j).p_bo4 = twbp->p_bo4;
k_params_twbp.h_view(i,j).p_bo5 = twbp->p_bo5;
k_params_twbp.h_view(i,j).p_bo6 = twbp->p_bo6;
k_params_twbp.h_view(i,j).p_boc3 = twbp->p_boc3;
k_params_twbp.h_view(i,j).p_boc4 = twbp->p_boc4;
k_params_twbp.h_view(i,j).p_boc5 = twbp->p_boc5;
k_params_twbp.h_view(i,j).ovc = twbp->ovc;
k_params_twbp.h_view(i,j).v13cor = twbp->v13cor;
// bond energy
k_params_twbp.h_view(i,j).p_be1 = twbp->p_be1;
k_params_twbp.h_view(i,j).p_be2 = twbp->p_be2;
k_params_twbp.h_view(i,j).De_s = twbp->De_s;
k_params_twbp.h_view(i,j).De_p = twbp->De_p;
k_params_twbp.h_view(i,j).De_pp = twbp->De_pp;
// multibody
k_params_twbp.h_view(i,j).p_ovun1 = twbp->p_ovun1;
for (k = 1; k <= n; k++) {
// Angular
thbh = &(system->reax_param.thbp[map[i]][map[j]][map[k]]);
thbp = &(thbh->prm[0]);
k_params_thbp.h_view(i,j,k).cnt = thbh->cnt;
k_params_thbp.h_view(i,j,k).theta_00 = thbp->theta_00;
k_params_thbp.h_view(i,j,k).p_val1 = thbp->p_val1;
k_params_thbp.h_view(i,j,k).p_val2 = thbp->p_val2;
k_params_thbp.h_view(i,j,k).p_val4 = thbp->p_val4;
k_params_thbp.h_view(i,j,k).p_val7 = thbp->p_val7;
k_params_thbp.h_view(i,j,k).p_pen1 = thbp->p_pen1;
k_params_thbp.h_view(i,j,k).p_coa1 = thbp->p_coa1;
// Hydrogen Bond
hbp = &(system->reax_param.hbp[map[i]][map[j]][map[k]]);
k_params_hbp.h_view(i,j,k).p_hb1 = hbp->p_hb1;
k_params_hbp.h_view(i,j,k).p_hb2 = hbp->p_hb2;
k_params_hbp.h_view(i,j,k).p_hb3 = hbp->p_hb3;
k_params_hbp.h_view(i,j,k).r0_hb = hbp->r0_hb;
for (m = 1; m <= n; m++) {
// Torsion
fbh = &(system->reax_param.fbp[map[i]][map[j]][map[k]][map[m]]);
fbp = &(fbh->prm[0]);
k_params_fbp.h_view(i,j,k,m).p_tor1 = fbp->p_tor1;
k_params_fbp.h_view(i,j,k,m).p_cot1 = fbp->p_cot1;
k_params_fbp.h_view(i,j,k,m).V1 = fbp->V1;
k_params_fbp.h_view(i,j,k,m).V2 = fbp->V2;
k_params_fbp.h_view(i,j,k,m).V3 = fbp->V3;
}
}
}
}
k_params_sing.template modify<LMPHostType>();
k_params_twbp.template modify<LMPHostType>();
k_params_thbp.template modify<LMPHostType>();
k_params_fbp.template modify<LMPHostType>();
k_params_hbp.template modify<LMPHostType>();
// cutoffs
cut_nbsq = control->nonb_cut * control->nonb_cut;
cut_hbsq = control->hbond_cut * control->hbond_cut;
cut_bosq = control->bond_cut * control->bond_cut;
// bond order cutoffs
bo_cut = 0.01 * gp[29];
thb_cut = control->thb_cut;
thb_cutsq = 0.000010; //thb_cut*thb_cut;
if (atom->nmax > nmax) {
nmax = atom->nmax;
allocate_array();
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::init_md()
{
// init_taper()
F_FLOAT d1, d7, swa, swa2, swa3, swb, swb2, swb3;
swa = control->nonb_low;
swb = control->nonb_cut;
if (fabs(swa) > 0.01 )
error->warning(FLERR,"Warning: non-zero lower Taper-radius cutoff");
if (swb < 0)
error->one(FLERR,"Negative upper Taper-radius cutoff");
else if (swb < 5) {
char str[128];
sprintf(str,"Warning: very low Taper-radius cutoff: %f\n", swb);
error->one(FLERR,str);
}
d1 = swb - swa;
d7 = powint(d1,7);
swa2 = swa * swa;
swa3 = swa * swa2;
swb2 = swb * swb;
swb3 = swb * swb2;
k_tap.h_view(7) = 20.0/d7;
k_tap.h_view(6) = -70.0 * (swa + swb) / d7;
k_tap.h_view(5) = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
k_tap.h_view(4) = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
k_tap.h_view(3) = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
k_tap.h_view(2) =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
k_tap.h_view(1) = 140.0 * swa3 * swb3 / d7;
k_tap.h_view(0) = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
k_tap.template modify<LMPHostType>();
k_tap.template sync<DeviceType>();
if ( control->tabulate ) {
int ntypes = atom->ntypes;
Init_Lookup_Tables();
k_LR = tdual_LR_lookup_table_kk_2d("lookup:LR",ntypes+1,ntypes+1);
d_LR = k_LR.d_view;
for (int i = 1; i <= ntypes; ++i) {
for (int j = i; j <= ntypes; ++j) {
int n = LR[i][j].n;
if (n == 0) continue;
k_LR.h_view(i,j).xmin = LR[i][j].xmin;
k_LR.h_view(i,j).xmax = LR[i][j].xmax;
k_LR.h_view(i,j).n = LR[i][j].n;
k_LR.h_view(i,j).dx = LR[i][j].dx;
k_LR.h_view(i,j).inv_dx = LR[i][j].inv_dx;
k_LR.h_view(i,j).a = LR[i][j].a;
k_LR.h_view(i,j).m = LR[i][j].m;
k_LR.h_view(i,j).c = LR[i][j].c;
tdual_LR_data_1d k_y = tdual_LR_data_1d("lookup:LR[i,j].y",n);
tdual_cubic_spline_coef_1d k_H = tdual_cubic_spline_coef_1d("lookup:LR[i,j].H",n);
tdual_cubic_spline_coef_1d k_vdW = tdual_cubic_spline_coef_1d("lookup:LR[i,j].vdW",n);
tdual_cubic_spline_coef_1d k_CEvd = tdual_cubic_spline_coef_1d("lookup:LR[i,j].CEvd",n);
tdual_cubic_spline_coef_1d k_ele = tdual_cubic_spline_coef_1d("lookup:LR[i,j].ele",n);
tdual_cubic_spline_coef_1d k_CEclmb = tdual_cubic_spline_coef_1d("lookup:LR[i,j].CEclmb",n);
k_LR.h_view(i,j).d_y = k_y.d_view;
k_LR.h_view(i,j).d_H = k_H.d_view;
k_LR.h_view(i,j).d_vdW = k_vdW.d_view;
k_LR.h_view(i,j).d_CEvd = k_CEvd.d_view;
k_LR.h_view(i,j).d_ele = k_ele.d_view;
k_LR.h_view(i,j).d_CEclmb = k_CEclmb.d_view;
for (int k = 0; k < n; k++) {
k_y.h_view(k) = LR[i][j].y[k];
k_H.h_view(k) = LR[i][j].H[k];
k_vdW.h_view(k) = LR[i][j].vdW[k];
k_CEvd.h_view(k) = LR[i][j].CEvd[k];
k_ele.h_view(k) = LR[i][j].ele[k];
k_CEclmb.h_view(k) = LR[i][j].CEclmb[k];
}
k_y.template modify<LMPHostType>();
k_H.template modify<LMPHostType>();
k_vdW.template modify<LMPHostType>();
k_CEvd.template modify<LMPHostType>();
k_ele.template modify<LMPHostType>();
k_CEclmb.template modify<LMPHostType>();
k_y.template sync<DeviceType>();
k_H.template sync<DeviceType>();
k_vdW.template sync<DeviceType>();
k_CEvd.template sync<DeviceType>();
k_ele.template sync<DeviceType>();
k_CEclmb.template sync<DeviceType>();
}
}
k_LR.template modify<LMPHostType>();
k_LR.template sync<DeviceType>();
Deallocate_Lookup_Tables();
}
}
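/* The k_tap coefficients set above are, assuming the standard ReaxFF taper
   form, the coefficients of the 7th-order switching polynomial
   Tap(r) = tap[7]*r^7 + ... + tap[1]*r + tap[0], which turns the nonbonded
   terms off smoothly between swa = control->nonb_low and
   swb = control->nonb_cut; LR_vdW_Coulomb() below evaluates Tap(r) and its
   derivative by Horner's rule. */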
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int PairReaxCKokkos<DeviceType>::Init_Lookup_Tables()
{
int i, j, r;
int num_atom_types;
double dr;
double *h, *fh, *fvdw, *fele, *fCEvd, *fCEclmb;
double v0_vdw, v0_ele, vlast_vdw, vlast_ele;
/* initializations */
v0_vdw = 0;
v0_ele = 0;
vlast_vdw = 0;
vlast_ele = 0;
num_atom_types = atom->ntypes;
dr = control->nonb_cut / control->tabulate;
h = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:h", world );
fh = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fh", world );
fvdw = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fvdw", world );
fCEvd = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEvd", world );
fele = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fele", world );
fCEclmb = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEclmb", world );
LR = (LR_lookup_table**)
scalloc( num_atom_types+1, sizeof(LR_lookup_table*), "lookup:LR", world );
for( i = 0; i < num_atom_types+1; ++i )
LR[i] = (LR_lookup_table*)
scalloc( num_atom_types+1, sizeof(LR_lookup_table), "lookup:LR[i]", world );
for( i = 1; i <= num_atom_types; ++i ) {
for( j = i; j <= num_atom_types; ++j ) {
LR[i][j].xmin = 0;
LR[i][j].xmax = control->nonb_cut;
LR[i][j].n = control->tabulate + 2;
LR[i][j].dx = dr;
LR[i][j].inv_dx = control->tabulate / control->nonb_cut;
LR[i][j].y = (LR_data*)
smalloc( LR[i][j].n * sizeof(LR_data), "lookup:LR[i,j].y", world );
LR[i][j].H = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].H" ,
world );
LR[i][j].vdW = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].vdW",
world);
LR[i][j].CEvd = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].CEvd",
world);
LR[i][j].ele = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].ele",
world );
LR[i][j].CEclmb = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),
"lookup:LR[i,j].CEclmb", world );
for( r = 1; r <= control->tabulate; ++r ) {
LR_vdW_Coulomb(i, j, r * dr, &(LR[i][j].y[r]) );
h[r] = LR[i][j].dx;
fh[r] = LR[i][j].y[r].H;
fvdw[r] = LR[i][j].y[r].e_vdW;
fCEvd[r] = LR[i][j].y[r].CEvd;
fele[r] = LR[i][j].y[r].e_ele;
fCEclmb[r] = LR[i][j].y[r].CEclmb;
}
// init the start-end points
h[r] = LR[i][j].dx;
v0_vdw = LR[i][j].y[1].CEvd;
v0_ele = LR[i][j].y[1].CEclmb;
fh[r] = fh[r-1];
fvdw[r] = fvdw[r-1];
fCEvd[r] = fCEvd[r-1];
fele[r] = fele[r-1];
fCEclmb[r] = fCEclmb[r-1];
vlast_vdw = fCEvd[r-1];
vlast_ele = fele[r-1];
Natural_Cubic_Spline( &h[1], &fh[1],
&(LR[i][j].H[1]), control->tabulate+1, world );
Complete_Cubic_Spline( &h[1], &fvdw[1], v0_vdw, vlast_vdw,
&(LR[i][j].vdW[1]), control->tabulate+1,
world );
Natural_Cubic_Spline( &h[1], &fCEvd[1],
&(LR[i][j].CEvd[1]), control->tabulate+1,
world );
Complete_Cubic_Spline( &h[1], &fele[1], v0_ele, vlast_ele,
&(LR[i][j].ele[1]), control->tabulate+1,
world );
Natural_Cubic_Spline( &h[1], &fCEclmb[1],
&(LR[i][j].CEclmb[1]), control->tabulate+1,
world );
}// else{
// LR[i][j].n = 0;
//}//
}
free(h);
free(fh);
free(fvdw);
free(fCEvd);
free(fele);
free(fCEclmb);
return 1;
}
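/* Init_Lookup_Tables() above samples LR_vdW_Coulomb() on a uniform grid of
   control->tabulate points per type pair and fits cubic splines (complete
   splines where endpoint slopes are supplied, natural splines otherwise) to
   H, the van der Waals energy/force, and the Coulomb energy/force, so that
   compute() can later interpolate instead of re-evaluating the full
   expressions. */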
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::Deallocate_Lookup_Tables()
{
int i, j;
int ntypes;
ntypes = atom->ntypes;
for( i = 0; i < ntypes; ++i ) {
for( j = i; j < ntypes; ++j )
if( LR[i][j].n ) {
sfree( LR[i][j].y, "LR[i,j].y" );
sfree( LR[i][j].H, "LR[i,j].H" );
sfree( LR[i][j].vdW, "LR[i,j].vdW" );
sfree( LR[i][j].CEvd, "LR[i,j].CEvd" );
sfree( LR[i][j].ele, "LR[i,j].ele" );
sfree( LR[i][j].CEclmb, "LR[i,j].CEclmb" );
}
sfree( LR[i], "LR[i]" );
}
sfree( LR, "LR" );
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::LR_vdW_Coulomb( int i, int j, double r_ij, LR_data *lr )
{
double p_vdW1 = system->reax_param.gp.l[28];
double p_vdW1i = 1.0 / p_vdW1;
double powr_vdW1, powgi_vdW1;
double tmp, fn13, exp1, exp2;
double Tap, dTap, dfn13;
double dr3gamij_1, dr3gamij_3;
double e_core, de_core;
double e_lg, de_lg, r_ij5, r_ij6, re6;
two_body_parameters *twbp;
twbp = &(system->reax_param.tbp[map[i]][map[j]]);
e_core = 0;
de_core = 0;
e_lg = de_lg = 0.0;
/* calculate taper and its derivative */
Tap = k_tap.h_view[7] * r_ij + k_tap.h_view[6];
Tap = Tap * r_ij + k_tap.h_view[5];
Tap = Tap * r_ij + k_tap.h_view[4];
Tap = Tap * r_ij + k_tap.h_view[3];
Tap = Tap * r_ij + k_tap.h_view[2];
Tap = Tap * r_ij + k_tap.h_view[1];
Tap = Tap * r_ij + k_tap.h_view[0];
dTap = 7*k_tap.h_view[7] * r_ij + 6*k_tap.h_view[6];
dTap = dTap * r_ij + 5*k_tap.h_view[5];
dTap = dTap * r_ij + 4*k_tap.h_view[4];
dTap = dTap * r_ij + 3*k_tap.h_view[3];
dTap = dTap * r_ij + 2*k_tap.h_view[2];
dTap += k_tap.h_view[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i-1.0) * pow(r_ij, p_vdW1-2.0);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
lr->e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
lr->CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = powint( r_ij, 5 );
r_ij6 = powint( r_ij, 6 );
re6 = powint( twbp->lgre, 6 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
lr->e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
lr->CEvd += dTap * e_lg + Tap * de_lg/r_ij;
}
}
/* Coulomb calculations */
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
lr->H = EV_to_KCALpMOL * tmp;
lr->e_ele = C_ele * tmp;
lr->CEclmb = C_ele * ( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
{
copymode = 1;
bocnt = hbcnt = 0;
eflag = eflag_in;
vflag = vflag_in;
if (neighflag == FULL) no_virial_fdotr_compute = 1;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
atomKK->sync(execution_space,datamask_read);
k_params_sing.template sync<DeviceType>();
k_params_twbp.template sync<DeviceType>();
k_params_thbp.template sync<DeviceType>();
k_params_fbp.template sync<DeviceType>();
k_params_hbp.template sync<DeviceType>();
if (eflag || vflag) atomKK->modified(execution_space,datamask_modify);
else atomKK->modified(execution_space,F_MASK);
x = atomKK->k_x.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
nlocal = atomKK->nlocal;
nall = atom->nlocal + atom->nghost;
newton_pair = force->newton_pair;
const int inum = list->inum;
const int ignum = inum + list->gnum;
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_numneigh = k_list->d_numneigh;
d_neighbors = k_list->d_neighbors;
d_ilist = k_list->d_ilist;
k_list->clean_copy();
if (eflag_global) {
for (int i = 0; i < 14; i++)
pvector[i] = 0.0;
}
EV_FLOAT_REAX ev;
EV_FLOAT_REAX ev_all;
// Polarization (self)
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALF,0> >(0,inum),*this);
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALFTHREAD,0> >(0,inum),*this);
}
DeviceType::fence();
ev_all += ev;
pvector[13] = ev.ecoul;
// LJ + Coulomb
if (control->tabulate) {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALF,0> >(0,inum),*this);
} else if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALFTHREAD,0> >(0,inum),*this);
} else if (neighflag == FULL) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<FULL,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<FULL,0> >(0,inum),*this);
}
} else {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALF,0> >(0,inum),*this);
} else if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALFTHREAD,0> >(0,inum),*this);
} else if (neighflag == FULL) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<FULL,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<FULL,0> >(0,inum),*this);
}
}
DeviceType::fence();
ev_all += ev;
pvector[10] = ev.evdwl;
pvector[11] = ev.ecoul;
if (atom->nmax > nmax) {
nmax = atom->nmax;
allocate_array();
}
// Neighbor lists for bonds and hydrogen bonds:
// build, then grow maxbo/maxhb and rebuild if an overflow flag was set
int resize = 1;
while (resize) {
resize = 0;
k_resize_bo.h_view() = 0;
k_resize_bo.modify<LMPHostType>();
k_resize_bo.sync<DeviceType>();
k_resize_hb.h_view() = 0;
k_resize_hb.modify<LMPHostType>();
k_resize_hb.sync<DeviceType>();
// zero per-atom accumulators and list counters
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZero>(0,nmax),*this);
DeviceType::fence();
if (neighflag == HALF)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsHalf<HALF> >(0,ignum),*this);
else if (neighflag == HALFTHREAD)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsHalf_LessAtomics<HALFTHREAD> >(0,ignum),*this);
else //(neighflag == FULL)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsFull>(0,ignum),*this);
DeviceType::fence();
k_resize_bo.modify<DeviceType>();
k_resize_bo.sync<LMPHostType>();
int resize_bo = k_resize_bo.h_view();
if (resize_bo) maxbo++;
k_resize_hb.modify<DeviceType>();
k_resize_hb.sync<LMPHostType>();
int resize_hb = k_resize_hb.h_view();
if (resize_hb) maxhb++;
resize = resize_bo || resize_hb;
if (resize) allocate_array();
}
// Bond order
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder1>(0,ignum),*this);
DeviceType::fence();
} else if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder1_LessAtomics>(0,ignum),*this);
DeviceType::fence();
}
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder2>(0,ignum),*this);
DeviceType::fence();
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder3>(0,ignum),*this);
DeviceType::fence();
// Bond energy
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] = ev.evdwl;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] = ev.evdwl;
}
// Multi-body corrections
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti1<HALF,0> >(0,inum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti1<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[2] = ev.ereax[0];
pvector[1] = ev.ereax[1]+ev.ereax[2];
pvector[3] = 0.0;
ev_all.evdwl += ev.ereax[0] + ev.ereax[1] + ev.ereax[2];
// Angular
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[4] = ev.ereax[3];
pvector[5] = ev.ereax[4];
pvector[6] = ev.ereax[5];
ev_all.evdwl += ev.ereax[3] + ev.ereax[4] + ev.ereax[5];
// Torsion
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[8] = ev.ereax[6];
pvector[9] = ev.ereax[7];
ev_all.evdwl += ev.ereax[6] + ev.ereax[7];
// Hydrogen Bond
if (cut_hbsq > 0.0) {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
}
pvector[7] = ev.ereax[8];
ev_all.evdwl += ev.ereax[8];
// Bond force
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxUpdateBond<HALF> >(0,ignum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALF,1> >(0,ignum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALF,0> >(0,ignum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] += ev.evdwl;
} else { //if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxUpdateBond<HALFTHREAD> >(0,ignum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALFTHREAD,1> >(0,ignum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALFTHREAD,0> >(0,ignum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] += ev.evdwl;
}
if (eflag_global) {
eng_vdwl += ev_all.evdwl;
eng_coul += ev_all.ecoul;
}
if (vflag_global) {
virial[0] += ev_all.v[0];
virial[1] += ev_all.v[1];
virial[2] += ev_all.v[2];
virial[3] += ev_all.v[3];
virial[4] += ev_all.v[4];
virial[5] += ev_all.v[5];
}
if (vflag_fdotr) pair_virial_fdotr_compute(this);
if (eflag_atom) {
k_eatom.template modify<DeviceType>();
k_eatom.template sync<LMPHostType>();
}
if (vflag_atom) {
k_vatom.template modify<DeviceType>();
k_vatom.template sync<LMPHostType>();
}
if (fixspecies_flag)
FindBondSpecies();
copymode = 0;
}
/* ---------------------------------------------------------------------- */
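// Self (polarization) energy: per-atom charge-dependent term
// epol = KCALpMOL_to_EV * (chi*q + 0.5*eta*q^2), tallied into ecoul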
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT qi = q(i);
const F_FLOAT chi = paramssing(itype).chi;
const F_FLOAT eta = paramssing(itype).eta;
const F_FLOAT epol = KCALpMOL_to_EV*(chi*qi+(eta/2.0)*qi*qi);
if (eflag) ev.ecoul += epol;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,epol,0.0,0.0,0.0,0.0);
if (eflag_atom) this->template e_tally_single<NEIGHFLAG>(ev,i,epol);
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputePolar<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
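// Pairwise van der Waals + Coulomb (non-tabulated path). The degree-7 taper
// polynomial Tap(r) and its derivative dTap are built up from d_tap[];
// vdwflag selects the shielded (1,3) and/or inner-wall (2,3) vdW terms and
// lgflag adds the long-range dispersion correction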
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
// The f array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
F_FLOAT powr_vdw, powgi_vdw, fn13, dfn13, exp1, exp2, etmp;
F_FLOAT evdwl, fvdwl;
evdwl = fvdwl = 0.0;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT qi = q(i);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT fxtmp, fytmp, fztmp;
fxtmp = fytmp = fztmp = 0.0;
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const int jtype = type(j);
const tagint jtag = tag(j);
const F_FLOAT qj = q(j);
if (NEIGHFLAG != FULL) {
// skip half of the interactions
if (j >= nlocal) {
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
}
}
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cut_nbsq) continue;
const F_FLOAT rij = sqrt(rsq);
// LJ energy/force
F_FLOAT Tap = d_tap[7] * rij + d_tap[6];
Tap = Tap * rij + d_tap[5];
Tap = Tap * rij + d_tap[4];
Tap = Tap * rij + d_tap[3];
Tap = Tap * rij + d_tap[2];
Tap = Tap * rij + d_tap[1];
Tap = Tap * rij + d_tap[0];
F_FLOAT dTap = 7*d_tap[7] * rij + 6*d_tap[6];
dTap = dTap * rij + 5*d_tap[5];
dTap = dTap * rij + 4*d_tap[4];
dTap = dTap * rij + 3*d_tap[3];
dTap = dTap * rij + 2*d_tap[2];
dTap += d_tap[1]/rij;
const F_FLOAT gamma_w = paramstwbp(itype,jtype).gamma_w;
const F_FLOAT alpha = paramstwbp(itype,jtype).alpha;
const F_FLOAT r_vdw = paramstwbp(itype,jtype).r_vdw;
const F_FLOAT epsilon = paramstwbp(itype,jtype).epsilon;
// shielding
if (vdwflag == 1 || vdwflag == 3) {
powr_vdw = pow(rij,gp[28]);
powgi_vdw = pow(1.0/gamma_w,gp[28]);
fn13 = pow(powr_vdw+powgi_vdw,1.0/gp[28]);
exp1 = exp(alpha*(1.0-fn13/r_vdw));
exp2 = exp(0.5*alpha*(1.0-fn13/r_vdw));
dfn13 = pow(powr_vdw+powgi_vdw,1.0/gp[28]-1.0)*pow(rij,gp[28]-2.0);
etmp = epsilon*(exp1-2.0*exp2);
evdwl = Tap*etmp;
fvdwl = dTap*etmp-Tap*epsilon*(alpha/r_vdw)*(exp1-exp2)*dfn13;
} else {
exp1 = exp(alpha*(1.0-rij/r_vdw));
exp2 = exp(0.5*alpha*(1.0-rij/r_vdw));
etmp = epsilon*(exp1-2.0*exp2);
evdwl = Tap*etmp;
fvdwl = dTap*etmp-Tap*epsilon*(alpha/r_vdw)*(exp1-exp2)*rij;
}
// inner wall
if (vdwflag == 2 || vdwflag == 3) {
const F_FLOAT ecore = paramstwbp(itype,jtype).ecore;
const F_FLOAT acore = paramstwbp(itype,jtype).acore;
const F_FLOAT rcore = paramstwbp(itype,jtype).rcore;
const F_FLOAT e_core = ecore*exp(acore*(1.0-(rij/rcore)));
const F_FLOAT de_core = -(acore/rcore)*e_core;
evdwl += Tap*e_core;
fvdwl += dTap*e_core+Tap*de_core/rij;
if (lgflag) {
const F_FLOAT lgre = paramstwbp(itype,jtype).lgre;
const F_FLOAT lgcij = paramstwbp(itype,jtype).lgcij;
const F_FLOAT rij5 = rsq*rsq*rij;
const F_FLOAT rij6 = rij5*rij;
const F_FLOAT re6 = lgre*lgre*lgre*lgre*lgre*lgre;
const F_FLOAT elg = -lgcij/(rij6+re6);
const F_FLOAT delg = -6.0*elg*rij5/(rij6+re6);
evdwl += Tap*elg;
fvdwl += dTap*elg+Tap*delg/rij;
}
}
// Coulomb energy/force
const F_FLOAT shld = paramstwbp(itype,jtype).gamma;
const F_FLOAT denom1 = rij * rij * rij + shld;
const F_FLOAT denom3 = pow(denom1,0.3333333333333);
const F_FLOAT ecoul = C_ele * qi*qj*Tap/denom3;
const F_FLOAT fcoul = C_ele * qi*qj*(dTap-Tap*rij/denom1)/denom3;
const F_FLOAT ftotal = fvdwl + fcoul;
fxtmp += delx*ftotal;
fytmp += dely*ftotal;
fztmp += delz*ftotal;
if (NEIGHFLAG != FULL) {
a_f(j,0) -= delx*ftotal;
a_f(j,1) -= dely*ftotal;
a_f(j,2) -= delz*ftotal;
}
if (NEIGHFLAG == FULL) {
if (eflag) ev.evdwl += 0.5*evdwl;
if (eflag) ev.ecoul += 0.5*ecoul;
} else {
if (eflag) ev.evdwl += evdwl;
if (eflag) ev.ecoul += ecoul;
}
if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,evdwl+ecoul,-ftotal,delx,dely,delz);
}
a_f(i,0) += fxtmp;
a_f(i,1) += fytmp;
a_f(i,2) += fztmp;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
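// Tabulated van der Waals + Coulomb: energies and forces are interpolated
// with cubic splines from the precomputed lookup table d_LR(tmin,tmax)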
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
// The f array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT qi = q(i);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT fxtmp, fytmp, fztmp;
fxtmp = fytmp = fztmp = 0.0;
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const int jtype = type(j);
const tagint jtag = tag(j);
const F_FLOAT qj = q(j);
if (NEIGHFLAG != FULL) {
// skip half of the interactions
if (j >= nlocal) {
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
}
}
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cut_nbsq) continue;
const F_FLOAT rij = sqrt(rsq);
const int tmin = MIN( itype, jtype );
const int tmax = MAX( itype, jtype );
const LR_lookup_table_kk t = d_LR(tmin,tmax);
/* Cubic Spline Interpolation */
int r = (int)(rij * t.inv_dx);
if( r == 0 ) ++r;
const F_FLOAT base = (double)(r+1) * t.dx;
const F_FLOAT dif = rij - base;
const cubic_spline_coef vdW = t.d_vdW[r];
const cubic_spline_coef ele = t.d_ele[r];
const cubic_spline_coef CEvd = t.d_CEvd[r];
const cubic_spline_coef CEclmb = t.d_CEclmb[r];
const F_FLOAT evdwl = ((vdW.d*dif + vdW.c)*dif + vdW.b)*dif +
vdW.a;
const F_FLOAT ecoul = (((ele.d*dif + ele.c)*dif + ele.b)*dif +
ele.a)*qi*qj;
const F_FLOAT fvdwl = ((CEvd.d*dif + CEvd.c)*dif + CEvd.b)*dif +
CEvd.a;
const F_FLOAT fcoul = (((CEclmb.d*dif+CEclmb.c)*dif+CEclmb.b)*dif +
CEclmb.a)*qi*qj;
const F_FLOAT ftotal = fvdwl + fcoul;
fxtmp += delx*ftotal;
fytmp += dely*ftotal;
fztmp += delz*ftotal;
if (NEIGHFLAG != FULL) {
a_f(j,0) -= delx*ftotal;
a_f(j,1) -= dely*ftotal;
a_f(j,2) -= delz*ftotal;
}
if (NEIGHFLAG == FULL) {
if (eflag) ev.evdwl += 0.5*evdwl;
if (eflag) ev.ecoul += 0.5*ecoul;
} else {
if (eflag) ev.evdwl += evdwl;
if (eflag) ev.ecoul += ecoul;
}
if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,evdwl+ecoul,-ftotal,delx,dely,delz);
}
a_f(i,0) += fxtmp;
a_f(i,1) += fytmp;
a_f(i,2) += fztmp;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
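// (Re)allocate per-atom and per-bond work arrays. Sizes are nmax atoms by
// maxbo bond-list slots (or maxhb hydrogen-bond slots); maxbo/maxhb are
// grown by the resize loop in compute() whenever a list overflows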
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::allocate_array()
{
if (cut_hbsq > 0.0) {
d_hb_first = typename AT::t_int_1d("reax/c/kk:hb_first",nmax);
d_hb_num = typename AT::t_int_1d("reax/c/kk:hb_num",nmax);
d_hb_list = typename AT::t_int_1d("reax/c/kk:hb_list",nmax*maxhb);
}
d_bo_first = typename AT::t_int_1d("reax/c/kk:bo_first",nmax);
d_bo_num = typename AT::t_int_1d("reax/c/kk:bo_num",nmax);
d_bo_list = typename AT::t_int_1d("reax/c/kk:bo_list",nmax*maxbo);
d_BO = typename AT::t_ffloat_2d_dl("reax/c/kk:BO",nmax,maxbo);
d_BO_s = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_s",nmax,maxbo);
d_BO_pi = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_pi",nmax,maxbo);
d_BO_pi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_pi2",nmax,maxbo);
d_dln_BOp_pix = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pix",nmax,maxbo);
d_dln_BOp_piy = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_piy",nmax,maxbo);
d_dln_BOp_piz = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_piz",nmax,maxbo);
d_dln_BOp_pi2x = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2x",nmax,maxbo);
d_dln_BOp_pi2y = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2y",nmax,maxbo);
d_dln_BOp_pi2z = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2z",nmax,maxbo);
d_C1dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbo",nmax,maxbo);
d_C2dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbo",nmax,maxbo);
d_C3dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbo",nmax,maxbo);
d_C1dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbopi",nmax,maxbo);
d_C2dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbopi",nmax,maxbo);
d_C3dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbopi",nmax,maxbo);
d_C4dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C4dbopi",nmax,maxbo);
d_C1dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbopi2",nmax,maxbo);
d_C2dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbopi2",nmax,maxbo);
d_C3dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbopi2",nmax,maxbo);
d_C4dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C4dbopi2",nmax,maxbo);
d_dBOpx = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpx",nmax,maxbo);
d_dBOpy = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpy",nmax,maxbo);
d_dBOpz = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpz",nmax,maxbo);
d_dDeltap_self = typename AT::t_ffloat_2d_dl("reax/c/kk:dDeltap_self",nmax,3);
d_Deltap_boc = typename AT::t_ffloat_1d("reax/c/kk:Deltap_boc",nmax);
d_Deltap = typename AT::t_ffloat_1d("reax/c/kk:Deltap",nmax);
d_total_bo = typename AT::t_ffloat_1d("reax/c/kk:total_bo",nmax);
d_Cdbo = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbo",nmax,3*maxbo);
d_Cdbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbopi",nmax,3*maxbo);
d_Cdbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbopi2",nmax,3*maxbo);
d_Delta = typename AT::t_ffloat_1d("reax/c/kk:Delta",nmax);
d_Delta_boc = typename AT::t_ffloat_1d("reax/c/kk:Delta_boc",nmax);
d_dDelta_lp = typename AT::t_ffloat_1d("reax/c/kk:dDelta_lp",nmax);
d_Delta_lp = typename AT::t_ffloat_1d("reax/c/kk:Delta_lp",nmax);
d_Delta_lp_temp = typename AT::t_ffloat_1d("reax/c/kk:Delta_lp_temp",nmax);
d_CdDelta = typename AT::t_ffloat_1d("reax/c/kk:CdDelta",nmax);
d_sum_ovun = typename AT::t_ffloat_2d_dl("reax/c/kk:sum_ovun",nmax,3);
// FixReaxCSpecies
if (fixspecies_flag) {
memory->destroy_kokkos(k_tmpid,tmpid);
memory->destroy_kokkos(k_tmpbo,tmpbo);
memory->create_kokkos(k_tmpid,tmpid,nmax,MAXSPECBOND,"pair:tmpid");
memory->create_kokkos(k_tmpbo,tmpbo,nmax,MAXSPECBOND,"pair:tmpbo");
}
// FixReaxCBonds
d_abo = typename AT::t_ffloat_2d("reax/c/kk:abo",nmax,maxbo);
d_neighid = typename AT::t_tagint_2d("reax/c/kk:neighid",nmax,maxbo);
d_numneigh_bonds = typename AT::t_int_1d("reax/c/kk:numneigh_bonds",nmax);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZero, const int &n) const {
d_total_bo(n) = 0.0;
d_CdDelta(n) = 0.0;
if (neighflag != FULL) {
d_bo_num(n) = 0;
d_hb_num(n) = 0;
}
for (int j = 0; j < 3; j++)
d_dDeltap_self(n,j) = 0.0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZeroEAtom, const int &i) const {
v_eatom(i) = 0.0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZeroVAtom, const int &i) const {
v_vatom(i,0) = 0.0;
v_vatom(i,1) = 0.0;
v_vatom(i,2) = 0.0;
v_vatom(i,3) = 0.0;
v_vatom(i,4) = 0.0;
v_vatom(i,5) = 0.0;
}
/* ---------------------------------------------------------------------- */
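// Build bond-order and hydrogen-bond neighbor lists from a FULL neighbor
// list. Atom i owns slots [i*maxbo, i*maxbo+maxbo); if a slot index would
// overflow, the resize flag is set and the whole pass is redone. Uncorrected
// bond orders (BO_s, BO_pi, BO_pi2) and their derivatives are stored as the
// list is built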
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsFull, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
F_FLOAT total_bo = 0.0;
int j_index = i*maxbo;
d_bo_first[i] = j_index;
const int bo_first_i = j_index;
int ihb = -1;
int jhb = -1;
int hb_index = i*maxhb;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = hb_index;
hb_first_i = hb_index;
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
const int jtype = type(j);
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
const int jj_index = hb_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[hb_index] = j;
hb_index++;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
const int jj_index = j_index - bo_first_i;
if (jj_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
// from BondOrder1
d_BO(i,jj_index) = BO;
d_BO_s(i,jj_index) = BO_s;
d_BO_pi(i,jj_index) = BO_pi;
d_BO_pi2(i,jj_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) d_dDeltap_self(i,d) += dBOp_i[d];
d_dln_BOp_pix(i,jj_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,jj_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,jj_index) = dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,jj_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,jj_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,jj_index) = dln_BOp_pi2_i[2];
d_dBOpx(i,jj_index) = dBOp_i[0];
d_dBOpy(i,jj_index) = dBOp_i[1];
d_dBOpz(i,jj_index) = dBOp_i[2];
d_BO(i,jj_index) -= bo_cut;
d_BO_s(i,jj_index) -= bo_cut;
total_bo += d_BO(i,jj_index);
j_index++;
}
d_bo_num[i] = j_index - d_bo_first[i];
if (cut_hbsq > 0.0 && ihb == 1) d_hb_num[i] = hb_index - d_hb_first[i];
d_total_bo[i] += total_bo;
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
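// Same list construction from a HALF/HALFTHREAD neighbor list: each pair is
// visited once, so list entries and the dDeltap_self/total_bo contributions
// are written for both i and j (the AtomicF trait makes those writes atomic
// for HALFTHREAD)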
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsHalf<NEIGHFLAG>, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_dDeltap_self = d_dDeltap_self;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_total_bo = d_total_bo;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
F_FLOAT total_bo = 0.0;
int j_index,i_index;
d_bo_first[i] = i*maxbo;
const int bo_first_i = d_bo_first[i];
int ihb = -1;
int jhb = -1;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = i*maxhb;
hb_first_i = d_hb_first[i];
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const tagint jtag = tag(j);
d_bo_first[j] = j*maxbo;
d_hb_first[j] = j*maxhb;
const int jtype = type(j);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
if (NEIGHFLAG == HALF) {
j_index = hb_first_i + d_hb_num[i];
d_hb_num[i]++;
} else {
j_index = hb_first_i + Kokkos::atomic_fetch_add(&d_hb_num[i],1);
}
const int jj_index = j_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[j_index] = j;
} else if ( j < nlocal && ihb == 2 && jhb == 1) {
if (NEIGHFLAG == HALF) {
i_index = d_hb_first[j] + d_hb_num[j];
d_hb_num[j]++;
} else {
i_index = d_hb_first[j] + Kokkos::atomic_fetch_add(&d_hb_num[j],1);
}
const int ii_index = i_index - d_hb_first[j];
if (ii_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[i_index] = i;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
if (NEIGHFLAG == HALF) {
j_index = bo_first_i + d_bo_num[i];
i_index = d_bo_first[j] + d_bo_num[j];
d_bo_num[i]++;
d_bo_num[j]++;
} else {
j_index = bo_first_i + Kokkos::atomic_fetch_add(&d_bo_num[i],1);
i_index = d_bo_first[j] + Kokkos::atomic_fetch_add(&d_bo_num[j],1);
}
const int jj_index = j_index - bo_first_i;
const int ii_index = i_index - d_bo_first[j];
if (jj_index >= maxbo || ii_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
d_bo_list[i_index] = i;
// from BondOrder1
d_BO(i,jj_index) = BO;
d_BO_s(i,jj_index) = BO_s;
d_BO_pi(i,jj_index) = BO_pi;
d_BO_pi2(i,jj_index) = BO_pi2;
d_BO(j,ii_index) = BO;
d_BO_s(j,ii_index) = BO_s;
d_BO_pi(j,ii_index) = BO_pi;
d_BO_pi2(j,ii_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) a_dDeltap_self(i,d) += dBOp_i[d];
for (int d = 0; d < 3; d++) a_dDeltap_self(j,d) += -dBOp_i[d];
d_dln_BOp_pix(i,jj_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,jj_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,jj_index) = dln_BOp_pi_i[2];
d_dln_BOp_pix(j,ii_index) = -dln_BOp_pi_i[0];
d_dln_BOp_piy(j,ii_index) = -dln_BOp_pi_i[1];
d_dln_BOp_piz(j,ii_index) = -dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,jj_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,jj_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,jj_index) = dln_BOp_pi2_i[2];
d_dln_BOp_pi2x(j,ii_index) = -dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(j,ii_index) = -dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(j,ii_index) = -dln_BOp_pi2_i[2];
d_dBOpx(i,jj_index) = dBOp_i[0];
d_dBOpy(i,jj_index) = dBOp_i[1];
d_dBOpz(i,jj_index) = dBOp_i[2];
d_dBOpx(j,ii_index) = -dBOp_i[0];
d_dBOpy(j,ii_index) = -dBOp_i[1];
d_dBOpz(j,ii_index) = -dBOp_i[2];
d_BO(i,jj_index) -= bo_cut;
d_BO(j,ii_index) -= bo_cut;
d_BO_s(i,jj_index) -= bo_cut;
d_BO_s(j,ii_index) -= bo_cut;
total_bo += d_BO(i,jj_index);
a_total_bo[j] += d_BO(j,ii_index);
}
a_total_bo[i] += total_bo;
}
/* ---------------------------------------------------------------------- */
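// Deltap = total uncorrected bond order minus valency, needed by the
// bond-order corrections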
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder1, const int &ii) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
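// "LessAtomics" variant: only the bond/hbond list entries are built here;
// uncorrected bond orders are filled in afterwards by
// PairReaxBondOrder1_LessAtomics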
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsHalf_LessAtomics<NEIGHFLAG>, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3];
int j_index,i_index;
d_bo_first[i] = i*maxbo;
const int bo_first_i = d_bo_first[i];
int ihb = -1;
int jhb = -1;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = i*maxhb;
hb_first_i = d_hb_first[i];
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const tagint jtag = tag(j);
d_bo_first[j] = j*maxbo;
d_hb_first[j] = j*maxhb;
const int jtype = type(j);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
if (NEIGHFLAG == HALF) {
j_index = hb_first_i + d_hb_num[i];
d_hb_num[i]++;
} else {
j_index = hb_first_i + Kokkos::atomic_fetch_add(&d_hb_num[i],1);
}
const int jj_index = j_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[j_index] = j;
} else if ( j < nlocal && ihb == 2 && jhb == 1) {
if (NEIGHFLAG == HALF) {
i_index = d_hb_first[j] + d_hb_num[j];
d_hb_num[j]++;
} else {
i_index = d_hb_first[j] + Kokkos::atomic_fetch_add(&d_hb_num[j],1);
}
const int ii_index = i_index - d_hb_first[j];
if (ii_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[i_index] = i;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
if (NEIGHFLAG == HALF) {
j_index = bo_first_i + d_bo_num[i];
i_index = d_bo_first[j] + d_bo_num[j];
d_bo_num[i]++;
d_bo_num[j]++;
} else {
j_index = bo_first_i + Kokkos::atomic_fetch_add(&d_bo_num[i],1);
i_index = d_bo_first[j] + Kokkos::atomic_fetch_add(&d_bo_num[j],1);
}
const int jj_index = j_index - bo_first_i;
const int ii_index = i_index - d_bo_first[j];
if (jj_index >= maxbo || ii_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
d_bo_list[i_index] = i;
}
}
/* ---------------------------------------------------------------------- */
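// Fill uncorrected bond orders and their derivatives for the LessAtomics path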
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder1_LessAtomics, const int &ii) const {
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT total_bo = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int jtype = type(j);
const int j_index = jj - j_start;
// calculate uncorrected BO and total bond order
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
d_BO(i,j_index) = BO;
d_BO_s(i,j_index) = BO_s;
d_BO_pi(i,j_index) = BO_pi;
d_BO_pi2(i,j_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) d_dDeltap_self(i,d) += dBOp_i[d];
d_dln_BOp_pix(i,j_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,j_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,j_index) = dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,j_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,j_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,j_index) = dln_BOp_pi2_i[2];
d_dBOpx(i,j_index) = dBOp_i[0];
d_dBOpy(i,j_index) = dBOp_i[1];
d_dBOpz(i,j_index) = dBOp_i[2];
d_BO(i,j_index) -= bo_cut;
d_BO_s(i,j_index) -= bo_cut;
total_bo += d_BO(i,j_index);
}
d_total_bo[i] += total_bo;
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
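// Apply the ReaxFF bond-order corrections: overcoordination factor f1 (ovc)
// and the 1-3 correction factors f4*f5 (v13cor), plus the C1..C4
// coefficients needed later for the bond-order derivatives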
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder2, const int &ii) const {
F_FLOAT delij[3];
F_FLOAT exp_p1i, exp_p2i, exp_p1j, exp_p2j, f1, f2, f3, u1_ij, u1_ji, Cf1A_ij, Cf1B_ij, Cf1_ij, Cf1_ji;
F_FLOAT f4, f5, exp_f4, exp_f5, f4f5, Cf45_ij, Cf45_ji;
F_FLOAT A0_ij, A1_ij, A2_ij, A3_ij, A2_ji, A3_ji;
const int i = d_ilist[ii];
const int itype = type(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT val_i = paramssing(itype).valency;
d_total_bo[i] = 0.0;
F_FLOAT total_bo = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int jtype = type(j);
const int j_index = jj - j_start;
const int i_index = maxbo+j_index;
// calculate corrected BO and total bond order
const F_FLOAT val_j = paramssing(jtype).valency;
const F_FLOAT ovc = paramstwbp(itype,jtype).ovc;
const F_FLOAT v13cor = paramstwbp(itype,jtype).v13cor;
const F_FLOAT p_boc3 = paramstwbp(itype,jtype).p_boc3;
const F_FLOAT p_boc4 = paramstwbp(itype,jtype).p_boc4;
const F_FLOAT p_boc5 = paramstwbp(itype,jtype).p_boc5;
if (ovc < 0.001 && v13cor < 0.001) {
d_C1dbo(i,j_index) = 1.0;
d_C2dbo(i,j_index) = 0.0;
d_C3dbo(i,j_index) = 0.0;
d_C1dbopi(i,j_index) = d_BO_pi(i,j_index);
d_C2dbopi(i,j_index) = 0.0;
d_C3dbopi(i,j_index) = 0.0;
d_C4dbopi(i,j_index) = 0.0;
d_C1dbopi2(i,j_index) = d_BO_pi(i,j_index);
d_C2dbopi2(i,j_index) = 0.0;
d_C3dbopi2(i,j_index) = 0.0;
d_C4dbopi2(i,j_index) = 0.0;
} else {
if (ovc >= 0.001) {
exp_p1i = exp(-p_boc1 * d_Deltap[i]);
exp_p2i = exp(-p_boc2 * d_Deltap[i]);
exp_p1j = exp(-p_boc1 * d_Deltap[j]);
exp_p2j = exp(-p_boc2 * d_Deltap[j]);
f2 = exp_p1i + exp_p1j;
f3 = -1.0/p_boc2*log(0.5*(exp_p2i+exp_p2j));
f1 = 0.5 * ((val_i + f2)/(val_i + f2 + f3) + (val_j + f2)/(val_j + f2 + f3));
u1_ij = val_i + f2 + f3;
u1_ji = val_j + f2 + f3;
Cf1A_ij = 0.5 * f3 * (1.0/(u1_ij*u1_ij)+1.0/(u1_ji*u1_ji));
Cf1B_ij = -0.5 * ((u1_ij - f3)/(u1_ij*u1_ij)+(u1_ji - f3)/(u1_ji*u1_ji));
Cf1_ij = 0.5 * (-p_boc1 * exp_p1i / u1_ij - ((val_i+f2) / (u1_ij*u1_ij)) *
(-p_boc1 * exp_p1i + exp_p2i / (exp_p2i + exp_p2j)) +
-p_boc1 * exp_p1i / u1_ji - ((val_j+f2) / (u1_ji*u1_ji)) *
(-p_boc1 * exp_p1i + exp_p2i / (exp_p2i + exp_p2j)));
Cf1_ji = -Cf1A_ij * p_boc1 * exp_p1j + Cf1B_ij * exp_p2j / ( exp_p2i + exp_p2j );
} else {
f1 = 1.0;
Cf1_ij = Cf1_ji = 0.0;
}
if (v13cor >= 0.001) {
exp_f4 =exp(-(p_boc4*(d_BO(i,j_index)*d_BO(i,j_index))-d_Deltap_boc[i])*p_boc3+p_boc5);
exp_f5 =exp(-(p_boc4*(d_BO(i,j_index)*d_BO(i,j_index))-d_Deltap_boc[j])*p_boc3+p_boc5);
f4 = 1. / (1. + exp_f4);
f5 = 1. / (1. + exp_f5);
f4f5 = f4 * f5;
Cf45_ij = -f4 * exp_f4;
Cf45_ji = -f5 * exp_f5;
} else {
f4 = f5 = f4f5 = 1.0;
Cf45_ij = Cf45_ji = 0.0;
}
A0_ij = f1 * f4f5;
A1_ij = -2 * p_boc3 * p_boc4 * d_BO(i,j_index) * (Cf45_ij + Cf45_ji);
A2_ij = Cf1_ij / f1 + p_boc3 * Cf45_ij;
A2_ji = Cf1_ji / f1 + p_boc3 * Cf45_ji;
A3_ij = A2_ij + Cf1_ij / f1;
A3_ji = A2_ji + Cf1_ji / f1;
d_BO(i,j_index) = d_BO(i,j_index) * A0_ij;
d_BO_pi(i,j_index) = d_BO_pi(i,j_index) * A0_ij * f1;
d_BO_pi2(i,j_index) = d_BO_pi2(i,j_index) * A0_ij * f1;
d_BO_s(i,j_index) = d_BO(i,j_index)-(d_BO_pi(i,j_index)+d_BO_pi2(i,j_index));
d_C1dbo(i,j_index) = A0_ij + d_BO(i,j_index) * A1_ij;
d_C2dbo(i,j_index) = d_BO(i,j_index) * A2_ij;
d_C3dbo(i,j_index) = d_BO(i,j_index) * A2_ji;
d_C1dbopi(i,j_index) = f1*f1*f4*f5;
d_C2dbopi(i,j_index) = d_BO_pi(i,j_index) * A1_ij;
d_C3dbopi(i,j_index) = d_BO_pi(i,j_index) * A3_ij;
d_C4dbopi(i,j_index) = d_BO_pi(i,j_index) * A3_ji;
d_C1dbopi2(i,j_index) = f1*f1*f4*f5;
d_C2dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A1_ij;
d_C3dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A3_ij;
d_C4dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A3_ji;
}
if(d_BO(i,j_index) < 1e-10) d_BO(i,j_index) = 0.0;
if(d_BO_s(i,j_index) < 1e-10) d_BO_s(i,j_index) = 0.0;
if(d_BO_pi(i,j_index) < 1e-10) d_BO_pi(i,j_index) = 0.0;
if(d_BO_pi2(i,j_index) < 1e-10) d_BO_pi2(i,j_index) = 0.0;
total_bo += d_BO(i,j_index);
d_Cdbo(i,j_index) = 0.0;
d_Cdbopi(i,j_index) = 0.0;
d_Cdbopi2(i,j_index) = 0.0;
d_Cdbo(j,i_index) = 0.0;
d_Cdbopi(j,i_index) = 0.0;
d_Cdbopi2(j,i_index) = 0.0;
d_CdDelta[j] = 0.0;
}
d_CdDelta[i] = 0.0;
d_total_bo[i] += total_bo;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder3, const int &ii) const {
// bottom (final) part of BO(): per-atom deltas and lone-pair counts
const int i = d_ilist[ii];
const int itype = type(i);
F_FLOAT nlp_temp;
d_Delta[i] = d_total_bo[i] - paramssing(itype).valency;
const F_FLOAT Delta_e = d_total_bo[i] - paramssing(itype).valency_e;
d_Delta_boc[i] = d_total_bo[i] - paramssing(itype).valency_boc;
const F_FLOAT vlpex = Delta_e - 2.0 * (int)(Delta_e/2.0);
const F_FLOAT explp1 = exp(-gp[15] * SQR(2.0 + vlpex));
const F_FLOAT nlp = explp1 - (int)(Delta_e / 2.0);
d_Delta_lp[i] = paramssing(itype).nlp_opt - nlp;
const F_FLOAT Clp = 2.0 * gp[15] * explp1 * (2.0 + vlpex);
d_dDelta_lp[i] = Clp;
if( paramssing(itype).mass > 21.0 ) {
nlp_temp = 0.5 * (paramssing(itype).valency_e - paramssing(itype).valency);
d_Delta_lp_temp[i] = paramssing(itype).nlp_opt - nlp_temp;
} else {
nlp_temp = nlp;
d_Delta_lp_temp[i] = paramssing(itype).nlp_opt - nlp_temp;
}
d_sum_ovun(i,1) = 0.0;
d_sum_ovun(i,2) = 0.0;
}
/* ---------------------------------------------------------------------- */
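// First multibody pass: per-atom sums over bonded neighbors
// (sum_ovun1, sum_ovun2) used by the over/undercoordination terms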
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti1<NEIGHFLAG,EVFLAG>, const int &ii) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT imass = paramssing(itype).mass;
F_FLOAT dfvl;
if (imass > 21.0) dfvl = 0.0;
else dfvl = 1.0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT sum_ovun1 = 0.0;
F_FLOAT sum_ovun2 = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const int j_index = jj - j_start;
sum_ovun1 += paramstwbp(itype,jtype).p_ovun1 * paramstwbp(itype,jtype).De_s * d_BO(i,j_index);
sum_ovun2 += (d_Delta[j] - dfvl * d_Delta_lp_temp[j]) * (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
}
d_sum_ovun(i,1) += sum_ovun1;
d_sum_ovun(i,2) += sum_ovun2;
}
/* ---------------------------------------------------------------------- */
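// Second multibody pass: lone-pair (ereax[0]), overcoordination (ereax[1])
// and undercoordination (ereax[2]) energies plus their bond-order derivatives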
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi = d_Cdbopi;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi2 = d_Cdbopi2;
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
F_FLOAT dfvl;
if (imass > 21.0) dfvl = 0.0;
else dfvl = 1.0;
F_FLOAT e_lp, e_ov, e_un;
F_FLOAT CEover1, CEover2, CEover3, CEover4;
F_FLOAT CEunder1, CEunder2, CEunder3, CEunder4;
const F_FLOAT p_lp3 = gp[5];
const F_FLOAT p_ovun2 = paramssing(itype).p_ovun2;
const F_FLOAT p_ovun3 = gp[32];
const F_FLOAT p_ovun4 = gp[31];
const F_FLOAT p_ovun5 = paramssing(itype).p_ovun5;
const F_FLOAT p_ovun6 = gp[6];
const F_FLOAT p_ovun7 = gp[8];
const F_FLOAT p_ovun8 = gp[9];
// lone pair
const F_FLOAT p_lp2 = paramssing(itype).p_lp2;
const F_FLOAT expvd2 = exp( -75 * d_Delta_lp[i]);
const F_FLOAT inv_expvd2 = 1.0 / (1.0+expvd2);
int numbonds = d_bo_num[i];
e_lp = 0.0;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
e_lp = p_lp2 * d_Delta_lp[i] * inv_expvd2;
const F_FLOAT dElp = p_lp2 * inv_expvd2 + 75.0 * p_lp2 * d_Delta_lp[i] * expvd2 * inv_expvd2*inv_expvd2;
const F_FLOAT CElp = dElp * d_dDelta_lp[i];
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
a_CdDelta[i] += CElp;
if (eflag) ev.ereax[0] += e_lp;
//if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_lp,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_lp);
// over coordination
const F_FLOAT exp_ovun1 = p_ovun3 * exp( p_ovun4 * d_sum_ovun(i,2) );
const F_FLOAT inv_exp_ovun1 = 1.0 / (1 + exp_ovun1);
const F_FLOAT Delta_lpcorr = d_Delta[i] - (dfvl * d_Delta_lp_temp[i]) * inv_exp_ovun1;
const F_FLOAT exp_ovun2 = exp( p_ovun2 * Delta_lpcorr );
const F_FLOAT inv_exp_ovun2 = 1.0 / (1.0 + exp_ovun2);
const F_FLOAT DlpVi = 1.0 / (Delta_lpcorr + val_i + 1e-8);
CEover1 = Delta_lpcorr * DlpVi * inv_exp_ovun2;
e_ov = d_sum_ovun(i,1) * CEover1;
if (eflag) ev.ereax[1] += e_ov;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_ov,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_ov);
CEover2 = d_sum_ovun(i,1) * DlpVi * inv_exp_ovun2 *
(1.0 - Delta_lpcorr * ( DlpVi + p_ovun2 * exp_ovun2 * inv_exp_ovun2 ));
CEover3 = CEover2 * (1.0 - dfvl * d_dDelta_lp[i] * inv_exp_ovun1 );
CEover4 = CEover2 * (dfvl * d_Delta_lp_temp[i]) * p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1);
// under coordination
const F_FLOAT exp_ovun2n = 1.0 / exp_ovun2;
const F_FLOAT exp_ovun6 = exp( p_ovun6 * Delta_lpcorr );
const F_FLOAT exp_ovun8 = p_ovun7 * exp(p_ovun8 * d_sum_ovun(i,2));
const F_FLOAT inv_exp_ovun2n = 1.0 / (1.0 + exp_ovun2n);
const F_FLOAT inv_exp_ovun8 = 1.0 / (1.0 + exp_ovun8);
e_un = 0;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
e_un = -p_ovun5 * (1.0 - exp_ovun6) * inv_exp_ovun2n * inv_exp_ovun8;
if (eflag) ev.ereax[2] += e_un;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_un,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_un);
CEunder1 = inv_exp_ovun2n *
( p_ovun5 * p_ovun6 * exp_ovun6 * inv_exp_ovun8 + p_ovun2 * e_un * exp_ovun2n );
CEunder2 = -e_un * p_ovun8 * exp_ovun8 * inv_exp_ovun8;
CEunder3 = CEunder1 * (1.0 - dfvl * d_dDelta_lp[i] * inv_exp_ovun1);
CEunder4 = CEunder1 * (dfvl * d_Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * inv_exp_ovun1 * inv_exp_ovun1 + CEunder2;
const F_FLOAT eng_tmp = e_lp + e_ov + e_un;
if (eflag_atom) this->template e_tally_single<NEIGHFLAG>(ev,i,eng_tmp);
// multibody forces
a_CdDelta[i] += CEover3;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
a_CdDelta[i] += CEunder3;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const F_FLOAT jmass = paramssing(jtype).mass;
const int j_index = jj - j_start;
const F_FLOAT De_s = paramstwbp(itype,jtype).De_s;
// multibody lone pair: correction for C2
if (p_lp3 > 0.001 && imass == 12.0 && jmass == 12.0) {
const F_FLOAT Di = d_Delta[i];
const F_FLOAT vov3 = d_BO(i,j_index) - Di - 0.040*pow(Di,4.0);
if (vov3 > 3.0) {
const F_FLOAT e_lph = p_lp3 * (vov3-3.0)*(vov3-3.0);
const F_FLOAT deahu2dbo = 2.0 * p_lp3 * (vov3 - 3.0);
const F_FLOAT deahu2dsbo = 2.0 * p_lp3 * (vov3 - 3.0) * (-1.0 - 0.16 * pow(Di,3.0));
d_Cdbo(i,j_index) += deahu2dbo;
CdDelta_i += deahu2dsbo;
if (eflag) ev.ereax[0] += e_lph;
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,e_lph);
}
}
// over/under coordination forces merged together
const F_FLOAT p_ovun1 = paramstwbp(itype,jtype).p_ovun1;
a_CdDelta[j] += (CEover4 + CEunder4) * (1.0 - dfvl * d_dDelta_lp[j]) * (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
d_Cdbo(i,j_index) += CEover1 * p_ovun1 * De_s;
d_Cdbopi(i,j_index) += (CEover4 + CEunder4) * (d_Delta[j] - dfvl*d_Delta_lp_temp[j]);
d_Cdbopi2(i,j_index) += (CEover4 + CEunder4) * (d_Delta[j] - dfvl*d_Delta_lp_temp[j]);
}
a_CdDelta[i] += CdDelta_i;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
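// Valence-angle term: loops over bonded pairs (j,i,k) around each center i
// and accumulates angle (ereax[3]), penalty (ereax[4]) and three-body
// conjugation (ereax[5]) energies and the corresponding forces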
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
const int i = d_ilist[ii];
const int itype = type(i);
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
F_FLOAT temp, temp_bo_jt, pBOjt7;
F_FLOAT p_val1, p_val2, p_val3, p_val4, p_val5;
F_FLOAT p_val6, p_val7, p_val8, p_val9, p_val10;
F_FLOAT p_pen1, p_pen2, p_pen3, p_pen4;
F_FLOAT p_coa1, p_coa2, p_coa3, p_coa4;
F_FLOAT trm8, expval6, expval7, expval2theta, expval12theta, exp3ij, exp3jk;
F_FLOAT exp_pen2ij, exp_pen2jk, exp_pen3, exp_pen4, trm_pen34, exp_coa2;
F_FLOAT dSBO1, dSBO2, SBO, SBO2, CSBO2, SBOp, prod_SBO, vlpadj;
F_FLOAT CEval1, CEval2, CEval3, CEval4, CEval5, CEval6, CEval7, CEval8;
F_FLOAT CEpen1, CEpen2, CEpen3;
F_FLOAT e_ang, e_coa, e_pen;
F_FLOAT CEcoa1, CEcoa2, CEcoa3, CEcoa4, CEcoa5;
F_FLOAT Cf7ij, Cf7jk, Cf8j, Cf9j;
F_FLOAT f7_ij, f7_jk, f8_Dj, f9_Dj;
F_FLOAT Ctheta_0, theta_0, theta_00, theta, cos_theta, sin_theta;
F_FLOAT BOA_ij, BOA_ik, rij, bo_ij, bo_ik;
F_FLOAT dcos_theta_di[3], dcos_theta_dj[3], dcos_theta_dk[3];
F_FLOAT eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
F_FLOAT delij[3], delik[3], delji[3], delki[3];
p_val6 = gp[14];
p_val8 = gp[33];
p_val9 = gp[16];
p_val10 = gp[17];
p_pen2 = gp[19];
p_pen3 = gp[20];
p_pen4 = gp[21];
p_coa2 = gp[2];
p_coa3 = gp[38];
p_coa4 = gp[30];
p_val3 = paramssing(itype).p_val3;
p_val5 = paramssing(itype).p_val5;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const F_FLOAT Delta_val = d_total_bo[i] - paramssing(itype).valency_val;
SBOp = 0.0, prod_SBO = 1.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int j_index = jj - j_start;
bo_ij = d_BO(i,j_index);
SBOp += (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
temp = SQR(bo_ij);
temp *= temp;
temp *= temp;
prod_SBO *= exp( -temp );
}
const F_FLOAT Delta_e = d_total_bo[i] - paramssing(itype).valency_e;
const F_FLOAT vlpex = Delta_e - 2.0 * (int)(Delta_e/2.0);
const F_FLOAT explp1 = exp(-gp[15] * SQR(2.0 + vlpex));
const F_FLOAT nlp = explp1 - (int)(Delta_e / 2.0);
if( vlpex >= 0.0 ){
vlpadj = 0.0;
dSBO2 = prod_SBO - 1.0;
} else{
vlpadj = nlp;
dSBO2 = (prod_SBO - 1.0) * (1.0 - p_val8 * d_dDelta_lp[i]);
}
SBO = SBOp + (1.0 - prod_SBO) * (-d_Delta_boc[i] - p_val8 * vlpadj);
dSBO1 = -8.0 * prod_SBO * ( d_Delta_boc[i] + p_val8 * vlpadj );
if( SBO <= 0.0 ) {
SBO2 = 0.0;
CSBO2 = 0.0;
} else if( SBO > 0.0 && SBO <= 1.0 ) {
SBO2 = pow( SBO, p_val9 );
CSBO2 = p_val9 * pow( SBO, p_val9 - 1.0 );
} else if( SBO > 1.0 && SBO < 2.0 ) {
SBO2 = 2.0 - pow( 2.0-SBO, p_val9 );
CSBO2 = p_val9 * pow( 2.0 - SBO, p_val9 - 1.0 );
} else {
SBO2 = 2.0;
CSBO2 = 0.0;
}
expval6 = exp( p_val6 * d_Delta_boc[i] );
F_FLOAT CdDelta_i = 0.0;
F_FLOAT fitmp[3],fjtmp[3];
for (int j = 0; j < 3; j++) fitmp[j] = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int j_index = jj - j_start;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
rij = sqrt(rsqij);
bo_ij = d_BO(i,j_index);
const int i_index = maxbo+j_index;
BOA_ij = bo_ij - thb_cut;
if (BOA_ij <= 0.0) continue;
if (i >= nlocal && j >= nlocal) continue;
const int jtype = type(j);
F_FLOAT CdDelta_j = 0.0;
for (int k = 0; k < 3; k++) fjtmp[k] = 0.0;
for (int kk = jj+1; kk < j_end; kk++ ) {
//for (int kk = j_start; kk < j_end; kk++ ) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k == j) continue;
const int k_index = kk - j_start;
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
bo_ik = d_BO(i,k_index);
BOA_ik = bo_ik - thb_cut;
if (BOA_ik <= 0.0 || bo_ij <= thb_cut || bo_ik <= thb_cut || bo_ij * bo_ik <= thb_cutsq) continue;
const int ktype = type(k);
// theta and derivatives
cos_theta = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_theta > 1.0 ) cos_theta = 1.0;
if( cos_theta < -1.0 ) cos_theta = -1.0;
theta = acos(cos_theta);
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT Cdot_inv3 = cos_theta * inv_dists * inv_dists;
for( int t = 0; t < 3; t++ ) {
dcos_theta_di[t] = -(delik[t] + delij[t]) * inv_dists + Cdot_inv3 * (rsqik * delij[t] + rsqij * delik[t]);
dcos_theta_dj[t] = delik[t] * inv_dists - Cdot_inv3 * rsqik * delij[t];
dcos_theta_dk[t] = delij[t] * inv_dists - Cdot_inv3 * rsqij * delik[t];
}
sin_theta = sin(theta);
if (sin_theta < 1.0e-5) sin_theta = 1.0e-5;
p_val1 = paramsthbp(jtype,itype,ktype).p_val1;
if (fabs(p_val1) <= 0.001) continue;
// ANGLE ENERGY
p_val1 = paramsthbp(jtype,itype,ktype).p_val1;
p_val2 = paramsthbp(jtype,itype,ktype).p_val2;
p_val4 = paramsthbp(jtype,itype,ktype).p_val4;
p_val7 = paramsthbp(jtype,itype,ktype).p_val7;
theta_00 = paramsthbp(jtype,itype,ktype).theta_00;
exp3ij = exp( -p_val3 * pow( BOA_ij, p_val4 ) );
f7_ij = 1.0 - exp3ij;
Cf7ij = p_val3 * p_val4 * pow( BOA_ij, p_val4 - 1.0 ) * exp3ij;
exp3jk = exp( -p_val3 * pow( BOA_ik, p_val4 ) );
f7_jk = 1.0 - exp3jk;
Cf7jk = p_val3 * p_val4 * pow( BOA_ik, p_val4 - 1.0 ) * exp3jk;
expval7 = exp( -p_val7 * d_Delta_boc[i] );
trm8 = 1.0 + expval6 + expval7;
f8_Dj = p_val5 - ( (p_val5 - 1.0) * (2.0 + expval6) / trm8 );
Cf8j = ((1.0 - p_val5) / (trm8*trm8)) *
(p_val6 * expval6 * trm8 - (2.0 + expval6) * ( p_val6*expval6 - p_val7*expval7));
theta_0 = 180.0 - theta_00 * (1.0 - exp(-p_val10 * (2.0 - SBO2)));
theta_0 = theta_0*constPI/180.0;
expval2theta = exp( -p_val2 * (theta_0-theta)*(theta_0-theta) );
if( p_val1 >= 0 )
expval12theta = p_val1 * (1.0 - expval2theta);
else // To avoid linear Me-H-Me angles (6/6/06)
expval12theta = p_val1 * -expval2theta;
CEval1 = Cf7ij * f7_jk * f8_Dj * expval12theta;
CEval2 = Cf7jk * f7_ij * f8_Dj * expval12theta;
CEval3 = Cf8j * f7_ij * f7_jk * expval12theta;
CEval4 = -2.0 * p_val1 * p_val2 * f7_ij * f7_jk * f8_Dj * expval2theta * (theta_0 - theta);
Ctheta_0 = p_val10 * theta_00*constPI/180.0 * exp( -p_val10 * (2.0 - SBO2) );
CEval5 = -CEval4 * Ctheta_0 * CSBO2;
CEval6 = CEval5 * dSBO1;
CEval7 = CEval5 * dSBO2;
CEval8 = -CEval4 / sin_theta;
e_ang = f7_ij * f7_jk * f8_Dj * expval12theta;
if (eflag) ev.ereax[3] += e_ang;
// Penalty energy
p_pen1 = paramsthbp(jtype,itype,ktype).p_pen1;
exp_pen2ij = exp( -p_pen2 * (BOA_ij - 2.0)*(BOA_ij - 2.0) );
exp_pen2jk = exp( -p_pen2 * (BOA_ik - 2.0)*(BOA_ik - 2.0) );
exp_pen3 = exp( -p_pen3 * d_Delta[i] );
exp_pen4 = exp( p_pen4 * d_Delta[i] );
trm_pen34 = 1.0 + exp_pen3 + exp_pen4;
f9_Dj = (2.0 + exp_pen3 ) / trm_pen34;
Cf9j = (-p_pen3 * exp_pen3 * trm_pen34 - (2.0 + exp_pen3) *
(-p_pen3 * exp_pen3 + p_pen4 * exp_pen4 ) )/(trm_pen34*trm_pen34);
e_pen = p_pen1 * f9_Dj * exp_pen2ij * exp_pen2jk;
if (eflag) ev.ereax[4] += e_pen;
CEpen1 = e_pen * Cf9j / f9_Dj;
temp = -2.0 * p_pen2 * e_pen;
CEpen2 = temp * (BOA_ij - 2.0);
CEpen3 = temp * (BOA_ik - 2.0);
// ConjAngle energy
p_coa1 = paramsthbp(jtype,itype,ktype).p_coa1;
exp_coa2 = exp( p_coa2 * Delta_val );
e_coa = p_coa1 / (1. + exp_coa2) *
exp( -p_coa3 * SQR(d_total_bo[j]-BOA_ij) ) *
exp( -p_coa3 * SQR(d_total_bo[k]-BOA_ik) ) *
exp( -p_coa4 * SQR(BOA_ij - 1.5) ) *
exp( -p_coa4 * SQR(BOA_ik - 1.5) );
CEcoa1 = -2 * p_coa4 * (BOA_ij - 1.5) * e_coa;
CEcoa2 = -2 * p_coa4 * (BOA_ik - 1.5) * e_coa;
CEcoa3 = -p_coa2 * exp_coa2 * e_coa / (1 + exp_coa2);
CEcoa4 = -2 * p_coa3 * (d_total_bo[j]-BOA_ij) * e_coa;
CEcoa5 = -2 * p_coa3 * (d_total_bo[k]-BOA_ik) * e_coa;
if (eflag) ev.ereax[5] += e_coa;
// Forces: bond-order, coordination, and angular (dcos_theta) contributions
a_Cdbo(i,j_index) += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
a_Cdbo(j,i_index) += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
a_Cdbo(i,k_index) += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
a_Cdbo(k,i_index) += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
CdDelta_i += ((CEval3 + CEval7) + CEpen1 + CEcoa3);
CdDelta_j += CEcoa4;
a_CdDelta[k] += CEcoa5;
for (int ll = j_start; ll < j_end; ll++) {
int l = d_bo_list[ll];
l &= NEIGHMASK;
const int l_index = ll - j_start;
temp_bo_jt = d_BO(i,l_index);
temp = temp_bo_jt * temp_bo_jt * temp_bo_jt;
pBOjt7 = temp * temp * temp_bo_jt;
a_Cdbo(i,l_index) += (CEval6 * pBOjt7);
d_Cdbopi(i,l_index) += CEval5;
d_Cdbopi2(i,l_index) += CEval5;
}
for (int d = 0; d < 3; d++) fi_tmp[d] = CEval8 * dcos_theta_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = CEval8 * dcos_theta_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = CEval8 * dcos_theta_dk[d];
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) fjtmp[d] -= fj_tmp[d];
for (int d = 0; d < 3; d++) a_f(k,d) -= fk_tmp[d];
// energy/virial tally
if (EVFLAG) {
eng_tmp = e_ang + e_pen + e_coa;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,eng_tmp,0.0,0.0,0.0,0.0);
for (int d = 0; d < 3; d++) delki[d] = -1.0 * delik[d];
for (int d = 0; d < 3; d++) delji[d] = -1.0 * delij[d];
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
if (vflag_either) this->template v_tally3<NEIGHFLAG>(ev,i,j,k,fj_tmp,fk_tmp,delji,delki);
}
}
a_CdDelta[j] += CdDelta_j;
for (int d = 0; d < 3; d++) a_f(j,d) += fjtmp[d];
}
a_CdDelta[i] += CdDelta_i;
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
// in reaxc_torsion_angles: j = i, k = j, i = k;
F_FLOAT Delta_i, Delta_j, bo_ij, bo_ik, bo_jl, BOA_ij, BOA_ik, BOA_jl;
F_FLOAT p_tor1, p_cot1, V1, V2, V3;
F_FLOAT exp_tor2_ij, exp_tor2_ik, exp_tor2_jl, exp_tor1, exp_tor3_DiDj, exp_tor4_DiDj, exp_tor34_inv;
F_FLOAT exp_cot2_ij, exp_cot2_ik, exp_cot2_jl, fn10, f11_DiDj, dfn11, fn12;
F_FLOAT theta_ijk, theta_jil, sin_ijk, sin_jil, cos_ijk, cos_jil, tan_ijk_i, tan_jil_i;
F_FLOAT cos_omega, cos2omega, cos3omega;
F_FLOAT CV, cmn, CEtors1, CEtors2, CEtors3, CEtors4;
F_FLOAT CEtors5, CEtors6, CEtors7, CEtors8, CEtors9;
F_FLOAT Cconj, CEconj1, CEconj2, CEconj3, CEconj4, CEconj5, CEconj6;
F_FLOAT e_tor, e_con, eng_tmp;
F_FLOAT delij[3], delik[3], deljl[3], dellk[3], delil[3], delkl[3];
F_FLOAT fi_tmp[3], fj_tmp[3], fk_tmp[3], fl_tmp[3];
F_FLOAT dcos_omega_di[3], dcos_omega_dj[3], dcos_omega_dk[3], dcos_omega_dl[3];
F_FLOAT dcos_ijk_di[3], dcos_ijk_dj[3], dcos_ijk_dk[3], dcos_jil_di[3], dcos_jil_dj[3], dcos_jil_dk[3];
F_FLOAT p_tor2 = gp[23];
F_FLOAT p_tor3 = gp[24];
F_FLOAT p_tor4 = gp[25];
F_FLOAT p_cot2 = gp[27];
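// p_tor2..p_tor4 and p_cot2 are global ReaxFF parameters taken from the general parameter array gp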
const int i = d_ilist[ii];
const int itype = type(i);
const tagint itag = tag(i);
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
Delta_i = d_Delta_boc[i];
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT fitmp[3], fjtmp[3], fktmp[3];
for(int j = 0; j < 3; j++) fitmp[j] = 0.0;
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
const int jtype = type(j);
const int j_index = jj - j_start;
// skip half of the i-j pairs (by tag parity, then by coordinates) so each central bond's torsions are computed once
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
bo_ij = d_BO(i,j_index);
if (bo_ij < thb_cut) continue;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsqij);
BOA_ij = bo_ij - thb_cut;
Delta_j = d_Delta_boc[j];
exp_tor2_ij = exp( -p_tor2 * BOA_ij );
exp_cot2_ij = exp( -p_cot2 * SQR(BOA_ij - 1.5) );
exp_tor3_DiDj = exp( -p_tor3 * (Delta_i + Delta_j) );
exp_tor4_DiDj = exp( p_tor4 * (Delta_i + Delta_j) );
exp_tor34_inv = 1.0 / (1.0 + exp_tor3_DiDj + exp_tor4_DiDj);
f11_DiDj = (2.0 + exp_tor3_DiDj) * exp_tor34_inv;
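// f11(Delta_i,Delta_j): over/under-coordination correction entering the pi-bond term of the torsion energy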
const int l_start = d_bo_first[j];
const int l_end = l_start + d_bo_num[j];
for(int k = 0; k < 3; k++) fjtmp[k] = 0.0;
F_FLOAT CdDelta_j = 0.0;
for (int kk = j_start; kk < j_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k == j) continue;
const int ktype = type(k);
const int k_index = kk - j_start;
bo_ik = d_BO(i,k_index);
if (bo_ik < thb_cut) continue;
BOA_ik = bo_ik - thb_cut;
for (int d = 0; d < 3; d ++) delik[d] = x(k,d) - x(i,d);
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
cos_ijk = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_ijk > 1.0 ) cos_ijk = 1.0;
if( cos_ijk < -1.0 ) cos_ijk = -1.0;
theta_ijk = acos(cos_ijk);
// dcos_ijk
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT cos_ijk_tmp = cos_ijk / ((rij*rik)*(rij*rik));
for( int d = 0; d < 3; d++ ) {
dcos_ijk_di[d] = -(delik[d] + delij[d]) * inv_dists + cos_ijk_tmp * (rsqik * delij[d] + rsqij * delik[d]);
dcos_ijk_dj[d] = delik[d] * inv_dists - cos_ijk_tmp * rsqik * delij[d];
dcos_ijk_dk[d] = delij[d] * inv_dists - cos_ijk_tmp * rsqij * delik[d];
}
sin_ijk = sin( theta_ijk );
if( sin_ijk >= 0 && sin_ijk <= 1e-10 )
tan_ijk_i = cos_ijk / 1e-10;
else if( sin_ijk <= 0 && sin_ijk >= -1e-10 )
tan_ijk_i = -cos_ijk / 1e-10;
else tan_ijk_i = cos_ijk / sin_ijk;
exp_tor2_ik = exp( -p_tor2 * BOA_ik );
exp_cot2_ik = exp( -p_cot2 * SQR(BOA_ik -1.5) );
for(int l = 0; l < 3; l++) fktmp[l] = 0.0;
for (int ll = l_start; ll < l_end; ll++) {
int l = d_bo_list[ll];
l &= NEIGHMASK;
if (l == i) continue;
const int ltype = type(l);
const int l_index = ll - l_start;
bo_jl = d_BO(j,l_index);
if (l == k || bo_jl < thb_cut || bo_ij*bo_ik*bo_jl < thb_cut) continue;
for (int d = 0; d < 3; d ++) deljl[d] = x(l,d) - x(j,d);
const F_FLOAT rsqjl = deljl[0]*deljl[0] + deljl[1]*deljl[1] + deljl[2]*deljl[2];
const F_FLOAT rjl = sqrt(rsqjl);
BOA_jl = bo_jl - thb_cut;
cos_jil = -(delij[0]*deljl[0]+delij[1]*deljl[1]+delij[2]*deljl[2])/(rij*rjl);
if( cos_jil > 1.0 ) cos_jil = 1.0;
if( cos_jil < -1.0 ) cos_jil = -1.0;
theta_jil = acos(cos_jil);
// dcos_jil
const F_FLOAT inv_distjl = 1.0 / (rij * rjl);
const F_FLOAT inv_distjl3 = pow( inv_distjl, 3.0 );
const F_FLOAT cos_jil_tmp = cos_jil / ((rij*rjl)*(rij*rjl));
for( int d = 0; d < 3; d++ ) {
dcos_jil_di[d] = deljl[d] * inv_distjl - cos_jil_tmp * rsqjl * -delij[d];
dcos_jil_dj[d] = (-deljl[d] + delij[d]) * inv_distjl - cos_jil_tmp * (rsqjl * delij[d] + rsqij * -deljl[d]);
dcos_jil_dk[d] = -delij[d] * inv_distjl - cos_jil_tmp * rsqij * deljl[d];
}
sin_jil = sin( theta_jil );
if( sin_jil >= 0 && sin_jil <= 1e-10 )
tan_jil_i = cos_jil / 1e-10;
else if( sin_jil <= 0 && sin_jil >= -1e-10 )
tan_jil_i = -cos_jil / 1e-10;
else tan_jil_i = cos_jil / sin_jil;
for (int d = 0; d < 3; d ++) dellk[d] = x(k,d) - x(l,d);
const F_FLOAT rsqlk = dellk[0]*dellk[0] + dellk[1]*dellk[1] + dellk[2]*dellk[2];
const F_FLOAT rlk = sqrt(rsqlk);
F_FLOAT unnorm_cos_omega, unnorm_sin_omega, omega;
F_FLOAT htra, htrb, htrc, hthd, hthe, hnra, hnrc, hnhd, hnhe;
F_FLOAT arg, poem, tel;
F_FLOAT cross_ij_jl[3];
// omega
F_FLOAT dot_ij_jk = -(delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2]);
F_FLOAT dot_ij_lj = delij[0]*deljl[0]+delij[1]*deljl[1]+delij[2]*deljl[2];
F_FLOAT dot_ik_jl = delik[0]*deljl[0]+delik[1]*deljl[1]+delik[2]*deljl[2];
unnorm_cos_omega = dot_ij_jk * dot_ij_lj + rsqij * dot_ik_jl;
cross_ij_jl[0] = delij[1]*deljl[2] - delij[2]*deljl[1];
cross_ij_jl[1] = delij[2]*deljl[0] - delij[0]*deljl[2];
cross_ij_jl[2] = delij[0]*deljl[1] - delij[1]*deljl[0];
unnorm_sin_omega = -rij*(delik[0]*cross_ij_jl[0]+delik[1]*cross_ij_jl[1]+delik[2]*cross_ij_jl[2]);
omega = atan2( unnorm_sin_omega, unnorm_cos_omega );
htra = rik + cos_ijk * ( rjl * cos_jil - rij );
htrb = rij - rik * cos_ijk - rjl * cos_jil;
htrc = rjl + cos_jil * ( rik * cos_ijk - rij );
hthd = rik * sin_ijk * ( rij - rjl * cos_jil );
hthe = rjl * sin_jil * ( rij - rik * cos_ijk );
hnra = rjl * sin_ijk * sin_jil;
hnrc = rik * sin_ijk * sin_jil;
hnhd = rik * rjl * cos_ijk * sin_jil;
hnhe = rik * rjl * sin_ijk * cos_jil;
poem = 2.0 * rik * rjl * sin_ijk * sin_jil;
if( poem < 1e-20 ) poem = 1e-20;
tel = SQR(rik) + SQR(rij) + SQR(rjl) - SQR(rlk) -
2.0 * (rik * rij * cos_ijk - rik * rjl * cos_ijk * cos_jil + rij * rjl * cos_jil);
arg = tel / poem;
if( arg > 1.0 ) arg = 1.0;
if( arg < -1.0 ) arg = -1.0;
F_FLOAT sin_ijk_rnd = sin_ijk;
F_FLOAT sin_jil_rnd = sin_jil;
if( sin_ijk >= 0 && sin_ijk <= 1e-10 ) sin_ijk_rnd = 1e-10;
else if( sin_ijk <= 0 && sin_ijk >= -1e-10 ) sin_ijk_rnd = -1e-10;
if( sin_jil >= 0 && sin_jil <= 1e-10 ) sin_jil_rnd = 1e-10;
else if( sin_jil <= 0 && sin_jil >= -1e-10 ) sin_jil_rnd = -1e-10;
// dcos_omega_dk (dcos_omega_di in the original reaxc_torsion_angles indexing, see mapping note above)
for (int d = 0; d < 3; d++) dcos_omega_dk[d] = ((htra-arg*hnra)/rik) * delik[d] - dellk[d];
for (int d = 0; d < 3; d++) dcos_omega_dk[d] += (hthd-arg*hnhd)/sin_ijk_rnd * -dcos_ijk_dk[d];
for (int d = 0; d < 3; d++) dcos_omega_dk[d] *= 2.0/poem;
// dcos_omega_di (dcos_omega_dj in reaxc indexing)
for (int d = 0; d < 3; d++) dcos_omega_di[d] = -((htra-arg*hnra)/rik) * delik[d] - htrb/rij * delij[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] += -(hthd-arg*hnhd)/sin_ijk_rnd * dcos_ijk_di[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] += -(hthe-arg*hnhe)/sin_jil_rnd * dcos_jil_di[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] *= 2.0/poem;
// dcos_omega_dj (dcos_omega_dk in reaxc indexing)
for (int d = 0; d < 3; d++) dcos_omega_dj[d] = -((htrc-arg*hnrc)/rjl) * deljl[d] + htrb/rij * delij[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] += -(hthd-arg*hnhd)/sin_ijk_rnd * dcos_ijk_dj[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] += -(hthe-arg*hnhe)/sin_jil_rnd * dcos_jil_dj[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] *= 2.0/poem;
// dcos_omega_dl
for (int d = 0; d < 3; d++) dcos_omega_dl[d] = ((htrc-arg*hnrc)/rjl) * deljl[d] + dellk[d];
for (int d = 0; d < 3; d++) dcos_omega_dl[d] += (hthe-arg*hnhe)/sin_jil_rnd * -dcos_jil_dk[d];
for (int d = 0; d < 3; d++) dcos_omega_dl[d] *= 2.0/poem;
cos_omega = cos( omega );
cos2omega = cos( 2. * omega );
cos3omega = cos( 3. * omega );
// torsion energy
p_tor1 = paramsfbp(ktype,itype,jtype,ltype).p_tor1;
p_cot1 = paramsfbp(ktype,itype,jtype,ltype).p_cot1;
V1 = paramsfbp(ktype,itype,jtype,ltype).V1;
V2 = paramsfbp(ktype,itype,jtype,ltype).V2;
V3 = paramsfbp(ktype,itype,jtype,ltype).V3;
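// torsion energy: bond-order taper fn10 times sin(theta_ijk)*sin(theta_jil) times a three-term cosine series in omega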
exp_tor1 = exp(p_tor1 * SQR(2.0 - d_BO_pi(i,j_index) - f11_DiDj));
exp_tor2_jl = exp(-p_tor2 * BOA_jl);
exp_cot2_jl = exp(-p_cot2 * SQR(BOA_jl - 1.5) );
fn10 = (1.0 - exp_tor2_ik) * (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jl);
CV = 0.5 * (V1 * (1.0 + cos_omega) + V2 * exp_tor1 * (1.0 - cos2omega) + V3 * (1.0 + cos3omega) );
e_tor = fn10 * sin_ijk * sin_jil * CV;
if (eflag) ev.ereax[6] += e_tor;
dfn11 = (-p_tor3 * exp_tor3_DiDj + (p_tor3 * exp_tor3_DiDj - p_tor4 * exp_tor4_DiDj) *
(2.0 + exp_tor3_DiDj) * exp_tor34_inv) * exp_tor34_inv;
CEtors1 = sin_ijk * sin_jil * CV;
CEtors2 = -fn10 * 2.0 * p_tor1 * V2 * exp_tor1 * (2.0 - d_BO_pi(i,j_index) - f11_DiDj) *
(1.0 - SQR(cos_omega)) * sin_ijk * sin_jil;
CEtors3 = CEtors2 * dfn11;
CEtors4 = CEtors1 * p_tor2 * exp_tor2_ik * (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jl);
CEtors5 = CEtors1 * p_tor2 * (1.0 - exp_tor2_ik) * exp_tor2_ij * (1.0 - exp_tor2_jl);
CEtors6 = CEtors1 * p_tor2 * (1.0 - exp_tor2_ik) * (1.0 - exp_tor2_ij) * exp_tor2_jl;
cmn = -fn10 * CV;
CEtors7 = cmn * sin_jil * tan_ijk_i;
CEtors8 = cmn * sin_ijk * tan_jil_i;
CEtors9 = fn10 * sin_ijk * sin_jil *
(0.5 * V1 - 2.0 * V2 * exp_tor1 * cos_omega + 1.5 * V3 * (cos2omega + 2.0 * SQR(cos_omega)));
// 4-body conjugation energy
fn12 = exp_cot2_ik * exp_cot2_ij * exp_cot2_jl;
e_con = p_cot1 * fn12 * (1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jil);
if (eflag) ev.ereax[7] += e_con;
Cconj = -2.0 * fn12 * p_cot1 * p_cot2 * (1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jil);
CEconj1 = Cconj * (BOA_ik - 1.5e0);
CEconj2 = Cconj * (BOA_ij - 1.5e0);
CEconj3 = Cconj * (BOA_jl - 1.5e0);
CEconj4 = -p_cot1 * fn12 * (SQR(cos_omega) - 1.0) * sin_jil * tan_ijk_i;
CEconj5 = -p_cot1 * fn12 * (SQR(cos_omega) - 1.0) * sin_ijk * tan_jil_i;
CEconj6 = 2.0 * p_cot1 * fn12 * cos_omega * sin_ijk * sin_jil;
// forces
// contribution to bond order
d_Cdbopi(i,j_index) += CEtors2;
CdDelta_i += CEtors3;
CdDelta_j += CEtors3;
a_Cdbo(i,k_index) += CEtors4 + CEconj1;
a_Cdbo(i,j_index) += CEtors5 + CEconj2;
a_Cdbo(j,l_index) += CEtors6 + CEconj3; // trouble
// dcos_theta_ijk
const F_FLOAT coeff74 = CEtors7 + CEconj4;
for (int d = 0; d < 3; d++) fi_tmp[d] = (coeff74) * dcos_ijk_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = (coeff74) * dcos_ijk_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = (coeff74) * dcos_ijk_dk[d];
const F_FLOAT coeff85 = CEtors8 + CEconj5;
// dcos_theta_jil
for (int d = 0; d < 3; d++) fi_tmp[d] += (coeff85) * dcos_jil_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] += (coeff85) * dcos_jil_dj[d];
for (int d = 0; d < 3; d++) fl_tmp[d] = (coeff85) * dcos_jil_dk[d];
// dcos_omega
const F_FLOAT coeff96 = CEtors9 + CEconj6;
for (int d = 0; d < 3; d++) fi_tmp[d] += (coeff96) * dcos_omega_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] += (coeff96) * dcos_omega_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] += (coeff96) * dcos_omega_dk[d];
for (int d = 0; d < 3; d++) fl_tmp[d] += (coeff96) * dcos_omega_dl[d];
// total forces
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) fjtmp[d] -= fj_tmp[d];
for (int d = 0; d < 3; d++) fktmp[d] -= fk_tmp[d];
for (int d = 0; d < 3; d++) a_f(l,d) -= fl_tmp[d];
// per-atom energy/virial tally
if (EVFLAG) {
eng_tmp = e_tor + e_con;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,eng_tmp,0.0,0.0,0.0,0.0);
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
if (vflag_either) {
for (int d = 0; d < 3; d ++) delil[d] = x(l,d) - x(i,d);
for (int d = 0; d < 3; d ++) delkl[d] = x(l,d) - x(k,d);
this->template v_tally4<NEIGHFLAG>(ev,k,i,j,l,fk_tmp,fi_tmp,fj_tmp,delkl,delil,deljl);
}
}
}
for (int d = 0; d < 3; d++) a_f(k,d) += fktmp[d];
}
a_CdDelta[j] += CdDelta_j;
for (int d = 0; d < 3; d++) a_f(j,d) += fjtmp[d];
}
a_CdDelta[i] += CdDelta_i;
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
int hblist[MAX_BONDS];
F_FLOAT theta, cos_theta, sin_xhz4, cos_xhz1, sin_theta2;
F_FLOAT e_hb, exp_hb2, exp_hb3, CEhb1, CEhb2, CEhb3;
F_FLOAT dcos_theta_di[3], dcos_theta_dj[3], dcos_theta_dk[3];
// tally variables
F_FLOAT fi_tmp[3], fj_tmp[3], fk_tmp[3], delij[3], delji[3], delik[3], delki[3];
for (int d = 0; d < 3; d++) fi_tmp[d] = fj_tmp[d] = fk_tmp[d] = 0.0;
const int i = d_ilist[ii];
const int itype = type(i);
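// hydrogen bonds are built around central atoms with p_hbond == 1 (i.e. donor hydrogens); all other types return immediately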
if( paramssing(itype).p_hbond != 1 ) return;
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const tagint itag = tag(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const int k_start = d_hb_first[i];
const int k_end = k_start + d_hb_num[i];
int top = 0;
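// collect bonded neighbors of i that can accept a hydrogen bond (p_hbond == 2) with bond order above HB_THRESHOLD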
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT bo_ij = d_BO(i,j_index);
if( paramssing(jtype).p_hbond == 2 && bo_ij >= HB_THRESHOLD ) {
hblist[top] = jj;
top ++;
}
}
F_FLOAT fitmp[3];
for (int d = 0; d < 3; d++) fitmp[d] = 0.0;
for (int kk = k_start; kk < k_end; kk++) {
int k = d_hb_list[kk];
k &= NEIGHMASK;
const tagint ktag = tag(k);
const int ktype = type(k);
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
for (int itr = 0; itr < top; itr++) {
const int jj = hblist[itr];
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (jtag == ktag) continue;
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT bo_ij = d_BO(i,j_index);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsqij);
// theta and derivatives
cos_theta = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_theta > 1.0 ) cos_theta = 1.0;
if( cos_theta < -1.0 ) cos_theta = -1.0;
theta = acos(cos_theta);
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT Cdot_inv3 = cos_theta * inv_dists * inv_dists;
for( int d = 0; d < 3; d++ ) {
dcos_theta_di[d] = -(delik[d] + delij[d]) * inv_dists + Cdot_inv3 * (rsqik * delij[d] + rsqij * delik[d]);
dcos_theta_dj[d] = delik[d] * inv_dists - Cdot_inv3 * rsqik * delij[d];
dcos_theta_dk[d] = delij[d] * inv_dists - Cdot_inv3 * rsqij * delik[d];
}
// hydrogen bond energy
const F_FLOAT p_hb1 = paramshbp(jtype,itype,ktype).p_hb1;
const F_FLOAT p_hb2 = paramshbp(jtype,itype,ktype).p_hb2;
const F_FLOAT p_hb3 = paramshbp(jtype,itype,ktype).p_hb3;
const F_FLOAT r0_hb = paramshbp(jtype,itype,ktype).r0_hb;
sin_theta2 = sin(theta/2.0);
sin_xhz4 = SQR(sin_theta2);
sin_xhz4 *= sin_xhz4;
cos_xhz1 = (1.0 - cos_theta);
exp_hb2 = exp(-p_hb2 * bo_ij);
exp_hb3 = exp(-p_hb3 * (r0_hb/rik + rik/r0_hb - 2.0));
e_hb = p_hb1 * (1.0 - exp_hb2) * exp_hb3 * sin_xhz4;
if (eflag) ev.ereax[8] += e_hb;
// hydrogen bond forces
CEhb1 = p_hb1 * p_hb2 * exp_hb2 * exp_hb3 * sin_xhz4;
CEhb2 = -p_hb1/2.0 * (1.0 - exp_hb2) * exp_hb3 * cos_xhz1;
CEhb3 = -p_hb3 * (-r0_hb/SQR(rik) + 1.0/r0_hb) * e_hb;
d_Cdbo(i,j_index) += CEhb1; // dbo term
// dcos terms
for (int d = 0; d < 3; d++) fi_tmp[d] = CEhb2 * dcos_theta_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = CEhb2 * dcos_theta_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = CEhb2 * dcos_theta_dk[d];
// dr terms
for (int d = 0; d < 3; d++) fi_tmp[d] -= CEhb3/rik * delik[d];
for (int d = 0; d < 3; d++) fk_tmp[d] += CEhb3/rik * delik[d];
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) a_f(j,d) -= fj_tmp[d];
for (int d = 0; d < 3; d++) a_f(k,d) -= fk_tmp[d];
for (int d = 0; d < 3; d++) delki[d] = -1.0 * delik[d];
for (int d = 0; d < 3; d++) delji[d] = -1.0 * delij[d];
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,e_hb);
if (vflag_either) this->template v_tally3<NEIGHFLAG>(ev,i,j,k,fj_tmp,fk_tmp,delji,delki);
}
}
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxUpdateBond<NEIGHFLAG>, const int &ii) const {
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi = d_Cdbopi;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi2 = d_Cdbopi2;
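// fold atom i's bond-order derivative accumulators (Cdbo, Cdbopi, Cdbopi2) into neighbor j's matching half-list entries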
const int i = d_ilist[ii];
const tagint itag = tag(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
const int j_index = jj - j_start;
const F_FLOAT Cdbo_i = d_Cdbo(i,j_index);
const F_FLOAT Cdbopi_i = d_Cdbopi(i,j_index);
const F_FLOAT Cdbopi2_i = d_Cdbopi2(i,j_index);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
for (int kk = k_start; kk < k_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k != i) continue;
const int k_index = kk - k_start;
int flag = 0;
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) flag = 1;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) flag = 1;
}
if (flag) {
a_Cdbo(j,k_index) += Cdbo_i;
a_Cdbopi(j,k_index) += Cdbopi_i;
a_Cdbopi2(j,k_index) += Cdbopi2_i;
}
}
}
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT*, typename DAT::t_ffloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
F_FLOAT delij[3];
F_FLOAT p_be1, p_be2, De_s, De_p, De_pp, pow_BOs_be2, exp_be12, CEbo, ebond;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT jmass = paramssing(jtype).mass;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
// bond energy (nlocal only)
p_be1 = paramstwbp(itype,jtype).p_be1;
p_be2 = paramstwbp(itype,jtype).p_be2;
De_s = paramstwbp(itype,jtype).De_s;
De_p = paramstwbp(itype,jtype).De_p;
De_pp = paramstwbp(itype,jtype).De_pp;
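// ReaxFF bond energy: ebond = -De_s*BO_s*exp[p_be1*(1 - BO_s^p_be2)] - De_p*BO_pi - De_pp*BO_pi2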
const F_FLOAT BO_i = d_BO(i,j_index);
const F_FLOAT BO_s_i = d_BO_s(i,j_index);
const F_FLOAT BO_pi_i = d_BO_pi(i,j_index);
const F_FLOAT BO_pi2_i = d_BO_pi2(i,j_index);
pow_BOs_be2 = pow(BO_s_i,p_be2);
exp_be12 = exp(p_be1*(1.0-pow_BOs_be2));
CEbo = -De_s*exp_be12*(1.0-p_be1*p_be2*pow_BOs_be2);
ebond = -De_s*BO_s_i*exp_be12
-De_p*BO_pi_i
-De_pp*BO_pi2_i;
if (eflag) ev.evdwl += ebond;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,ebond,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,ebond);
// calculate derivatives of Bond Orders
d_Cdbo(i,j_index) += CEbo;
d_Cdbopi(i,j_index) -= (CEbo + De_p);
d_Cdbopi2(i,j_index) -= (CEbo + De_pp);
// Stabilisation of terminal triple bonds
F_FLOAT estriph = 0.0;
if( BO_i >= 1.00 ) {
if( gp[37] == 2 || (imass == 12.0000 && jmass == 15.9990) ||
(jmass == 12.0000 && imass == 15.9990) ) {
const F_FLOAT exphu = exp(-gp[7] * SQR(BO_i - 2.50) );
const F_FLOAT exphua1 = exp(-gp[3] * (d_total_bo[i]-BO_i));
const F_FLOAT exphub1 = exp(-gp[3] * (d_total_bo[j]-BO_i));
const F_FLOAT exphuov = exp(gp[4] * (d_Delta[i] + d_Delta[j]));
const F_FLOAT hulpov = 1.0 / (1.0 + 25.0 * exphuov);
estriph = gp[10] * exphu * hulpov * (exphua1 + exphub1);
if (eflag) ev.evdwl += estriph;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,estriph,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,estriph);
const F_FLOAT decobdbo = gp[10] * exphu * hulpov * (exphua1 + exphub1) *
( gp[3] - 2.0 * gp[7] * (BO_i-2.50) );
const F_FLOAT decobdboua = -gp[10] * exphu * hulpov *
(gp[3]*exphua1 + 25.0*gp[4]*exphuov*hulpov*(exphua1+exphub1));
const F_FLOAT decobdboub = -gp[10] * exphu * hulpov *
(gp[3]*exphub1 + 25.0*gp[4]*exphuov*hulpov*(exphua1+exphub1));
d_Cdbo(i,j_index) += decobdbo;
CdDelta_i += decobdboua;
a_CdDelta[j] += decobdboub;
}
}
const F_FLOAT eng_tmp = ebond + estriph;
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
}
a_CdDelta[i] += CdDelta_i;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
F_FLOAT delij[3], delik[3], deljk[3], tmpvec[3];
F_FLOAT dBOp_i[3], dBOp_k[3], dln_BOp_pi[3], dln_BOp_pi2[3];
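// convert the accumulated bond-order derivatives (Cdbo, Cdbopi, Cdbopi2, CdDelta) into Cartesian forces, analogous to Add_dBond_to_Forces in the serial reax/c code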
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = d_CdDelta[i];
F_FLOAT fitmp[3];
for (int j = 0; j < 3; j++) fitmp[j] = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT jmass = paramssing(jtype).mass;
F_FLOAT CdDelta_j = d_CdDelta[j];
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
F_FLOAT coef_C1dbo, coef_C2dbo, coef_C3dbo, coef_C1dbopi, coef_C2dbopi, coef_C3dbopi, coef_C4dbopi;
F_FLOAT coef_C1dbopi2, coef_C2dbopi2, coef_C3dbopi2, coef_C4dbopi2, coef_C1dDelta, coef_C2dDelta, coef_C3dDelta;
coef_C1dbo = coef_C2dbo = coef_C3dbo = 0.0;
coef_C1dbopi = coef_C2dbopi = coef_C3dbopi = coef_C4dbopi = 0.0;
coef_C1dbopi2 = coef_C2dbopi2 = coef_C3dbopi2 = coef_C4dbopi2 = 0.0;
coef_C1dDelta = coef_C2dDelta = coef_C3dDelta = 0.0;
// total forces on i, j, k (nlocal + nghost, from Add_dBond_to_Forces)
const F_FLOAT Cdbo_ij = d_Cdbo(i,j_index);
coef_C1dbo = d_C1dbo(i,j_index) * (Cdbo_ij);
coef_C2dbo = d_C2dbo(i,j_index) * (Cdbo_ij);
coef_C3dbo = d_C3dbo(i,j_index) * (Cdbo_ij);
const F_FLOAT Cdbopi_ij = d_Cdbopi(i,j_index);
coef_C1dbopi = d_C1dbopi(i,j_index) * (Cdbopi_ij);
coef_C2dbopi = d_C2dbopi(i,j_index) * (Cdbopi_ij);
coef_C3dbopi = d_C3dbopi(i,j_index) * (Cdbopi_ij);
coef_C4dbopi = d_C4dbopi(i,j_index) * (Cdbopi_ij);
const F_FLOAT Cdbopi2_ij = d_Cdbopi2(i,j_index);
coef_C1dbopi2 = d_C1dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C2dbopi2 = d_C2dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C3dbopi2 = d_C3dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C4dbopi2 = d_C4dbopi2(i,j_index) * (Cdbopi2_ij);
const F_FLOAT coeff_CdDelta_ij = CdDelta_i + CdDelta_j;
coef_C1dDelta = d_C1dbo(i,j_index) * (coeff_CdDelta_ij);
coef_C2dDelta = d_C2dbo(i,j_index) * (coeff_CdDelta_ij);
coef_C3dDelta = d_C3dbo(i,j_index) * (coeff_CdDelta_ij);
F_FLOAT temp[3];
dln_BOp_pi[0] = d_dln_BOp_pix(i,j_index);
dln_BOp_pi[1] = d_dln_BOp_piy(i,j_index);
dln_BOp_pi[2] = d_dln_BOp_piz(i,j_index);
dln_BOp_pi2[0] = d_dln_BOp_pi2x(i,j_index);
dln_BOp_pi2[1] = d_dln_BOp_pi2y(i,j_index);
dln_BOp_pi2[2] = d_dln_BOp_pi2z(i,j_index);
dBOp_i[0] = d_dBOpx(i,j_index);
dBOp_i[1] = d_dBOpy(i,j_index);
dBOp_i[2] = d_dBOpz(i,j_index);
// forces on i
for (int d = 0; d < 3; d++) temp[d] = coef_C1dbo * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbo * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dDelta * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dDelta * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dbopi * dln_BOp_pi[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbopi * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbopi * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dbopi2 * dln_BOp_pi2[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbopi2 * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbopi2 * d_dDeltap_self(i,d);
if (EVFLAG)
if (vflag_either) this->template v_tally<NEIGHFLAG>(ev,i,temp,delij);
fitmp[0] -= temp[0];
fitmp[1] -= temp[1];
fitmp[2] -= temp[2];
// forces on j
for (int d = 0; d < 3; d++) temp[d] = -coef_C1dbo * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbo * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dDelta * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dDelta * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dbopi * dln_BOp_pi[d];
for (int d = 0; d < 3; d++) temp[d] -= coef_C2dbopi * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C4dbopi * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dbopi2 * dln_BOp_pi2[d];
for (int d = 0; d < 3; d++) temp[d] -= coef_C2dbopi2 * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C4dbopi2 * d_dDeltap_self(j,d);
a_f(j,0) -= temp[0];
a_f(j,1) -= temp[1];
a_f(j,2) -= temp[2];
if (EVFLAG)
if (vflag_either) {
for (int d = 0; d < 3; d++) tmpvec[d] = -delij[d];
this->template v_tally<NEIGHFLAG>(ev,j,temp,tmpvec);
}
// forces on k: i neighbor
for (int kk = j_start; kk < j_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
const int k_index = kk - j_start;
dBOp_k[0] = d_dBOpx(i,k_index);
dBOp_k[1] = d_dBOpy(i,k_index);
dBOp_k[2] = d_dBOpz(i,k_index);
const F_FLOAT coef_all = -coef_C2dbo - coef_C2dDelta - coef_C3dbopi - coef_C3dbopi2;
for (int d = 0; d < 3; d++) temp[d] = coef_all * dBOp_k[d];
a_f(k,0) -= temp[0];
a_f(k,1) -= temp[1];
a_f(k,2) -= temp[2];
if (EVFLAG)
if (vflag_either) {
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
for (int d = 0; d < 3; d++) tmpvec[d] = x(j,d) - x(k,d) - delik[d];
this->template v_tally<NEIGHFLAG>(ev,k,temp,tmpvec);
}
}
// forces on k: j neighbor
for (int kk = k_start; kk < k_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
const int k_index = kk - k_start;
dBOp_k[0] = d_dBOpx(j,k_index);
dBOp_k[1] = d_dBOpy(j,k_index);
dBOp_k[2] = d_dBOpz(j,k_index);
const F_FLOAT coef_all = -coef_C3dbo - coef_C3dDelta - coef_C4dbopi - coef_C4dbopi2;
for (int d = 0; d < 3; d++) temp[d] = coef_all * dBOp_k[d];
a_f(k,0) -= temp[0];
a_f(k,1) -= temp[1];
a_f(k,2) -= temp[2];
if (EVFLAG) {
if (vflag_either) {
for (int d = 0; d < 3; d++) deljk[d] = x(k,d) - x(j,d);
for (int d = 0; d < 3; d++) tmpvec[d] = x(i,d) - x(k,d) - deljk[d];
this->template v_tally<NEIGHFLAG>(ev,k,temp,tmpvec);
}
}
}
}
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::ev_tally(EV_FLOAT_REAX &ev, const int &i, const int &j,
const F_FLOAT &epair, const F_FLOAT &fpair, const F_FLOAT &delx,
const F_FLOAT &dely, const F_FLOAT &delz) const
{
const int VFLAG = vflag_either;
// The eatom and vatom arrays are atomic for Half/Thread neighbor style
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
if (eflag_atom) {
const E_FLOAT epairhalf = 0.5 * epair;
a_eatom[i] += epairhalf;
if (NEIGHFLAG != FULL) a_eatom[j] += epairhalf;
}
if (VFLAG) {
const E_FLOAT v0 = delx*delx*fpair;
const E_FLOAT v1 = dely*dely*fpair;
const E_FLOAT v2 = delz*delz*fpair;
const E_FLOAT v3 = delx*dely*fpair;
const E_FLOAT v4 = delx*delz*fpair;
const E_FLOAT v5 = dely*delz*fpair;
if (vflag_global) {
if (NEIGHFLAG != FULL) {
ev.v[0] += v0;
ev.v[1] += v1;
ev.v[2] += v2;
ev.v[3] += v3;
ev.v[4] += v4;
ev.v[5] += v5;
} else {
ev.v[0] += 0.5*v0;
ev.v[1] += 0.5*v1;
ev.v[2] += 0.5*v2;
ev.v[3] += 0.5*v3;
ev.v[4] += 0.5*v4;
ev.v[5] += 0.5*v5;
}
}
if (vflag_atom) {
a_vatom(i,0) += 0.5*v0;
a_vatom(i,1) += 0.5*v1;
a_vatom(i,2) += 0.5*v2;
a_vatom(i,3) += 0.5*v3;
a_vatom(i,4) += 0.5*v4;
a_vatom(i,5) += 0.5*v5;
if (NEIGHFLAG != FULL) {
a_vatom(j,0) += 0.5*v0;
a_vatom(j,1) += 0.5*v1;
a_vatom(j,2) += 0.5*v2;
a_vatom(j,3) += 0.5*v3;
a_vatom(j,4) += 0.5*v4;
a_vatom(j,5) += 0.5*v5;
}
}
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::e_tally(EV_FLOAT_REAX &ev, const int &i, const int &j,
const F_FLOAT &epair) const
{
// The eatom array is atomic for Half/Thread neighbor style
if (eflag_atom) {
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
const E_FLOAT epairhalf = 0.5 * epair;
a_eatom[i] += epairhalf;
a_eatom[j] += epairhalf;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::e_tally_single(EV_FLOAT_REAX &ev, const int &i,
const F_FLOAT &epair) const
{
// The eatom array is atomic for Half/Thread neighbor style
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
a_eatom[i] += epair;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally(EV_FLOAT_REAX &ev, const int &i,
F_FLOAT *fi, F_FLOAT *drij) const
{
F_FLOAT v[6];
v[0] = 0.5*drij[0]*fi[0];
v[1] = 0.5*drij[1]*fi[1];
v[2] = 0.5*drij[2]*fi[2];
v[3] = 0.5*drij[0]*fi[1];
v[4] = 0.5*drij[0]*fi[2];
v[5] = 0.5*drij[1]*fi[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
a_vatom(i,0) += v[0]; a_vatom(i,1) += v[1]; a_vatom(i,2) += v[2];
a_vatom(i,3) += v[3]; a_vatom(i,4) += v[4]; a_vatom(i,5) += v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally3(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drij, F_FLOAT *drik) const
{
// The eatom and vatom arrays are atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
F_FLOAT v[6];
v[0] = drij[0]*fj[0] + drik[0]*fk[0];
v[1] = drij[1]*fj[1] + drik[1]*fk[1];
v[2] = drij[2]*fj[2] + drik[2]*fk[2];
v[3] = drij[0]*fj[1] + drik[0]*fk[1];
v[4] = drij[0]*fj[2] + drik[0]*fk[2];
v[5] = drij[1]*fj[2] + drik[1]*fk[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
a_vatom(i,0) += THIRD * v[0]; a_vatom(i,1) += THIRD * v[1]; a_vatom(i,2) += THIRD * v[2];
a_vatom(i,3) += THIRD * v[3]; a_vatom(i,4) += THIRD * v[4]; a_vatom(i,5) += THIRD * v[5];
a_vatom(j,0) += THIRD * v[0]; a_vatom(j,1) += THIRD * v[1]; a_vatom(j,2) += THIRD * v[2];
a_vatom(j,3) += THIRD * v[3]; a_vatom(j,4) += THIRD * v[4]; a_vatom(j,5) += THIRD * v[5];
a_vatom(k,0) += THIRD * v[0]; a_vatom(k,1) += THIRD * v[1]; a_vatom(k,2) += THIRD * v[2];
a_vatom(k,3) += THIRD * v[3]; a_vatom(k,4) += THIRD * v[4]; a_vatom(k,5) += THIRD * v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally4(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
const int &l, F_FLOAT *fi, F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *dril, F_FLOAT *drjl, F_FLOAT *drkl) const
{
// The vatom array is atomic for Half/Thread neighbor style
F_FLOAT v[6];
v[0] = dril[0]*fi[0] + drjl[0]*fj[0] + drkl[0]*fk[0];
v[1] = dril[1]*fi[1] + drjl[1]*fj[1] + drkl[1]*fk[1];
v[2] = dril[2]*fi[2] + drjl[2]*fj[2] + drkl[2]*fk[2];
v[3] = dril[0]*fi[1] + drjl[0]*fj[1] + drkl[0]*fk[1];
v[4] = dril[0]*fi[2] + drjl[0]*fj[2] + drkl[0]*fk[2];
v[5] = dril[1]*fi[2] + drjl[1]*fj[2] + drkl[1]*fk[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
a_vatom(i,0) += 0.25 * v[0]; a_vatom(i,1) += 0.25 * v[1]; a_vatom(i,2) += 0.25 * v[2];
a_vatom(i,3) += 0.25 * v[3]; a_vatom(i,4) += 0.25 * v[4]; a_vatom(i,5) += 0.25 * v[5];
a_vatom(j,0) += 0.25 * v[0]; a_vatom(j,1) += 0.25 * v[1]; a_vatom(j,2) += 0.25 * v[2];
a_vatom(j,3) += 0.25 * v[3]; a_vatom(j,4) += 0.25 * v[4]; a_vatom(j,5) += 0.25 * v[5];
a_vatom(k,0) += 0.25 * v[0]; a_vatom(k,1) += 0.25 * v[1]; a_vatom(k,2) += 0.25 * v[2];
a_vatom(k,3) += 0.25 * v[3]; a_vatom(k,4) += 0.25 * v[4]; a_vatom(k,5) += 0.25 * v[5];
a_vatom(l,0) += 0.25 * v[0]; a_vatom(l,1) += 0.25 * v[1]; a_vatom(l,2) += 0.25 * v[2];
a_vatom(l,3) += 0.25 * v[3]; a_vatom(l,4) += 0.25 * v[4]; a_vatom(l,5) += 0.25 * v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally3_atom(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drji, F_FLOAT *drjk) const
{
F_FLOAT v[6];
v[0] = THIRD * (drji[0]*fj[0] + drjk[0]*fk[0]);
v[1] = THIRD * (drji[1]*fj[1] + drjk[1]*fk[1]);
v[2] = THIRD * (drji[2]*fj[2] + drjk[2]*fk[2]);
v[3] = THIRD * (drji[0]*fj[1] + drjk[0]*fk[1]);
v[4] = THIRD * (drji[0]*fj[2] + drjk[0]*fk[2]);
v[5] = THIRD * (drji[1]*fj[2] + drjk[1]*fk[2]);
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
d_vatom(i,0) += v[0]; d_vatom(i,1) += v[1]; d_vatom(i,2) += v[2];
d_vatom(i,3) += v[3]; d_vatom(i,4) += v[4]; d_vatom(i,5) += v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void *PairReaxCKokkos<DeviceType>::extract(const char *str, int &dim)
{
dim = 1;
if (strcmp(str,"chi") == 0 && chi) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) chi[i] = system->reax_param.sbp[map[i]].chi;
else chi[i] = 0.0;
return (void *) chi;
}
if (strcmp(str,"eta") == 0 && eta) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) eta[i] = system->reax_param.sbp[map[i]].eta;
else eta[i] = 0.0;
return (void *) eta;
}
if (strcmp(str,"gamma") == 0 && gamma) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) gamma[i] = system->reax_param.sbp[map[i]].gamma;
else gamma[i] = 0.0;
return (void *) gamma;
}
return NULL;
}
/* ----------------------------------------------------------------------
setup for energy, virial computation
see integrate::ev_set() for values of eflag (0-3) and vflag (0-6)
------------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::ev_setup(int eflag, int vflag)
{
int i;
evflag = 1;
eflag_either = eflag;
eflag_global = eflag % 2;
eflag_atom = eflag / 2;
vflag_either = vflag;
vflag_global = vflag % 4;
vflag_atom = vflag / 4;
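// low bits of eflag/vflag request global tallies, higher bits per-atom tallies (see ev_set note above)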
// reallocate per-atom arrays if necessary
if (eflag_atom && atom->nmax > maxeatom) {
maxeatom = atom->nmax;
memory->destroy_kokkos(k_eatom,eatom);
memory->create_kokkos(k_eatom,eatom,maxeatom,"pair:eatom");
v_eatom = k_eatom.view<DeviceType>();
}
if (vflag_atom && atom->nmax > maxvatom) {
maxvatom = atom->nmax;
memory->destroy_kokkos(k_vatom,vatom);
memory->create_kokkos(k_vatom,vatom,maxvatom,6,"pair:vatom");
v_vatom = k_vatom.view<DeviceType>();
}
// zero accumulators
if (eflag_global) eng_vdwl = eng_coul = 0.0;
if (vflag_global) for (i = 0; i < 6; i++) virial[i] = 0.0;
if (eflag_atom) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZeroEAtom>(0,maxeatom),*this);
DeviceType::fence();
}
if (vflag_atom) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZeroVAtom>(0,maxvatom),*this);
DeviceType::fence();
}
// if vflag_global = 2 and pair::compute() calls virial_fdotr_compute()
// compute global virial via (F dot r) instead of via pairwise summation
// unset other flags as appropriate
if (vflag_global == 2 && no_virial_fdotr_compute == 0) {
vflag_fdotr = 1;
vflag_global = 0;
if (vflag_atom == 0) vflag_either = 0;
if (vflag_either == 0 && eflag_either == 0) evflag = 0;
} else vflag_fdotr = 0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
double PairReaxCKokkos<DeviceType>::memory_usage()
{
double bytes = 0.0;
if (cut_hbsq > 0.0) {
bytes += nmax*3*sizeof(int);
bytes += maxhb*nmax*sizeof(int);
}
bytes += nmax*2*sizeof(int);
bytes += maxbo*nmax*sizeof(int);
bytes += nmax*17*sizeof(F_FLOAT);
bytes += maxbo*nmax*34*sizeof(F_FLOAT);
// FixReaxCSpecies
if (fixspecies_flag) {
bytes += MAXSPECBOND*nmax*sizeof(tagint);
bytes += MAXSPECBOND*nmax*sizeof(F_FLOAT);
}
// FixReaxCBonds
bytes += maxbo*nmax*sizeof(tagint);
bytes += maxbo*nmax*sizeof(F_FLOAT);
bytes += nmax*sizeof(int);
return bytes;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::FindBond(int &numbonds)
{
copymode = 1;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondZero>(0,nmax),*this);
DeviceType::fence();
bo_cut_bond = control->bg_cut;
atomKK->sync(execution_space,TAG_MASK);
tag = atomKK->k_tag.view<DeviceType>();
const int inum = list->inum;
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_ilist = k_list->d_ilist;
k_list->clean_copy();
numbonds = 0;
PairReaxCKokkosFindBondFunctor<DeviceType> find_bond_functor(this);
Kokkos::parallel_reduce(inum,find_bond_functor,numbonds);
DeviceType::fence();
copymode = 0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondZero, const int &i) const {
d_numneigh_bonds[i] = 0;
for (int j = 0; j < maxbo; j++) {
d_neighid(i,j) = 0;
d_abo(i,j) = 0.0;
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::calculate_find_bond_item(int ii, int &numbonds) const
{
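// record neighbors of i bonded above bo_cut_bond; the reduction variable numbonds tracks the largest per-atom bond count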
const int i = d_ilist[ii];
int nj = 0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag[j];
const int j_index = jj - j_start;
double bo_tmp = d_BO(i,j_index);
if (bo_tmp > bo_cut_bond) {
d_neighid(i,nj) = jtag;
d_abo(i,nj) = bo_tmp;
nj++;
}
}
d_numneigh_bonds[i] = nj;
if (nj > numbonds) numbonds = nj;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::PackBondBuffer(DAT::tdual_ffloat_1d k_buf, int &nbuf_local)
{
d_buf = k_buf.view<DeviceType>();
k_params_sing.template sync<DeviceType>();
atomKK->sync(execution_space,TAG_MASK|TYPE_MASK|Q_MASK|MOLECULE_MASK);
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
if (atom->molecule)
molecule = atomKK->k_molecule.view<DeviceType>();
copymode = 1;
nlocal = atomKK->nlocal;
PairReaxCKokkosPackBondBufferFunctor<DeviceType> pack_bond_buffer_functor(this);
Kokkos::parallel_scan(nlocal,pack_bond_buffer_functor);
DeviceType::fence();
copymode = 0;
k_buf.modify<DeviceType>();
k_nbuf_local.modify<DeviceType>();
k_buf.sync<LMPHostType>();
k_nbuf_local.sync<LMPHostType>();
nbuf_local = k_nbuf_local.h_view();
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::pack_bond_buffer_item(int i, int &j, const bool &final) const
{
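// parallel_scan functor: j carries the running buffer offset; values are written only on the final pass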
if (i == 0)
j += 2;
if (final) {
d_buf[j-1] = tag[i];
d_buf[j+0] = type[i];
d_buf[j+1] = d_total_bo[i];
d_buf[j+2] = paramssing(type[i]).nlp_opt - d_Delta_lp[i];
d_buf[j+3] = q[i];
d_buf[j+4] = d_numneigh_bonds[i];
}
const int numbonds = d_numneigh_bonds[i];
if (final) {
for (int k = 5; k < 5+numbonds; k++) {
d_buf[j+k] = d_neighid(i,k-5);
}
}
j += (5+numbonds);
if (final) {
if (!molecule.data()) d_buf[j] = 0.0;
else d_buf[j] = molecule[i];
}
j++;
if (final) {
for (int k = 0; k < numbonds; k++) {
d_buf[j+k] = d_abo(i,k);
}
}
j += (1+numbonds);
if (final && i == nlocal-1)
k_nbuf_local.view<DeviceType>()() = j - 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::FindBondSpecies()
{
copymode = 1;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondSpeciesZero>(0,nmax),*this);
DeviceType::fence();
nlocal = atomKK->nlocal;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondSpecies>(0,nlocal),*this);
DeviceType::fence();
copymode = 0;
// NOTE: Could improve performance if a Kokkos version of ComputeSpecAtom is added
k_tmpbo.modify<DeviceType>();
k_tmpid.modify<DeviceType>();
k_error_flag.modify<DeviceType>();
k_tmpbo.sync<LMPHostType>();
k_tmpid.sync<LMPHostType>();
k_error_flag.sync<LMPHostType>();
if (k_error_flag.h_view())
error->all(FLERR,"Increase MAXSPECBOND in reaxc_defs.h");
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondSpeciesZero, const int &i) const {
for (int j = 0; j < MAXSPECBOND; j++) {
k_tmpbo.view<DeviceType>()(i,j) = 0.0;
k_tmpid.view<DeviceType>()(i,j) = 0;
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondSpecies, const int &i) const {
int nj = 0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
if (j < i) continue;
const int j_index = jj - j_start;
double bo_tmp = d_BO(i,j_index);
if (bo_tmp >= 0.10 ) { // Why is this a hardcoded value?
k_tmpid.view<DeviceType>()(i,nj) = j;
k_tmpbo.view<DeviceType>()(i,nj) = bo_tmp;
nj++;
if (nj > MAXSPECBOND) k_error_flag.view<DeviceType>()() = 1;
}
}
}
template class PairReaxCKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class PairReaxCKokkos<LMPHostType>;
#endif
}
diff --git a/src/KOKKOS/pair_reax_c_kokkos.h b/src/KOKKOS/pair_reaxc_kokkos.h
similarity index 99%
rename from src/KOKKOS/pair_reax_c_kokkos.h
rename to src/KOKKOS/pair_reaxc_kokkos.h
index 8a0c08b66..59c4d196d 100644
--- a/src/KOKKOS/pair_reax_c_kokkos.h
+++ b/src/KOKKOS/pair_reaxc_kokkos.h
@@ -1,498 +1,498 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(reax/c/kk,PairReaxCKokkos<LMPDeviceType>)
PairStyle(reax/c/kk/device,PairReaxCKokkos<LMPDeviceType>)
PairStyle(reax/c/kk/host,PairReaxCKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_REAXC_KOKKOS_H
#define LMP_PAIR_REAXC_KOKKOS_H
#include <stdio.h>
#include "pair_kokkos.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "neigh_list_kokkos.h"
#include "reaxc_types.h"
#define C_ele 332.06371
#define SMALL 0.0001
#define KCALpMOL_to_EV 23.02
#define HB_THRESHOLD 1e-2 // 0.01
#define MAX_BONDS 30
#define SQR(x) ((x)*(x))
namespace LAMMPS_NS {
typedef Kokkos::DualView<LR_data*,Kokkos::LayoutRight,LMPDeviceType> tdual_LR_data_1d;
typedef typename tdual_LR_data_1d::t_dev t_LR_data_1d;
typedef Kokkos::DualView<cubic_spline_coef*,Kokkos::LayoutRight,LMPDeviceType> tdual_cubic_spline_coef_1d;
typedef typename tdual_cubic_spline_coef_1d::t_dev t_cubic_spline_coef_1d;
struct LR_lookup_table_kk
{
double xmin, xmax;
int n;
double dx, inv_dx;
double a;
double m;
double c;
t_LR_data_1d d_y;
t_cubic_spline_coef_1d d_H;
t_cubic_spline_coef_1d d_vdW, d_CEvd;
t_cubic_spline_coef_1d d_ele, d_CEclmb;
};
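// empty tag structs used to dispatch the different Kokkos kernels of this pair style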
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputePolar{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeLJCoulomb{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeTabulatedLJCoulomb{};
struct PairReaxBuildListsFull{};
template<int NEIGHFLAG>
struct PairReaxBuildListsHalf{};
template<int NEIGHFLAG>
struct PairReaxBuildListsHalf_LessAtomics{};
struct PairReaxZero{};
struct PairReaxZeroEAtom{};
struct PairReaxZeroVAtom{};
struct PairReaxBondOrder1{};
struct PairReaxBondOrder1_LessAtomics{};
struct PairReaxBondOrder2{};
struct PairReaxBondOrder3{};
template<int NEIGHFLAG>
struct PairReaxUpdateBond{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeBond1{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeBond2{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeMulti1{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeMulti2{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeAngular{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeTorsion{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeHydrogen{};
struct PairReaxFindBondZero{};
struct PairReaxFindBondSpeciesZero{};
struct PairReaxFindBondSpecies{};
template<class DeviceType>
class PairReaxCKokkos : public PairReaxC {
public:
enum {EnabledNeighFlags=FULL|HALF|HALFTHREAD};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typedef EV_FLOAT_REAX value_type;
PairReaxCKokkos(class LAMMPS *);
virtual ~PairReaxCKokkos();
void ev_setup(int, int);
void compute(int, int);
void *extract(const char *, int &);
void init_style();
double memory_usage();
void FindBond(int &);
void PackBondBuffer(DAT::tdual_ffloat_1d, int &);
void FindBondSpecies();
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsFull, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsHalf<NEIGHFLAG>, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsHalf_LessAtomics<NEIGHFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZeroEAtom, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZeroVAtom, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder1, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder1_LessAtomics, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder2, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder3, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxUpdateBond<NEIGHFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti1<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void calculate_find_bond_item(int, int&) const;
KOKKOS_INLINE_FUNCTION
void pack_bond_buffer_item(int, int&, const bool&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondSpeciesZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondSpecies, const int&) const;
struct params_sing{
KOKKOS_INLINE_FUNCTION
params_sing(){mass=0;chi=0;eta=0;r_s=0;r_pi=0;r_pi2=0;valency=0;valency_val=0;valency_e=0;valency_boc=0;nlp_opt=0;
p_lp2=0;p_ovun2=0;p_ovun5=0;p_val3=0;p_val5=0;p_hbond=0;};
KOKKOS_INLINE_FUNCTION
params_sing(int i){mass=0;chi=0;eta=0;r_s=0;r_pi=0;r_pi2=0;valency=0;valency_val=0;valency_e=0;valency_boc=0;nlp_opt=0;
p_lp2=0;p_ovun2=0;p_ovun5=0;p_val3=0;p_val5=0;p_hbond=0;};
F_FLOAT mass,chi,eta,r_s,r_pi,r_pi2,valency,valency_val,valency_e,valency_boc,nlp_opt,
p_lp2,p_ovun2,p_ovun5, p_val3, p_val5, p_hbond;
};
struct params_twbp{
KOKKOS_INLINE_FUNCTION
params_twbp(){gamma=0;gamma_w=0;alpha=0;r_vdw=0;epsilon=0;acore=0;ecore=0;rcore=0;lgre=0;lgcij=0;
r_s=0;r_pi=0;r_pi2=0;p_bo1=0;p_bo2=0;p_bo3=0;p_bo4=0;p_bo5=0;p_bo6=0;ovc=0;v13cor=0;
p_boc3=0;p_boc4=0;p_boc5=0;p_be1=0;p_be2=0;De_s=0;De_p=0;De_pp=0;
p_ovun1=0;};
KOKKOS_INLINE_FUNCTION
params_twbp(int i){gamma=0;gamma_w=0;alpha=0;r_vdw=0;epsilon=0;acore=0;ecore=0;rcore=0;lgre=0;lgcij=0;
r_s=0;r_pi=0;r_pi2=0;p_bo1=0;p_bo2=0;p_bo3=0;p_bo4=0;p_bo5=0;p_bo6=0;ovc=0;v13cor=0;
p_boc3=0;p_boc4=0;p_boc5=0;p_be1=0;p_be2=0;De_s=0;De_p=0;De_pp=0;
p_ovun1=0;};
F_FLOAT gamma,gamma_w,alpha,r_vdw,epsilon,acore,ecore,rcore,lgre,lgcij,
r_s,r_pi,r_pi2,p_bo1,p_bo2,p_bo3,p_bo4,p_bo5,p_bo6,ovc,v13cor,
p_boc3,p_boc4,p_boc5,p_be1,p_be2,De_s,De_p,De_pp,
p_ovun1;
};
struct params_thbp{
KOKKOS_INLINE_FUNCTION
params_thbp(){cnt=0;theta_00=0;p_val1=0;p_val2=0;p_val4=0;p_val7=0;p_pen1=0;p_coa1=0;};
KOKKOS_INLINE_FUNCTION
params_thbp(int i){cnt=0;theta_00=0;p_val1=0;p_val2=0;p_val4=0;p_val7=0;p_pen1=0;p_coa1=0;};
F_FLOAT cnt, theta_00, p_val1, p_val2, p_val4, p_val7, p_pen1, p_coa1;
};
struct params_fbp{
KOKKOS_INLINE_FUNCTION
params_fbp(){p_tor1=0;p_cot1=0;V1=0;V2=0;V3=0;};
KOKKOS_INLINE_FUNCTION
params_fbp(int i){p_tor1=0;p_cot1=0;V1=0;V2=0;V3=0;};
F_FLOAT p_tor1, p_cot1, V1, V2, V3;
};
struct params_hbp{
KOKKOS_INLINE_FUNCTION
params_hbp(){p_hb1=0;p_hb2=0;p_hb3=0;r0_hb=0;};
KOKKOS_INLINE_FUNCTION
params_hbp(int i){p_hb1=0;p_hb2=0;p_hb3=0;r0_hb=0;};
F_FLOAT p_hb1, p_hb2, p_hb3, r0_hb;
};
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void ev_tally(EV_FLOAT_REAX &ev, const int &i, const int &j, const F_FLOAT &epair, const F_FLOAT &fpair, const F_FLOAT &delx,
const F_FLOAT &dely, const F_FLOAT &delz) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void e_tally(EV_FLOAT_REAX &ev, const int &i, const int &j, const F_FLOAT &epair) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void e_tally_single(EV_FLOAT_REAX &ev, const int &i, const F_FLOAT &epair) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally(EV_FLOAT_REAX &ev, const int &i, F_FLOAT *fi, F_FLOAT *drij) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally3(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drij, F_FLOAT *drik) const;
KOKKOS_INLINE_FUNCTION
void v_tally3_atom(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drji, F_FLOAT *drjk) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally4(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k, const int &l,
F_FLOAT *fi, F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *dril, F_FLOAT *drjl, F_FLOAT *drkl) const;
protected:
void cleanup_copy();
void allocate();
void allocate_array();
void setup();
void init_md();
int Init_Lookup_Tables();
void Deallocate_Lookup_Tables();
void LR_vdW_Coulomb( int i, int j, double r_ij, LR_data *lr );
typedef Kokkos::DualView<int*,DeviceType> tdual_int_1d;
Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType> k_params_sing;
typename Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType>::t_dev_const paramssing;
typedef Kokkos::DualView<int**,DeviceType> tdual_int_2d;
Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType> k_params_twbp;
typename Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType>::t_dev_const paramstwbp;
typedef Kokkos::DualView<int***,DeviceType> tdual_int_3d;
Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType> k_params_thbp;
typename Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType>::t_dev_const paramsthbp;
Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType> k_params_hbp;
typename Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType>::t_dev_const paramshbp;
typedef Kokkos::DualView<int****,DeviceType> tdual_int_4d;
Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType> k_params_fbp;
typename Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType>::t_dev_const paramsfbp;
typename AT::t_x_array_randomread x;
typename AT::t_f_array f;
typename AT::t_int_1d_randomread type;
typename AT::t_tagint_1d_randomread tag;
typename AT::t_float_1d_randomread q;
typename AT::t_tagint_1d_randomread molecule;
DAT::tdual_efloat_1d k_eatom;
typename AT::t_efloat_1d v_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
typename AT::t_virial_array v_vatom;
HAT::t_virial_array h_vatom;
DAT::tdual_float_1d k_tap;
DAT::t_float_1d d_tap;
HAT::t_float_1d h_tap;
typename AT::t_float_1d d_bo_rij, d_hb_rsq, d_Deltap, d_Deltap_boc, d_total_bo;
typename AT::t_float_1d d_Delta, d_Delta_boc, d_Delta_lp, d_dDelta_lp, d_Delta_lp_temp, d_CdDelta;
typename AT::t_ffloat_2d_dl d_BO, d_BO_s, d_BO_pi, d_BO_pi2, d_dBOp;
typename AT::t_ffloat_2d_dl d_dln_BOp_pix, d_dln_BOp_piy, d_dln_BOp_piz;
typename AT::t_ffloat_2d_dl d_dln_BOp_pi2x, d_dln_BOp_pi2y, d_dln_BOp_pi2z;
typename AT::t_ffloat_2d_dl d_C1dbo, d_C2dbo, d_C3dbo;
typename AT::t_ffloat_2d_dl d_C1dbopi, d_C2dbopi, d_C3dbopi, d_C4dbopi;
typename AT::t_ffloat_2d_dl d_C1dbopi2, d_C2dbopi2, d_C3dbopi2, d_C4dbopi2;
typename AT::t_ffloat_2d_dl d_Cdbo, d_Cdbopi, d_Cdbopi2, d_dDeltap_self;
typedef Kokkos::DualView<F_FLOAT**[7],typename DeviceType::array_layout,DeviceType> tdual_ffloat_2d_n7;
typedef typename tdual_ffloat_2d_n7::t_dev_const_randomread t_ffloat_2d_n7_randomread;
typedef typename tdual_ffloat_2d_n7::t_host t_host_ffloat_2d_n7;
typename AT::t_neighbors_2d d_neighbors;
typename AT::t_int_1d_randomread d_ilist;
typename AT::t_int_1d_randomread d_numneigh;
typename AT::t_int_1d d_bo_first, d_bo_num, d_bo_list, d_hb_first, d_hb_num, d_hb_list;
DAT::tdual_int_scalar k_resize_bo, k_resize_hb;
typename AT::t_int_scalar d_resize_bo, d_resize_hb;
typename AT::t_ffloat_2d_dl d_sum_ovun;
typename AT::t_ffloat_2d_dl d_dBOpx, d_dBOpy, d_dBOpz;
int neighflag,newton_pair, maxnumneigh, maxhb, maxbo;
int nlocal,nall,eflag,vflag;
F_FLOAT cut_nbsq, cut_hbsq, cut_bosq, bo_cut, thb_cut, thb_cutsq;
F_FLOAT bo_cut_bond;
int vdwflag, lgflag;
F_FLOAT gp[39], p_boc1, p_boc2;
friend void pair_virial_fdotr_compute<PairReaxCKokkos>(PairReaxCKokkos*);
int bocnt,hbcnt;
typedef Kokkos::DualView<LR_lookup_table_kk**,LMPDeviceType::array_layout,DeviceType> tdual_LR_lookup_table_kk_2d;
typedef typename tdual_LR_lookup_table_kk_2d::t_dev t_LR_lookup_table_kk_2d;
tdual_LR_lookup_table_kk_2d k_LR;
t_LR_lookup_table_kk_2d d_LR;
DAT::tdual_int_2d k_tmpid;
DAT::tdual_ffloat_2d k_tmpbo;
DAT::tdual_int_scalar k_error_flag;
typename AT::t_int_1d d_numneigh_bonds;
typename AT::t_tagint_2d d_neighid;
typename AT::t_ffloat_2d d_abo;
typename AT::t_ffloat_1d d_buf;
DAT::tdual_int_scalar k_nbuf_local;
};
template <class DeviceType>
struct PairReaxCKokkosFindBondFunctor {
typedef DeviceType device_type;
typedef int value_type;
PairReaxCKokkos<DeviceType> c;
PairReaxCKokkosFindBondFunctor(PairReaxCKokkos<DeviceType>* c_ptr):c(*c_ptr) {};
KOKKOS_INLINE_FUNCTION
void join(volatile int &dst,
const volatile int &src) const {
dst = MAX(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &numbonds) const {
c.calculate_find_bond_item(ii,numbonds);
}
};
template <class DeviceType>
struct PairReaxCKokkosPackBondBufferFunctor {
typedef DeviceType device_type;
typedef int value_type;
PairReaxCKokkos<DeviceType> c;
PairReaxCKokkosPackBondBufferFunctor(PairReaxCKokkos<DeviceType>* c_ptr):c(*c_ptr) {};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &j, const bool &final) const {
c.pack_bond_buffer_item(ii,j,final);
}
};
}
#endif
#endif
/* ERROR/WARNING messages:
*/
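Editorial note (hedged, not part of the patch above): the block of empty structs and templated operator() overloads in PairReaxCKokkos follows the standard Kokkos work-tag dispatch pattern. Each empty struct is a compile-time tag, and launching a kernel with an execution policy templated on that tag makes Kokkos invoke the matching operator() overload. The minimal, self-contained sketch below uses hypothetical names (TagZero, TagUpdate, DemoFunctor); the EVFLAG overloads that additionally take an EV_FLOAT_REAX& are launched the same way, but through Kokkos::parallel_reduce so the energy/virial accumulator is reduced across threads.
#include <Kokkos_Core.hpp>
struct TagZero {};    // plays the role of e.g. PairReaxZero
struct TagUpdate {};  // plays the role of e.g. PairReaxBondOrder1
struct DemoFunctor {
  Kokkos::View<double*> x;
  DemoFunctor(Kokkos::View<double*> x_) : x(x_) {}
  KOKKOS_INLINE_FUNCTION
  void operator()(TagZero, const int i) const { x(i) = 0.0; }    // "zero" pass
  KOKKOS_INLINE_FUNCTION
  void operator()(TagUpdate, const int i) const { x(i) += 1.0; } // "update" pass
};
int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*> x("x", 100);
    DemoFunctor f(x);
    // The work tag in the policy selects the overload, which is how
    // PairReaxCKokkos::compute() launches one kernel per tag struct.
    Kokkos::parallel_for(Kokkos::RangePolicy<TagZero>(0, 100), f);
    Kokkos::parallel_for(Kokkos::RangePolicy<TagUpdate>(0, 100), f);
  }
  Kokkos::finalize();
  return 0;
}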
diff --git a/src/KOKKOS/verlet_kokkos.cpp b/src/KOKKOS/verlet_kokkos.cpp
index 53b404237..e4a3f857d 100644
--- a/src/KOKKOS/verlet_kokkos.cpp
+++ b/src/KOKKOS/verlet_kokkos.cpp
@@ -1,624 +1,627 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "verlet_kokkos.h"
#include "neighbor.h"
#include "domain.h"
#include "comm.h"
#include "atom.h"
#include "atom_kokkos.h"
#include "atom_masks.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "output.h"
#include "update.h"
#include "modify.h"
#include "compute.h"
#include "fix.h"
#include "timer.h"
#include "memory.h"
#include "error.h"
#include <ctime>
using namespace LAMMPS_NS;
template<class ViewA, class ViewB>
struct ForceAdder {
ViewA a;
ViewB b;
ForceAdder(const ViewA& a_, const ViewB& b_):a(a_),b(b_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
a(i,0) += b(i,0);
a(i,1) += b(i,1);
a(i,2) += b(i,2);
}
};
/* ---------------------------------------------------------------------- */
VerletKokkos::VerletKokkos(LAMMPS *lmp, int narg, char **arg) :
Verlet(lmp, narg, arg)
{
atomKK = (AtomKokkos *) atom;
}
/* ----------------------------------------------------------------------
setup before run
------------------------------------------------------------------------- */
-void VerletKokkos::setup()
+void VerletKokkos::setup(int flag)
{
if (comm->me == 0 && screen) {
fprintf(screen,"Setting up Verlet run ...\n");
- fprintf(screen," Unit style : %s\n", update->unit_style);
- fprintf(screen," Current step : " BIGINT_FORMAT "\n", update->ntimestep);
- fprintf(screen," Time step : %g\n", update->dt);
- timer->print_timeout(screen);
+ if (flag) {
+ fprintf(screen," Unit style : %s\n", update->unit_style);
+ fprintf(screen," Current step : " BIGINT_FORMAT "\n",
+ update->ntimestep);
+ fprintf(screen," Time step : %g\n", update->dt);
+ timer->print_timeout(screen);
+ }
}
update->setupflag = 1;
lmp->kokkos->auto_sync = 0;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
atomKK->setup();
modify->setup_pre_exchange();
// debug
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
atomKK->sync(Host,ALL_MASK);
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atomKK->sortfreq > 0) atomKK->sort();
comm->borders();
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
atomKK->sync(Host,ALL_MASK);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
atomKK->modified(Host,ALL_MASK);
neighbor->build();
neighbor->ncalls = 0;
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
timer->stamp(Timer::PAIR);
}
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,force->bond->datamask_read);
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,force->bond->datamask_modify);
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,force->angle->datamask_read);
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,force->angle->datamask_modify);
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,force->dihedral->datamask_read);
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,force->dihedral->datamask_modify);
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,force->improper->datamask_read);
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,force->improper->datamask_modify);
}
timer->stamp(Timer::BOND);
}
if(force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,force->kspace->datamask_read);
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,force->kspace->datamask_modify);
timer->stamp(Timer::KSPACE);
} else force->kspace->compute_dummy(eflag,vflag);
}
if (force->newton) comm->reverse_comm();
modify->setup(vflag);
- output->setup();
+ output->setup(flag);
lmp->kokkos->auto_sync = 1;
update->setupflag = 1;
}
/* ----------------------------------------------------------------------
setup without output
flag = 0 = just force calculation
flag = 1 = reneighbor and force calculation
------------------------------------------------------------------------- */
void VerletKokkos::setup_minimal(int flag)
{
update->setupflag = 1;
lmp->kokkos->auto_sync = 0;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
if (flag) {
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
modify->setup_pre_exchange();
// debug
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
atomKK->sync(Host,ALL_MASK);
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
comm->borders();
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
atomKK->sync(Host,ALL_MASK);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
atomKK->modified(Host,ALL_MASK);
neighbor->build();
neighbor->ncalls = 0;
}
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
timer->stamp(Timer::PAIR);
}
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,force->bond->datamask_read);
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,force->bond->datamask_modify);
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,force->angle->datamask_read);
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,force->angle->datamask_modify);
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,force->dihedral->datamask_read);
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,force->dihedral->datamask_modify);
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,force->improper->datamask_read);
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,force->improper->datamask_modify);
}
timer->stamp(Timer::BOND);
}
if(force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,force->kspace->datamask_read);
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,force->kspace->datamask_modify);
timer->stamp(Timer::KSPACE);
} else force->kspace->compute_dummy(eflag,vflag);
}
if (force->newton) comm->reverse_comm();
modify->setup(vflag);
lmp->kokkos->auto_sync = 1;
update->setupflag = 0;
}
/* ----------------------------------------------------------------------
run for N steps
------------------------------------------------------------------------- */
void VerletKokkos::run(int n)
{
bigint ntimestep;
int nflag,sortflag;
int n_post_integrate = modify->n_post_integrate;
int n_pre_exchange = modify->n_pre_exchange;
int n_pre_neighbor = modify->n_pre_neighbor;
int n_pre_force = modify->n_pre_force;
int n_post_force = modify->n_post_force;
int n_end_of_step = modify->n_end_of_step;
lmp->kokkos->auto_sync = 0;
if (atomKK->sortfreq > 0) sortflag = 1;
else sortflag = 0;
f_merge_copy = DAT::t_f_array("VerletKokkos::f_merge_copy",atomKK->k_f.dimension_0());
static double time = 0.0;
atomKK->sync(Device,ALL_MASK);
Kokkos::Impl::Timer ktimer;
timer->init_timeout();
for (int i = 0; i < n; i++) {
if (timer->check_timeout(i)) {
update->nsteps = i;
break;
}
ntimestep = ++update->ntimestep;
ev_set(ntimestep);
// initial time integration
ktimer.reset();
timer->stamp();
modify->initial_integrate(vflag);
time += ktimer.seconds();
if (n_post_integrate) modify->post_integrate();
timer->stamp(Timer::MODIFY);
// regular communication vs neighbor list rebuild
nflag = neighbor->decide();
if (nflag == 0) {
timer->stamp();
comm->forward_comm();
timer->stamp(Timer::COMM);
} else {
// added debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (n_pre_exchange) {
timer->stamp();
modify->pre_exchange();
timer->stamp(Timer::MODIFY);
}
// debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
if (domain->box_change) {
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
}
timer->stamp();
// added debug
//atomKK->sync(Device,ALL_MASK);
//atomKK->modified(Device,ALL_MASK);
comm->exchange();
if (sortflag && ntimestep >= atomKK->nextsort) atomKK->sort();
comm->borders();
// added debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
timer->stamp(Timer::COMM);
if (n_pre_neighbor) {
modify->pre_neighbor();
timer->stamp(Timer::MODIFY);
}
neighbor->build();
timer->stamp(Timer::NEIGH);
}
// force computations
// important for pair to come before bonded contributions
// since some bonded potentials tally pairwise energy/virial
// and Pair:ev_tally() needs to be called before any tallying
force_clear();
timer->stamp();
if (n_pre_force) {
modify->pre_force(vflag);
timer->stamp(Timer::MODIFY);
}
bool execute_on_host = false;
unsigned int datamask_read_device = 0;
unsigned int datamask_modify_device = 0;
unsigned int datamask_read_host = 0;
if ( pair_compute_flag ) {
if (force->pair->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->pair->datamask_read;
datamask_modify_device |= force->pair->datamask_modify;
} else {
datamask_read_device |= force->pair->datamask_read;
datamask_modify_device |= force->pair->datamask_modify;
}
}
if ( atomKK->molecular && force->bond ) {
if (force->bond->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->bond->datamask_read;
datamask_modify_device |= force->bond->datamask_modify;
} else {
datamask_read_device |= force->bond->datamask_read;
datamask_modify_device |= force->bond->datamask_modify;
}
}
if ( atomKK->molecular && force->angle ) {
if (force->angle->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->angle->datamask_read;
datamask_modify_device |= force->angle->datamask_modify;
} else {
datamask_read_device |= force->angle->datamask_read;
datamask_modify_device |= force->angle->datamask_modify;
}
}
if ( atomKK->molecular && force->dihedral ) {
if (force->dihedral->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->dihedral->datamask_read;
datamask_modify_device |= force->dihedral->datamask_modify;
} else {
datamask_read_device |= force->dihedral->datamask_read;
datamask_modify_device |= force->dihedral->datamask_modify;
}
}
if ( atomKK->molecular && force->improper ) {
if (force->improper->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->improper->datamask_read;
datamask_modify_device |= force->improper->datamask_modify;
} else {
datamask_read_device |= force->improper->datamask_read;
datamask_modify_device |= force->improper->datamask_modify;
}
}
if ( kspace_compute_flag ) {
if (force->kspace->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->kspace->datamask_read;
datamask_modify_device |= force->kspace->datamask_modify;
} else {
datamask_read_device |= force->kspace->datamask_read;
datamask_modify_device |= force->kspace->datamask_modify;
}
}
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
atomKK->sync(force->pair->execution_space,~(~force->pair->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
Kokkos::Impl::Timer ktimer;
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
atomKK->modified(force->pair->execution_space,~(~force->pair->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
timer->stamp(Timer::PAIR);
}
if(execute_on_host) {
if(pair_compute_flag && force->pair->datamask_modify!=(F_MASK | ENERGY_MASK | VIRIAL_MASK))
Kokkos::fence();
atomKK->sync_overlapping_device(Host,~(~datamask_read_host|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
if(pair_compute_flag && force->pair->execution_space!=Host) {
Kokkos::deep_copy(LMPHostType(),atomKK->k_f.h_view,0.0);
}
}
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,~(~force->bond->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,~(~force->bond->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,~(~force->angle->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,~(~force->angle->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,~(~force->dihedral->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,~(~force->dihedral->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,~(~force->improper->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,~(~force->improper->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
timer->stamp(Timer::BOND);
}
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,~(~force->kspace->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,~(~force->kspace->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
timer->stamp(Timer::KSPACE);
}
if(execute_on_host && !std::is_same<LMPHostType,LMPDeviceType>::value) {
if(f_merge_copy.dimension_0()<atomKK->k_f.dimension_0()) {
f_merge_copy = DAT::t_f_array("VerletKokkos::f_merge_copy",atomKK->k_f.dimension_0());
}
f = atomKK->k_f.d_view;
Kokkos::deep_copy(LMPHostType(),f_merge_copy,atomKK->k_f.h_view);
Kokkos::parallel_for(atomKK->k_f.dimension_0(),
ForceAdder<DAT::t_f_array,DAT::t_f_array>(atomKK->k_f.d_view,f_merge_copy));
atomKK->k_f.modified_host() = 0; // special case
atomKK->k_f.modify<LMPDeviceType>();
}
// reverse communication of forces
if (force->newton) comm->reverse_comm();
timer->stamp(Timer::COMM);
// force modifications, final time integration, diagnostics
if (n_post_force) modify->post_force(vflag);
modify->final_integrate();
if (n_end_of_step) modify->end_of_step();
timer->stamp(Timer::MODIFY);
// all output
if (ntimestep == output->next) {
atomKK->sync(Host,ALL_MASK);
timer->stamp();
output->write(ntimestep);
timer->stamp(Timer::OUTPUT);
}
}
atomKK->sync(Host,ALL_MASK);
lmp->kokkos->auto_sync = 1;
}
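/* Editorial note (hedged, not part of the patch): when any force style
   executes on the Host while the rest run on the Device (execute_on_host ==
   true and LMPHostType != LMPDeviceType above), the host-side force
   contributions in atomKK->k_f.h_view are copied into f_merge_copy and then
   summed into the device force view by the ForceAdder parallel_for, after
   which only the device copy is marked as modified. This is also why
   f_merge_copy is sized against atomKK->k_f.dimension_0() at the start of
   run(). */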
/* ----------------------------------------------------------------------
clear force on own & ghost atoms
clear other arrays as needed
------------------------------------------------------------------------- */
void VerletKokkos::force_clear()
{
int i;
if (external_force_clear) return;
// clear force on all particles
// if either newton flag is set, also include ghosts
// when using threads always clear all forces.
if (neighbor->includegroup == 0) {
int nall;
if (force->newton) nall = atomKK->nlocal + atomKK->nghost;
else nall = atomKK->nlocal;
size_t nbytes = sizeof(double) * nall;
if (nbytes) {
if (atomKK->k_f.modified_host() > atomKK->k_f.modified_device()) {
memset_kokkos(atomKK->k_f.view<LMPHostType>());
atomKK->modified(Host,F_MASK);
atomKK->sync(Device,F_MASK);
} else {
memset_kokkos(atomKK->k_f.view<LMPDeviceType>());
atomKK->modified(Device,F_MASK);
}
if (torqueflag) memset(&(atomKK->torque[0][0]),0,3*nbytes);
}
// neighbor includegroup flag is set
// clear force only on initial nfirst particles
// if either newton flag is set, also include ghosts
} else {
int nall = atomKK->nfirst;
if (atomKK->k_f.modified_host() > atomKK->k_f.modified_device()) {
memset_kokkos(atomKK->k_f.view<LMPHostType>());
atomKK->modified(Host,F_MASK);
} else {
memset_kokkos(atomKK->k_f.view<LMPDeviceType>());
atomKK->modified(Device,F_MASK);
}
if (torqueflag) {
double **torque = atomKK->torque;
for (i = 0; i < nall; i++) {
torque[i][0] = 0.0;
torque[i][1] = 0.0;
torque[i][2] = 0.0;
}
}
if (force->newton) {
nall = atomKK->nlocal + atomKK->nghost;
if (torqueflag) {
double **torque = atomKK->torque;
for (i = atomKK->nlocal; i < nall; i++) {
torque[i][0] = 0.0;
torque[i][1] = 0.0;
torque[i][2] = 0.0;
}
}
}
}
}
diff --git a/src/KOKKOS/verlet_kokkos.h b/src/KOKKOS/verlet_kokkos.h
index 03a938332..645523920 100644
--- a/src/KOKKOS/verlet_kokkos.h
+++ b/src/KOKKOS/verlet_kokkos.h
@@ -1,57 +1,57 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef INTEGRATE_CLASS
IntegrateStyle(verlet/kk,VerletKokkos)
#else
#ifndef LMP_VERLET_KOKKOS_H
#define LMP_VERLET_KOKKOS_H
#include "verlet.h"
#include "kokkos_type.h"
namespace LAMMPS_NS {
class VerletKokkos : public Verlet {
public:
VerletKokkos(class LAMMPS *, int, char **);
~VerletKokkos() {}
- void setup();
+ void setup(int flag=1);
void setup_minimal(int);
void run(int);
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
f(i,0) += f_merge_copy(i,0);
f(i,1) += f_merge_copy(i,1);
f(i,2) += f_merge_copy(i,2);
}
protected:
DAT::t_f_array f_merge_copy,f;
void force_clear();
};
}
#endif
#endif
/* ERROR/WARNING messages:
*/
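Editorial note (hedged): the virtual signature is widened to setup(int flag=1)
to match the base-class change made elsewhere in this patch; the default
argument keeps existing integrate->setup() call sites compiling unchanged,
while a caller can pass 0 to skip the detailed setup printout and forward the
flag to output->setup(flag), as in the verlet_kokkos.cpp hunk above. A
hypothetical call site:
  integrate->setup();   // flag == 1: print unit style/step/dt, full output setup
  integrate->setup(0);  // quiet setup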
diff --git a/src/KSPACE/ewald_disp.cpp b/src/KSPACE/ewald_disp.cpp
index 467a748d0..85e3da921 100644
--- a/src/KSPACE/ewald_disp.cpp
+++ b/src/KSPACE/ewald_disp.cpp
@@ -1,1509 +1,1510 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Pieter in 't Veld (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "ewald_disp.h"
#include "math_vector.h"
#include "math_const.h"
#include "math_special.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "update.h"
using namespace LAMMPS_NS;
using namespace MathConst;
using namespace MathSpecial;
#define SMALL 0.00001
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER}; // same as in pair.h
//#define DEBUG
/* ---------------------------------------------------------------------- */
EwaldDisp::EwaldDisp(LAMMPS *lmp, int narg, char **arg) : KSpace(lmp, narg, arg),
kenergy(NULL), kvirial(NULL), energy_self_peratom(NULL), virial_self_peratom(NULL),
ekr_local(NULL), hvec(NULL), kvec(NULL), B(NULL), cek_local(NULL), cek_global(NULL)
{
if (narg!=1) error->all(FLERR,"Illegal kspace_style ewald/n command");
ewaldflag = dispersionflag = dipoleflag = 1;
accuracy_relative = fabs(force->numeric(FLERR,arg[0]));
memset(function, 0, EWALD_NFUNCS*sizeof(int));
kenergy = kvirial = NULL;
cek_local = cek_global = NULL;
ekr_local = NULL;
hvec = NULL;
kvec = NULL;
B = NULL;
first_output = 0;
energy_self_peratom = NULL;
virial_self_peratom = NULL;
nmax = 0;
q2 = 0;
b2 = 0;
M2 = 0;
}
/* ---------------------------------------------------------------------- */
EwaldDisp::~EwaldDisp()
{
deallocate();
deallocate_peratom();
delete [] ekr_local;
delete [] B;
}
/* --------------------------------------------------------------------- */
void EwaldDisp::init()
{
nkvec = nkvec_max = nevec = nevec_max = 0;
nfunctions = nsums = sums = 0;
nbox = -1;
bytes = 0.0;
if (!comm->me) {
if (screen) fprintf(screen,"EwaldDisp initialization ...\n");
if (logfile) fprintf(logfile,"EwaldDisp initialization ...\n");
}
triclinic_check();
if (domain->dimension == 2)
error->all(FLERR,"Cannot use EwaldDisp with 2d simulation");
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with EwaldDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab EwaldDisp");
}
scale = 1.0;
mumurd2e = force->qqrd2e;
dielectric = force->dielectric;
int tmp;
Pair *pair = force->pair;
int *ptr = pair ? (int *) pair->extract("ewald_order",tmp) : NULL;
double *cutoff = pair ? (double *) pair->extract("cut_coul",tmp) : NULL;
if (!(ptr||cutoff))
error->all(FLERR,"KSpace style is incompatible with Pair style");
int ewald_order = ptr ? *((int *) ptr) : 1<<1;
int ewald_mix = ptr ? *((int *) pair->extract("ewald_mix",tmp)) : GEOMETRIC;
memset(function, 0, EWALD_NFUNCS*sizeof(int));
for (int i=0; i<=EWALD_NORDER; ++i) // transcribe order
if (ewald_order&(1<<i)) { // from pair_style
int n[] = EWALD_NSUMS, k = 0;
switch (i) {
case 1:
k = 0; break;
case 3:
k = 3; break;
case 6:
if (ewald_mix==GEOMETRIC) { k = 1; break; }
else if (ewald_mix==ARITHMETIC) { k = 2; break; }
error->all(FLERR,
"Unsupported mixing rule in kspace_style ewald/disp");
default:
error->all(FLERR,"Unsupported order in kspace_style ewald/disp");
}
nfunctions += function[k] = 1;
nsums += n[k];
}
- if (!gewaldflag) g_ewald = 0.0;
+ if (!gewaldflag) g_ewald = g_ewald_6 = 1.0;
pair->init(); // so B is defined
init_coeffs();
init_coeff_sums();
if (function[0]) qsum_qsq();
else qsqsum = qsum = 0.0;
natoms_original = atom->natoms;
+ if (!gewaldflag) g_ewald = g_ewald_6 = 0.0;
// turn off coulombic if no charge
if (function[0] && qsqsum == 0.0) {
function[0] = 0;
nfunctions -= 1;
nsums -= 1;
}
double bsbsum = 0.0;
M2 = 0.0;
if (function[1]) bsbsum = sum[1].x2;
if (function[2]) bsbsum = sum[2].x2;
if (function[3]) M2 = sum[9].x2;
if (function[3] && strcmp(update->unit_style,"electron") == 0)
error->all(FLERR,"Cannot (yet) use 'electron' units with dipoles");
if (qsqsum == 0.0 && bsbsum == 0.0 && M2 == 0.0)
error->all(FLERR,"Cannot use Ewald/disp solver "
"on system with no charge, dipole, or LJ particles");
if (fabs(qsum) > SMALL && comm->me == 0) {
char str[128];
sprintf(str,"System is not charge neutral, net charge = %g",qsum);
error->warning(FLERR,str);
}
if (!function[1] && !function[2]) dispersionflag = 0;
if (!function[3]) dipoleflag = 0;
pair_check();
// set accuracy (force units) from accuracy_relative or accuracy_absolute
if (accuracy_absolute >= 0.0) accuracy = accuracy_absolute;
else accuracy = accuracy_relative * two_charge_force;
// setup K-space resolution
q2 = qsqsum * force->qqrd2e;
M2 *= mumurd2e;
b2 = bsbsum; //Are these units right?
bigint natoms = atom->natoms;
if (!gewaldflag) {
if (function[0]) {
g_ewald = accuracy*sqrt(natoms*(*cutoff)*shape_det(domain->h)) / (2.0*q2);
if (g_ewald >= 1.0) g_ewald = (1.35 - 0.15*log(accuracy))/(*cutoff);
else g_ewald = sqrt(-log(g_ewald)) / (*cutoff);
} else if (function[3]) {
//Try Newton Solver
//Use old method to get guess
g_ewald = (1.35 - 0.15*log(accuracy))/ *cutoff;
double g_ewald_new =
NewtonSolve(g_ewald,(*cutoff),natoms,shape_det(domain->h),M2);
if (g_ewald_new > 0.0) g_ewald = g_ewald_new;
else error->warning(FLERR,"Ewald/disp Newton solver failed, "
"using old method to estimate g_ewald");
} else if (function[1] || function[2]) {
//Try Newton Solver
//Use old method to get guess
g_ewald = (1.35 - 0.15*log(accuracy))/ *cutoff;
double g_ewald_new =
NewtonSolve(g_ewald,(*cutoff),natoms,shape_det(domain->h),b2);
if (g_ewald_new > 0.0) g_ewald = g_ewald_new;
else error->warning(FLERR,"Ewald/disp Newton solver failed, "
"using old method to estimate g_ewald");
}
}
if (!comm->me) {
- if (screen) fprintf(screen, " G vector = %g\n", g_ewald);
- if (logfile) fprintf(logfile, " G vector = %g\n", g_ewald);
+ if (screen) fprintf(screen, " G vector = %g, accuracy = %g\n", g_ewald,accuracy);
+ if (logfile) fprintf(logfile, " G vector = %g accuracy = %g\n", g_ewald,accuracy);
}
g_ewald_6 = g_ewald;
deallocate_peratom();
peratom_allocate_flag = 0;
}
/* ----------------------------------------------------------------------
adjust EwaldDisp coeffs, called initially and whenever volume has changed
------------------------------------------------------------------------- */
void EwaldDisp::setup()
{
volume = shape_det(domain->h)*slab_volfactor;
memcpy(unit, domain->h_inv, sizeof(shape));
shape_scalar_mult(unit, 2.0*MY_PI);
unit[2] /= slab_volfactor;
// int nbox_old = nbox, nkvec_old = nkvec;
if (accuracy >= 1) {
nbox = 0;
error->all(FLERR,"KSpace accuracy too low");
}
bigint natoms = atom->natoms;
double err;
int kxmax = 1;
int kymax = 1;
int kzmax = 1;
err = rms(kxmax,domain->h[0],natoms,q2,b2,M2);
while (err > accuracy) {
kxmax++;
err = rms(kxmax,domain->h[0],natoms,q2,b2,M2);
}
err = rms(kymax,domain->h[1],natoms,q2,b2,M2);
while (err > accuracy) {
kymax++;
err = rms(kymax,domain->h[1],natoms,q2,b2,M2);
}
err = rms(kzmax,domain->h[2]*slab_volfactor,natoms,q2,b2,M2);
while (err > accuracy) {
kzmax++;
err = rms(kzmax,domain->h[2]*slab_volfactor,natoms,q2,b2,M2);
}
nbox = MAX(kxmax,kymax);
nbox = MAX(nbox,kzmax);
double gsqxmx = unit[0]*unit[0]*kxmax*kxmax;
double gsqymx = unit[1]*unit[1]*kymax*kymax;
double gsqzmx = unit[2]*unit[2]*kzmax*kzmax;
gsqmx = MAX(gsqxmx,gsqymx);
gsqmx = MAX(gsqmx,gsqzmx);
gsqmx *= 1.00001;
reallocate();
coefficients();
init_coeffs();
init_coeff_sums();
init_self();
if (!(first_output||comm->me)) {
first_output = 1;
if (screen) fprintf(screen,
" vectors: nbox = %d, nkvec = %d\n", nbox, nkvec);
if (logfile) fprintf(logfile,
" vectors: nbox = %d, nkvec = %d\n", nbox, nkvec);
}
}
/* ----------------------------------------------------------------------
compute RMS accuracy for a dimension
------------------------------------------------------------------------- */
double EwaldDisp::rms(int km, double prd, bigint natoms,
double q2, double b2, double M2)
{
double value = 0.0;
// Coulombic
double g2 = g_ewald*g_ewald;
value += 2.0*q2*g_ewald/prd *
sqrt(1.0/(MY_PI*km*natoms)) *
exp(-MY_PI*MY_PI*km*km/(g2*prd*prd));
// Lennard-Jones
double g7 = g2*g2*g2*g_ewald;
value += 4.0*b2*g7/3.0 *
sqrt(1.0/(MY_PI*natoms)) *
(exp(-MY_PI*MY_PI*km*km/(g2*prd*prd)) *
(MY_PI*km/(g_ewald*prd) + 1));
// dipole
value += 8.0*MY_PI*M2/volume*g_ewald *
sqrt(2.0*MY_PI*km*km*km/(15.0*natoms)) *
exp(-pow(MY_PI*km/(g_ewald*prd),2.0));
return value;
}
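/* Editorial note (hedged, not part of the original source): in the code's own
   symbols (km = largest k index in this dimension, prd = box edge length,
   natoms = N), the Coulombic contribution computed above is
     2*q2*g_ewald/prd * sqrt(1/(pi*km*N)) * exp(-pi^2*km^2/(g_ewald^2*prd^2)),
   and the Lennard-Jones and dipole terms carry the same Gaussian damping
   factor with their own prefactors (the LJ term with an extra
   pi*km/(g_ewald*prd) + 1 factor). setup() simply increments kxmax, kymax and
   kzmax until this estimate falls below the requested accuracy. */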
void EwaldDisp::reallocate()
{
int ix, iy, iz;
int nkvec_max = nkvec;
vector h;
nkvec = 0;
int *kflag = new int[(nbox+1)*(2*nbox+1)*(2*nbox+1)];
int *flag = kflag;
for (ix=0; ix<=nbox; ++ix)
for (iy=-nbox; iy<=nbox; ++iy)
for (iz=-nbox; iz<=nbox; ++iz)
if (!(ix||iy||iz)) *(flag++) = 0;
else if ((!ix)&&(iy<0)) *(flag++) = 0;
else if ((!(ix||iy))&&(iz<0)) *(flag++) = 0; // use symmetry
else {
h[0] = unit[0]*ix;
h[1] = unit[5]*ix+unit[1]*iy;
h[2] = unit[4]*ix+unit[3]*iy+unit[2]*iz;
if ((*(flag++) = h[0]*h[0]+h[1]*h[1]+h[2]*h[2]<=gsqmx)) ++nkvec;
}
if (nkvec>nkvec_max) {
deallocate(); // free memory
hvec = new hvector[nkvec]; // hvec
bytes += (nkvec-nkvec_max)*sizeof(hvector);
kvec = new kvector[nkvec]; // kvec
bytes += (nkvec-nkvec_max)*sizeof(kvector);
kenergy = new double[nkvec*nfunctions]; // kenergy
bytes += (nkvec-nkvec_max)*nfunctions*sizeof(double);
kvirial = new double[6*nkvec*nfunctions]; // kvirial
bytes += 6*(nkvec-nkvec_max)*nfunctions*sizeof(double);
cek_local = new complex[nkvec*nsums]; // cek_local
bytes += (nkvec-nkvec_max)*nsums*sizeof(complex);
cek_global = new complex[nkvec*nsums]; // cek_global
bytes += (nkvec-nkvec_max)*nsums*sizeof(complex);
nkvec_max = nkvec;
}
flag = kflag; // create index and
kvector *k = kvec; // wave vectors
hvector *hi = hvec;
for (ix=0; ix<=nbox; ++ix)
for (iy=-nbox; iy<=nbox; ++iy)
for (iz=-nbox; iz<=nbox; ++iz)
if (*(flag++)) {
hi->x = unit[0]*ix;
hi->y = unit[5]*ix+unit[1]*iy;
(hi++)->z = unit[4]*ix+unit[3]*iy+unit[2]*iz;
k->x = ix+nbox; k->y = iy+nbox; (k++)->z = iz+nbox; }
delete [] kflag;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::reallocate_atoms()
{
if (eflag_atom || vflag_atom)
if (atom->nmax > nmax) {
deallocate_peratom();
allocate_peratom();
nmax = atom->nmax;
}
if ((nevec = atom->nmax*(2*nbox+1))<=nevec_max) return;
delete [] ekr_local;
ekr_local = new cvector[nevec];
bytes += (nevec-nevec_max)*sizeof(cvector);
nevec_max = nevec;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::allocate_peratom()
{
memory->create(energy_self_peratom,
atom->nmax,EWALD_NFUNCS,"ewald/n:energy_self_peratom");
memory->create(virial_self_peratom,
atom->nmax,EWALD_NFUNCS,"ewald/n:virial_self_peratom");
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::deallocate_peratom() // free memory
{
if (energy_self_peratom) {
memory->destroy(energy_self_peratom);
energy_self_peratom = NULL;
}
if (virial_self_peratom) {
memory->destroy(virial_self_peratom);
virial_self_peratom = NULL;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::deallocate() // free memory
{
delete [] hvec; hvec = NULL;
delete [] kvec; kvec = NULL;
delete [] kenergy; kenergy = NULL;
delete [] kvirial; kvirial = NULL;
delete [] cek_local; cek_local = NULL;
delete [] cek_global; cek_global = NULL;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::coefficients()
{
vector h;
hvector *hi = hvec, *nh;
double eta2 = 0.25/(g_ewald*g_ewald);
double b1, b2, expb2, h1, h2, c1, c2;
double *ke = kenergy, *kv = kvirial;
int func0 = function[0], func12 = function[1]||function[2],
func3 = function[3];
for (nh = (hi = hvec)+nkvec; hi<nh; ++hi) { // wave vectors
memcpy(h, hi, sizeof(vector));
expb2 = exp(-(b2 = (h2 = vec_dot(h, h))*eta2));
if (func0) { // qi*qj/r coeffs
*(ke++) = c1 = expb2/h2;
*(kv++) = c1-(c2 = 2.0*c1*(1.0+b2)/h2)*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
if (func12) { // -Bij/r^6 coeffs
b1 = sqrt(b2); // minus sign folded
h1 = sqrt(h2); // into constants
*(ke++) = c1 = -h1*h2*((c2=MY_PIS*erfc(b1))+(0.5/b2-1.0)*expb2/b1);
*(kv++) = c1-(c2 = 3.0*h1*(c2-expb2/b1))*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
if (func3) { // dipole coeffs
*(ke++) = c1 = expb2/h2;
*(kv++) = c1-(c2 = 2.0*c1*(1.0+b2)/h2)*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_coeffs()
{
int tmp;
int n = atom->ntypes;
if (function[1]) { // geometric 1/r^6
double **b = (double **) force->pair->extract("B",tmp);
delete [] B;
B = new double[n+1];
B[0] = 0.0;
bytes += (n+1)*sizeof(double);
for (int i=1; i<=n; ++i) B[i] = sqrt(fabs(b[i][i]));
}
if (function[2]) { // arithmetic 1/r^6
double **epsilon = (double **) force->pair->extract("epsilon",tmp);
double **sigma = (double **) force->pair->extract("sigma",tmp);
delete [] B; // avoid leaking the prior allocation when init_coeffs() is called again from setup()
double eps_i, sigma_i, sigma_n, *bi = B = new double[7*n+7];
double c[7] = {
1.0, sqrt(6.0), sqrt(15.0), sqrt(20.0), sqrt(15.0), sqrt(6.0), 1.0};
if (!(epsilon&&sigma))
error->all(
FLERR,"Epsilon or sigma reference not set by pair style in ewald/n");
for (int j=0; j<7; ++j)
*(bi++) = 0.0;
for (int i=1; i<=n; ++i) {
eps_i = sqrt(epsilon[i][i]);
sigma_i = sigma[i][i];
sigma_n = 1.0;
for (int j=0; j<7; ++j) {
*(bi++) = sigma_n*eps_i*c[j]; sigma_n *= sigma_i;
}
}
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_coeff_sums()
{
if (sums) return; // calculated only once
sums = 1;
Sum sum_local[EWALD_MAX_NSUMS];
memset(sum_local, 0, EWALD_MAX_NSUMS*sizeof(Sum));
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(Sum));
// now perform qsum and qsq via parent qsum_qsq()
sum_local[0].x = 0.0;
sum_local[0].x2 = 0.0;
//if (function[0]) { // 1/r
// double *q = atom->q, *qn = q+atom->nlocal;
// for (double *i=q; i<qn; ++i) {
// sum_local[0].x += i[0]; sum_local[0].x2 += i[0]*i[0]; }
//}
if (function[1]) { // geometric 1/r^6
int *type = atom->type, *ntype = type+atom->nlocal;
for (int *i=type; i<ntype; ++i) {
sum_local[1].x += B[i[0]]; sum_local[1].x2 += B[i[0]]*B[i[0]]; }
}
if (function[2]) { // arithmetic 1/r^6
double *bi;
int *type = atom->type, *ntype = type+atom->nlocal;
for (int *i=type; i<ntype; ++i) {
bi = B+7*i[0];
sum_local[2].x2 += bi[0]*bi[6];
for (int k=2; k<9; ++k) sum_local[k].x += *(bi++);
}
}
if (function[3]&&atom->mu) { // dipole
double *mu = atom->mu[0], *nmu = mu+4*atom->nlocal;
for (double *i = mu; i < nmu; i += 4)
sum_local[9].x2 += i[3]*i[3];
}
MPI_Allreduce(sum_local, sum, 2*EWALD_MAX_NSUMS, MPI_DOUBLE, MPI_SUM, world);
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_self()
{
double g1 = g_ewald, g2 = g1*g1, g3 = g1*g2;
const double qscale = force->qqrd2e * scale;
memset(energy_self, 0, EWALD_NFUNCS*sizeof(double)); // self energy
memset(virial_self, 0, EWALD_NFUNCS*sizeof(double));
if (function[0]) { // 1/r
virial_self[0] = -0.5*MY_PI*qscale/(g2*volume)*qsum*qsum;
energy_self[0] = qsqsum*qscale*g1/MY_PIS-virial_self[0];
}
if (function[1]) { // geometric 1/r^6
virial_self[1] = MY_PI*MY_PIS*g3/(6.0*volume)*sum[1].x*sum[1].x;
energy_self[1] = -sum[1].x2*g3*g3/12.0+virial_self[1];
}
if (function[2]) { // arithmetic 1/r^6
virial_self[2] = MY_PI*MY_PIS*g3/(48.0*volume)*(sum[2].x*sum[8].x+
sum[3].x*sum[7].x+sum[4].x*sum[6].x+0.5*sum[5].x*sum[5].x);
energy_self[2] = -sum[2].x2*g3*g3/3.0+virial_self[2];
}
if (function[3]) { // dipole
virial_self[3] = 0; // in surface
energy_self[3] = sum[9].x2*mumurd2e*2.0*g3/3.0/MY_PIS-virial_self[3];
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_self_peratom()
{
if (!(vflag_atom || eflag_atom)) return;
double g1 = g_ewald, g2 = g1*g1, g3 = g1*g2;
const double qscale = force->qqrd2e * scale;
double *energy = energy_self_peratom[0];
double *virial = virial_self_peratom[0];
int nlocal = atom->nlocal;
memset(energy, 0, EWALD_NFUNCS*nlocal*sizeof(double));
memset(virial, 0, EWALD_NFUNCS*nlocal*sizeof(double));
if (function[0]) { // 1/r
double *ei = energy;
double *vi = virial;
double ce = qscale*g1/MY_PIS;
double cv = -0.5*MY_PI*qscale/(g2*volume);
double *qi = atom->q, *qn = qi + nlocal;
for (; qi < qn; qi++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
double q = *qi;
*vi = cv*q*qsum;
*ei = ce*q*q-vi[0];
}
}
if (function[1]) { // geometric 1/r^6
double *ei = energy+1;
double *vi = virial+1;
double ce = -g3*g3/12.0;
double cv = MY_PI*MY_PIS*g3/(6.0*volume);
int *typei = atom->type, *typen = typei + atom->nlocal;
for (; typei < typen; typei++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
double b = B[*typei];
*vi = cv*b*sum[1].x;
*ei = ce*b*b+vi[0];
}
}
if (function[2]) { // arithmetic 1/r^6
double *bi;
double *ei = energy+2;
double *vi = virial+2;
double ce = -g3*g3/3.0;
double cv = 0.5*MY_PI*MY_PIS*g3/(48.0*volume);
int *typei = atom->type, *typen = typei + atom->nlocal;
for (; typei < typen; typei++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
bi = B+7*typei[0]+7;
for (int k=2; k<9; ++k) *vi += cv*sum[k].x*(--bi)[0];
/* PJV 20120225:
should this be this instead? above implies an inverse dependence
seems to be the above way in original; i recall having tested
arithmetic mixing in the conception phase, but an extra test would
be prudent (pattern repeats in multiple functions below)
bi = B+7*typei[0];
for (int k=2; k<9; ++k) *vi += cv*sum[k].x*(bi++)[0];
*/
*ei = ce*bi[0]*bi[6]+vi[0];
}
}
if (function[3]&&atom->mu) { // dipole
double *ei = energy+3;
double *vi = virial+3;
double *imu = atom->mu[0], *nmu = imu+4*atom->nlocal;
double ce = mumurd2e*2.0*g3/3.0/MY_PIS;
for (; imu < nmu; imu += 4, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
*vi = 0; // in surface
*ei = ce*imu[3]*imu[3]-vi[0];
}
}
}
/* ----------------------------------------------------------------------
compute the EwaldDisp long-range force, energy, virial
------------------------------------------------------------------------- */
void EwaldDisp::compute(int eflag, int vflag)
{
if (!nbox) return;
// set energy/virial flags
// invoke allocate_peratom() if needed for first time
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = eflag_global = vflag_global = eflag_atom = vflag_atom = 0;
if (!peratom_allocate_flag && (eflag_atom || vflag_atom)) {
allocate_peratom();
peratom_allocate_flag = 1;
nmax = atom->nmax;
}
reallocate_atoms();
init_self_peratom();
compute_ek();
compute_force();
//compute_surface(); // assume conducting metal (tinfoil) boundary conditions
// update qsum and qsqsum, if atom count has changed and energy needed
if ((eflag_global || eflag_atom) && atom->natoms != natoms_original) {
if (function[0]) qsum_qsq();
natoms_original = atom->natoms;
}
compute_energy();
compute_energy_peratom();
compute_virial();
compute_virial_dipole();
compute_virial_peratom();
if (slabflag) compute_slabcorr();
}
void EwaldDisp::compute_ek()
{
cvector *ekr = ekr_local;
int lbytes = (2*nbox+1)*sizeof(cvector);
hvector *h = NULL;
kvector *k, *nk = kvec+nkvec;
cvector *z = new cvector[2*nbox+1];
cvector z1, *zx, *zy, *zz, *zn = z+2*nbox;
complex *cek, zxyz, zxy = COMPLEX_NULL, cx = COMPLEX_NULL;
vector mui;
double *x = atom->x[0], *xn = x+3*atom->nlocal, *q = atom->q, qi = 0.0;
double bi = 0.0, ci[7];
double *mu = atom->mu ? atom->mu[0] : NULL;
int i, kx, ky, n = nkvec*nsums, *type = atom->type, tri = domain->triclinic;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(cek_local, 0, n*sizeof(complex)); // reset sums
while (x<xn) {
zx = (zy = (zz = z+nbox)+1)-2;
C_SET(zz->x, 1, 0); C_SET(zz->y, 1, 0); C_SET(zz->z, 1, 0); // z[0]
if (tri) { // triclinic z[1]
C_ANGLE(z1.x, unit[0]*x[0]+unit[5]*x[1]+unit[4]*x[2]);
C_ANGLE(z1.y, unit[1]*x[1]+unit[3]*x[2]);
C_ANGLE(z1.z, x[2]*unit[2]); x += 3;
}
else { // orthogonal z[1]
C_ANGLE(z1.x, *(x++)*unit[0]);
C_ANGLE(z1.y, *(x++)*unit[1]);
C_ANGLE(z1.z, *(x++)*unit[2]);
}
for (; zz<zn; --zx, ++zy, ++zz) { // set up z[k]=e^(ik.r)
C_RMULT(zy->x, zz->x, z1.x); // 3D k-vector
C_RMULT(zy->y, zz->y, z1.y); C_CONJ(zx->y, zy->y);
C_RMULT(zy->z, zz->z, z1.z); C_CONJ(zx->z, zy->z);
}
kx = ky = -1;
cek = cek_local;
if (func[0]) qi = *(q++);
if (func[1]) bi = B[*type];
if (func[2]) memcpy(ci, B+7*type[0], 7*sizeof(double));
if (func[3]) {
memcpy(mui, mu, sizeof(vector));
mu += 4;
h = hvec;
}
for (k=kvec; k<nk; ++k) { // compute rho(k)
if (ky!=k->y) { // based on order in
if (kx!=k->x) cx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, cx);
}
C_RMULT(zxyz, z[k->z].z, zxy);
if (func[0]) {
cek->re += zxyz.re*qi; (cek++)->im += zxyz.im*qi;
}
if (func[1]) {
cek->re += zxyz.re*bi; (cek++)->im += zxyz.im*bi;
}
if (func[2]) for (i=0; i<7; ++i) {
cek->re += zxyz.re*ci[i]; (cek++)->im += zxyz.im*ci[i];
}
if (func[3]) {
register double muk = mui[0]*h->x+mui[1]*h->y+mui[2]*h->z; ++h;
cek->re += zxyz.re*muk; (cek++)->im += zxyz.im*muk;
}
}
ekr = (cvector *) ((char *) memcpy(ekr, z, lbytes)+lbytes);
++type;
}
MPI_Allreduce(cek_local, cek_global, 2*n, MPI_DOUBLE, MPI_SUM, world);
delete [] z;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_force()
{
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector sum[EWALD_MAX_NSUMS], mui = COMPLEX_NULL;
complex *cek, zc, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *f = atom->f[0], *fn = f+3*atom->nlocal, *q = atom->q, *t = NULL;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double *ke, c[EWALD_NFUNCS] = {
8.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(12.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 8.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
if (atom->torque) t = atom->torque[0];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(vector)); // fj = -dE/dr =
for (; f<fn; f+=3) { // -i*qj*fac*
k = kvec; // Sum[conj(d)-d]
kx = ky = -1; // d = k*conj(ekj)*ek
ke = kenergy;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(vector));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
register double im = *(ke++)*(zc.im*cek->re+cek->im*zc.re);
if (func[3]) cek_coul = cek;
++cek;
sum[0][0] += h->x*im; sum[0][1] += h->y*im; sum[0][2] += h->z*im;
}
if (func[1]) { // geometric 1/r^6
register double im = *(ke++)*(zc.im*cek->re+cek->im*zc.re); ++cek;
sum[1][0] += h->x*im; sum[1][1] += h->y*im; sum[1][2] += h->z*im;
}
if (func[2]) { // arithmetic 1/r^6
register double im, c = *(ke++);
for (i=2; i<9; ++i) {
im = c*(zc.im*cek->re+cek->im*zc.re); ++cek;
sum[i][0] += h->x*im; sum[i][1] += h->y*im; sum[i][2] += h->z*im;
}
}
if (func[3]) { // dipole
register double im = *(ke)*(zc.im*cek->re+
cek->im*zc.re)*(mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
register double im2 = *(ke)*(zc.re*cek->re-
cek->im*zc.im);
sum[9][0] += h->x*im; sum[9][1] += h->y*im; sum[9][2] += h->z*im;
t[0] += -mui[1]*h->z*im2 + mui[2]*h->y*im2; // torque
t[1] += -mui[2]*h->x*im2 + mui[0]*h->z*im2;
t[2] += -mui[0]*h->y*im2 + mui[1]*h->x*im2;
if (func[0]) { // charge-dipole
register double qi = *(q)*c[0];
im = - *(ke)*(zc.re*cek_coul->re -
cek_coul->im*zc.im)*(mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
im += *(ke)*(zc.re*cek->re - cek->im*zc.im)*qi;
sum[9][0] += h->x*im; sum[9][1] += h->y*im; sum[9][2] += h->z*im;
im2 = *(ke)*(zc.re*cek_coul->im + cek_coul->re*zc.im);
im2 += -*(ke)*(zc.re*cek->im - cek->im*zc.re);
t[0] += -mui[1]*h->z*im2 + mui[2]*h->y*im2; // torque
t[1] += -mui[2]*h->x*im2 + mui[0]*h->z*im2;
t[2] += -mui[0]*h->y*im2 + mui[1]*h->x*im2;
}
++cek;
ke++;
}
}
if (func[0]) { // 1/r
register double qi = *(q++)*c[0];
f[0] -= sum[0][0]*qi; f[1] -= sum[0][1]*qi; f[2] -= sum[0][2]*qi;
}
if (func[1]) { // geometric 1/r^6
register double bi = B[*type]*c[1];
f[0] -= sum[1][0]*bi; f[1] -= sum[1][1]*bi; f[2] -= sum[1][2]*bi;
}
if (func[2]) { // arithmetic 1/r^6
register double *bi = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bi)[0]*c[2];
f[0] -= sum[i][0]*c2; f[1] -= sum[i][1]*c2; f[2] -= sum[i][2]*c2;
}
}
if (func[3]) { // dipole
f[0] -= sum[9][0]; f[1] -= sum[9][1]; f[2] -= sum[9][2];
}
z = (cvector *) ((char *) z+lbytes);
++type;
t += 3;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_surface()
{
// assume conducting metal (tinfoil) boundary conditions, so this function is
// not called because dielectric at the boundary --> infinity, which makes all
// the terms here zero.
if (!function[3]) return;
if (!atom->mu) return;
vector sum_local = VECTOR_NULL, sum_total;
memset(sum_local, 0, sizeof(vector));
double *i, *n, *mu = atom->mu[0];
for (n = (i = mu) + 4*atom->nlocal; i < n; ++i) {
sum_local[0] += (i++)[0];
sum_local[1] += (i++)[0];
sum_local[2] += (i++)[0];
}
MPI_Allreduce(sum_local, sum_total, 3, MPI_DOUBLE, MPI_SUM, world);
virial_self[3] =
mumurd2e*(2.0*MY_PI*vec_dot(sum_total,sum_total)/(2.0*dielectric+1)/volume);
energy_self[3] -= virial_self[3];
if (!(vflag_atom || eflag_atom)) return;
double *ei = energy_self_peratom[0]+3;
double *vi = virial_self_peratom[0]+3;
double cv = 2.0*mumurd2e*MY_PI/(2.0*dielectric+1)/volume;
for (i = mu; i < n; i += 4, ei += EWALD_NFUNCS, vi += EWALD_NFUNCS) {
*vi = cv*(i[0]*sum_total[0]+i[1]*sum_total[1]+i[2]*sum_total[2]);
*ei -= *vi;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_energy()
{
energy = 0.0;
if (!eflag_global) return;
complex *cek = cek_global;
complex *cek_coul;
double *ke = kenergy;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
double sum[EWALD_NFUNCS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_NFUNCS*sizeof(double)); // reset sums
for (int k=0; k<nkvec; ++k) { // sum over k vectors
if (func[0]) { // 1/r
sum[0] += *(ke++)*(cek->re*cek->re+cek->im*cek->im);
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
sum[1] += *(ke++)*(cek->re*cek->re+cek->im*cek->im); ++cek; }
if (func[2]) { // arithmetic 1/r^6
register double r =
(cek[0].re*cek[6].re+cek[0].im*cek[6].im)+
(cek[1].re*cek[5].re+cek[1].im*cek[5].im)+
(cek[2].re*cek[4].re+cek[2].im*cek[4].im)+
0.5*(cek[3].re*cek[3].re+cek[3].im*cek[3].im); cek += 7;
sum[2] += *(ke++)*r;
}
if (func[3]) { // dipole
sum[3] += *(ke)*(cek->re*cek->re+cek->im*cek->im);
if (func[0]) { // charge-dipole
sum[3] += *(ke)*2.0*(cek->re*cek_coul->im - cek->im*cek_coul->re);
}
ke++;
++cek;
}
}
for (int k=0; k<EWALD_NFUNCS; ++k) energy += c[k]*sum[k]-energy_self[k];
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_energy_peratom()
{
if (!eflag_atom) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = VECTOR_NULL;
double sum[EWALD_MAX_NSUMS];
complex *cek, zc = COMPLEX_NULL, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *q = atom->q;
double *eatomj = eatom;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double *ke = kenergy;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
for (int j = 0; j < atom->nlocal; j++, ++eatomj) {
k = kvec;
kx = ky = -1;
ke = kenergy;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(double));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
sum[0] += *(ke++)*(cek->re*zc.re - cek->im*zc.im);
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
sum[1] += *(ke++)*(cek->re*zc.re - cek->im*zc.im); ++cek; }
if (func[2]) { // arithmetic 1/r^6
register double im, c = *(ke++);
for (i=2; i<9; ++i) {
im = c*(cek->re*zc.re - cek->im*zc.im); ++cek;
sum[i] += im;
}
}
if (func[3]) { // dipole
double muk = (mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
sum[9] += *(ke)*(cek->re*zc.re - cek->im*zc.im)*muk;
if (func[0]) { // charge-dipole
register double qj = *(q)*c[0];
sum[9] += *(ke)*(cek_coul->im*zc.re + cek_coul->re*zc.im)*muk;
sum[9] -= *(ke)*(cek->re*zc.im + cek->im*zc.re)*qj;
}
++cek;
ke++;
}
}
if (func[0]) { // 1/r
register double qj = *(q++)*c[0];
*eatomj += sum[0]*qj - energy_self_peratom[j][0];
}
if (func[1]) { // geometric 1/r^6
register double bj = B[*type]*c[1];
*eatomj += sum[1]*bj - energy_self_peratom[j][1];
}
if (func[2]) { // arithmetic 1/r^6
register double *bj = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bj)[0]*c[2];
*eatomj += 0.5*sum[i]*c2;
}
*eatomj -= energy_self_peratom[j][2];
}
if (func[3]) { // dipole
*eatomj += sum[9] - energy_self_peratom[j][3];
}
z = (cvector *) ((char *) z+lbytes);
++type;
}
}
/* ---------------------------------------------------------------------- */
#define swap(a, b) { register double t = a; a = b; b = t; }
void EwaldDisp::compute_virial()
{
memset(virial, 0, sizeof(shape));
if (!vflag_global) return;
complex *cek = cek_global;
complex *cek_coul;
double *kv = kvirial;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
shape sum[EWALD_NFUNCS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_NFUNCS*sizeof(shape));
for (int k=0; k<nkvec; ++k) { // sum over k vectors
if (func[0]) { // 1/r
register double r = cek->re*cek->re+cek->im*cek->im;
if (func[3]) cek_coul = cek;
++cek;
sum[0][0] += *(kv++)*r; sum[0][1] += *(kv++)*r; sum[0][2] += *(kv++)*r;
sum[0][3] += *(kv++)*r; sum[0][4] += *(kv++)*r; sum[0][5] += *(kv++)*r;
}
if (func[1]) { // geometric 1/r^6
register double r = cek->re*cek->re+cek->im*cek->im; ++cek;
sum[1][0] += *(kv++)*r; sum[1][1] += *(kv++)*r; sum[1][2] += *(kv++)*r;
sum[1][3] += *(kv++)*r; sum[1][4] += *(kv++)*r; sum[1][5] += *(kv++)*r;
}
if (func[2]) { // arithmetic 1/r^6
register double r =
(cek[0].re*cek[6].re+cek[0].im*cek[6].im)+
(cek[1].re*cek[5].re+cek[1].im*cek[5].im)+
(cek[2].re*cek[4].re+cek[2].im*cek[4].im)+
0.5*(cek[3].re*cek[3].re+cek[3].im*cek[3].im); cek += 7;
sum[2][0] += *(kv++)*r; sum[2][1] += *(kv++)*r; sum[2][2] += *(kv++)*r;
sum[2][3] += *(kv++)*r; sum[2][4] += *(kv++)*r; sum[2][5] += *(kv++)*r;
}
if (func[3]) {
register double r = cek->re*cek->re+cek->im*cek->im;
sum[3][0] += *(kv++)*r; sum[3][1] += *(kv++)*r; sum[3][2] += *(kv++)*r;
sum[3][3] += *(kv++)*r; sum[3][4] += *(kv++)*r; sum[3][5] += *(kv++)*r;
if (func[0]) { // charge-dipole
kv -= 6;
register double r = 2.0*(cek->re*cek_coul->im - cek->im*cek_coul->re);
sum[3][0] += *(kv++)*r; sum[3][1] += *(kv++)*r; sum[3][2] += *(kv++)*r;
sum[3][3] += *(kv++)*r; sum[3][4] += *(kv++)*r; sum[3][5] += *(kv++)*r;
}
++cek;
}
}
for (int k=0; k<EWALD_NFUNCS; ++k)
if (func[k]) {
shape self = {virial_self[k], virial_self[k], virial_self[k], 0, 0, 0};
shape_scalar_mult(sum[k], c[k]);
shape_add(virial, sum[k]);
shape_subtr(virial, self);
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_virial_dipole()
{
if (!function[3]) return;
if (!vflag_atom && !vflag_global) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = COMPLEX_NULL;
double sum[6];
double sum_total[6];
complex *cek, zc, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *mu = atom->mu ? atom->mu[0] : NULL;
double *vatomj = NULL;
if (vflag_atom && vatom) vatomj = vatom[0];
const double qscale = force->qqrd2e * scale;
double *ke, c[EWALD_NFUNCS] = {
8.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(12.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 8.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(&sum[0], 0, 6*sizeof(double));
memset(&sum_total[0], 0, 6*sizeof(double));
for (int j = 0; j < atom->nlocal; j++) {
k = kvec;
kx = ky = -1;
ke = kenergy;
cek = cek_global;
memset(&sum[0], 0, 6*sizeof(double));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
double im = 0.0;
if (func[0]) { // 1/r
ke++;
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
ke++;
++cek;
}
if (func[2]) { // arithmetic 1/r^6
ke++;
for (i=2; i<9; ++i) {
++cek;
}
}
if (func[3]) { // dipole
im = *(ke)*(zc.re*cek->re - cek->im*zc.im);
if (func[0]) { // charge-dipole
im += *(ke)*(zc.im*cek_coul->re + cek_coul->im*zc.re);
}
sum[0] -= mui[0]*h->x*im;
sum[1] -= mui[1]*h->y*im;
sum[2] -= mui[2]*h->z*im;
sum[3] -= mui[0]*h->y*im;
sum[4] -= mui[0]*h->z*im;
sum[5] -= mui[1]*h->z*im;
++cek;
ke++;
}
}
if (vflag_global)
for (int n = 0; n < 6; n++)
sum_total[n] -= sum[n];
if (vflag_atom)
for (int n = 0; n < 6; n++)
vatomj[n] -= sum[n];
z = (cvector *) ((char *) z+lbytes);
++type;
if (vflag_atom) vatomj += 6;
}
if (vflag_global) {
MPI_Allreduce(&sum_total[0],&sum[0],6,MPI_DOUBLE,MPI_SUM,world);
for (int n = 0; n < 6; n++)
virial[n] += sum[n];
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_virial_peratom()
{
if (!vflag_atom) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = VECTOR_NULL;
complex *cek, zc = COMPLEX_NULL, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *kv;
double *q = atom->q;
double *vatomj = vatom ? vatom[0] : NULL;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
shape sum[EWALD_MAX_NSUMS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
for (int j = 0; j < atom->nlocal; j++) {
k = kvec;
kx = ky = -1;
kv = kvirial;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(shape));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
if (func[3]) cek_coul = cek;
register double r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[0][0] += *(kv++)*r;
sum[0][1] += *(kv++)*r;
sum[0][2] += *(kv++)*r;
sum[0][3] += *(kv++)*r;
sum[0][4] += *(kv++)*r;
sum[0][5] += *(kv++)*r;
}
if (func[1]) { // geometric 1/r^6
register double r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[1][0] += *(kv++)*r;
sum[1][1] += *(kv++)*r;
sum[1][2] += *(kv++)*r;
sum[1][3] += *(kv++)*r;
sum[1][4] += *(kv++)*r;
sum[1][5] += *(kv++)*r;
}
if (func[2]) { // arithmetic 1/r^6
register double r;
for (i=2; i<9; ++i) {
r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[i][0] += *(kv++)*r;
sum[i][1] += *(kv++)*r;
sum[i][2] += *(kv++)*r;
sum[i][3] += *(kv++)*r;
sum[i][4] += *(kv++)*r;
sum[i][5] += *(kv++)*r;
kv -= 6;
}
kv += 6;
}
if (func[3]) { // dipole
double muk = (mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
register double
r = (cek->re*zc.re - cek->im*zc.im)*muk;
sum[9][0] += *(kv++)*r;
sum[9][1] += *(kv++)*r;
sum[9][2] += *(kv++)*r;
sum[9][3] += *(kv++)*r;
sum[9][4] += *(kv++)*r;
sum[9][5] += *(kv++)*r;
if (func[0]) { // charge-dipole
kv -= 6;
register double qj = *(q)*c[0];
r = (cek_coul->im*zc.re + cek_coul->re*zc.im)*muk;
r += -(cek->re*zc.im + cek->im*zc.re)*qj;
sum[9][0] += *(kv++)*r; sum[9][1] += *(kv++)*r; sum[9][2] += *(kv++)*r;
sum[9][3] += *(kv++)*r; sum[9][4] += *(kv++)*r; sum[9][5] += *(kv++)*r;
}
++cek;
}
}
if (func[0]) { // 1/r
register double qi = *(q++)*c[0];
for (int n = 0; n < 6; n++) vatomj[n] += sum[0][n]*qi;
}
if (func[1]) { // geometric 1/r^6
register double bi = B[*type]*c[1];
for (int n = 0; n < 6; n++) vatomj[n] += sum[1][n]*bi;
}
if (func[2]) { // arithmetic 1/r^6
register double *bj = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bj)[0]*c[2];
for (int n = 0; n < 6; n++) vatomj[n] += 0.5*sum[i][n]*c2;
}
}
if (func[3]) { // dipole
for (int n = 0; n < 6; n++) vatomj[n] += sum[9][n];
}
for (int k=0; k<EWALD_NFUNCS; ++k) {
if (func[k]) {
for (int n = 0; n < 3; n++) vatomj[n] -= virial_self_peratom[j][k];
}
}
z = (cvector *) ((char *) z+lbytes);
++type;
vatomj += 6;
}
}
/* ----------------------------------------------------------------------
Slab-geometry correction term to dampen inter-slab interactions between
periodically repeating slabs. Yields a good approximation to 2D Ewald if
adequate empty space is left between repeating slabs (J. Chem. Phys.
111, 3155). Slabs are defined here to be parallel to the xy plane. Also
extended to non-neutral systems (J. Chem. Phys. 131, 094107).
------------------------------------------------------------------------- */
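// the correction energy computed below (e_slabcorr) is
//   E_slab = 2*pi/V * (M_z^2 - q_tot*sum_i q_i*z_i^2 - q_tot^2*L_z^2/12)
// with M_z the total z dipole moment, q_tot the net charge, and L_z = zprd;
// the force correction on atom i follows as -dE_slab/dz_i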
void EwaldDisp::compute_slabcorr()
{
// compute local contribution to global dipole moment
double *q = atom->q;
double **x = atom->x;
double zprd = domain->zprd;
int nlocal = atom->nlocal;
double dipole = 0.0;
for (int i = 0; i < nlocal; i++) dipole += q[i]*x[i][2];
if (function[3] && atom->mu) {
double **mu = atom->mu;
for (int i = 0; i < nlocal; i++) dipole += mu[i][2];
}
// sum local contributions to get global dipole moment
double dipole_all;
MPI_Allreduce(&dipole,&dipole_all,1,MPI_DOUBLE,MPI_SUM,world);
// need to make non-neutral systems and/or
// per-atom energy translationally invariant
double dipole_r2 = 0.0;
if (eflag_atom || fabs(qsum) > SMALL) {
if (function[3] && atom->mu)
error->all(FLERR,"Cannot (yet) use kspace slab correction with "
"long-range dipoles and non-neutral systems or per-atom energy");
for (int i = 0; i < nlocal; i++)
dipole_r2 += q[i]*x[i][2]*x[i][2];
// sum local contributions
double tmp;
MPI_Allreduce(&dipole_r2,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
dipole_r2 = tmp;
}
// compute corrections
const double e_slabcorr = MY_2PI*(dipole_all*dipole_all -
qsum*dipole_r2 - qsum*qsum*zprd*zprd/12.0)/volume;
const double qscale = force->qqrd2e * scale;
if (eflag_global) energy += qscale * e_slabcorr;
// per-atom energy
if (eflag_atom) {
double efact = qscale * MY_2PI/volume;
for (int i = 0; i < nlocal; i++)
eatom[i] += efact * q[i]*(x[i][2]*dipole_all - 0.5*(dipole_r2 +
qsum*x[i][2]*x[i][2]) - qsum*zprd*zprd/12.0);
}
// add on force corrections
double ffact = qscale * (-4.0*MY_PI/volume);
double **f = atom->f;
for (int i = 0; i < nlocal; i++)
f[i][2] += ffact * q[i]*(dipole_all - qsum*x[i][2]);
// add on torque corrections
if (function[3] && atom->mu && atom->torque) {
double **mu = atom->mu;
double **torque = atom->torque;
for (int i = 0; i < nlocal; i++) {
torque[i][0] += ffact * dipole_all * mu[i][1];
torque[i][1] += -ffact * dipole_all * mu[i][0];
}
}
}
/* ----------------------------------------------------------------------
Newton solver used to find g_ewald for LJ systems
------------------------------------------------------------------------- */
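// standard Newton iteration: x <- x - f(x)/f'(x), converged when |dx| < tol;
// returns -1 if the solver fails (x < 0, x is NaN, or maxit iterations exceeded)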
double EwaldDisp::NewtonSolve(double x, double Rc,
bigint natoms, double vol, double b2)
{
double dx,tol;
int maxit;
maxit = 10000; //Maximum number of iterations
tol = 0.00001; //Convergence tolerance
//Begin algorithm
for (int i = 0; i < maxit; i++) {
dx = f(x,Rc,natoms,vol,b2) / derivf(x,Rc,natoms,vol,b2);
x = x - dx; //Update x
if (fabs(dx) < tol) return x;
if (x < 0 || x != x) // solver failed
return -1;
}
return -1;
}
/* ----------------------------------------------------------------------
Calculate f(x)
------------------------------------------------------------------------- */
double EwaldDisp::f(double x, double Rc, bigint natoms, double vol, double b2)
{
double a = Rc*x;
double f = 0.0;
if (function[3]) { // dipole
double rg2 = a*a;
double rg4 = rg2*rg2;
double rg6 = rg4*rg2;
double Cc = 4.0*rg4 + 6.0*rg2 + 3.0;
double Dc = 8.0*rg6 + 20.0*rg4 + 30.0*rg2 + 15.0;
f = (b2/(sqrt(vol*powint(x,4)*powint(Rc,9)*natoms)) *
sqrt(13.0/6.0*Cc*Cc + 2.0/15.0*Dc*Dc - 13.0/15.0*Cc*Dc) *
exp(-rg2)) - accuracy;
} else if (function[1] || function[2]) { // LJ
f = (4.0*MY_PI*b2*powint(x,4)/vol/sqrt((double)natoms)*erfc(a) *
(6.0*powint(a,-5) + 6.0*powint(a,-3) + 3.0/a + a) - accuracy);
}
return f;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x)
------------------------------------------------------------------------- */
double EwaldDisp::derivf(double x, double Rc,
bigint natoms, double vol, double b2)
{
double h = 0.000001; //Derivative step-size
return (f(x + h,Rc,natoms,vol,b2) - f(x,Rc,natoms,vol,b2)) / h;
}
diff --git a/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp b/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
index 11c7a147e..6e17a9bbd 100644
--- a/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
+++ b/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
@@ -1,1078 +1,1078 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier (SNL)
The lj-fsw sections (force-switched) were provided by
Robert Meissner and Lucio Colombi Ciacchi of
Bremen University, Bremen, Germany, with
additional assistance from Robert A. Latour, Clemson University
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pair_lj_charmmfsw_coul_long.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
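// EWALD_P and A1-A5 are the coefficients of the polynomial fit to erfc()
// (the Abramowitz & Stegun 7.1.26 approximation) used in the real-space
// Coulomb terms below; EWALD_F = 2/sqrt(pi) enters its derivative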
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulLong::PairLJCharmmfswCoulLong(LAMMPS *lmp) : Pair(lmp)
{
respa_enable = 1;
ewaldflag = pppmflag = 1;
ftable = NULL;
implicit = 0;
mix_flag = ARITHMETIC;
writedata = 1;
+
+ // short-range/long-range flag accessed by DihedralCharmmfsw
+
+ dihedflag = 1;
}
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulLong::~PairLJCharmmfswCoulLong()
{
if (!copymode) {
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(epsilon);
memory->destroy(sigma);
memory->destroy(eps14);
memory->destroy(sigma14);
memory->destroy(lj1);
memory->destroy(lj2);
memory->destroy(lj3);
memory->destroy(lj4);
memory->destroy(lj14_1);
memory->destroy(lj14_2);
memory->destroy(lj14_3);
memory->destroy(lj14_4);
}
if (ftable) free_tables();
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype,itable;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl12,evdwl6,ecoul,fpair;
double fraction,table;
double r,rinv,r2inv,r3inv,r6inv,rsq,forcecoul,forcelj,factor_coul,factor_lj;
double grij,expm2,prefactor,t,erfc;
double switch1;
int *ilist,*jlist,*numneigh,**firstneigh;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq)
ecoul = prefactor*erfc;
else {
table = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table;
}
if (factor_coul < 1.0) ecoul -= (1.0-factor_coul)*prefactor;
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
r = sqrt(rsq);
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_inner()
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,fpair;
double rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listinner->inum;
ilist = listinner->ilist;
numneigh = listinner->numneigh;
firstneigh = listinner->firstneigh;
double cut_out_on = cut_respa[0];
double cut_out_off = cut_respa[1];
double cut_out_diff = cut_out_off - cut_out_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_out_off_sq) {
r2inv = 1.0/rsq;
forcecoul = qqrd2e * qtmp*q[j]*sqrt(r2inv);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*forcecoul;
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
if (rsq > cut_out_on_sq) {
rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_middle()
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,fpair;
double rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double switch1;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listmiddle->inum;
ilist = listmiddle->ilist;
numneigh = listmiddle->numneigh;
firstneigh = listmiddle->firstneigh;
double cut_in_off = cut_respa[0];
double cut_in_on = cut_respa[1];
double cut_out_on = cut_respa[2];
double cut_out_off = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_out_diff = cut_out_off - cut_out_on;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_out_off_sq && rsq > cut_in_off_sq) {
r2inv = 1.0/rsq;
forcecoul = qqrd2e * qtmp*q[j]*sqrt(r2inv);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*forcecoul;
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
if (rsq < cut_in_on_sq) {
rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
fpair *= rsw*rsw*(3.0 - 2.0*rsw);
}
if (rsq > cut_out_on_sq) {
rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw - 3.0);
}
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_outer(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype,itable;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl6,evdwl12,ecoul,fpair;
double fraction,table;
double r,rinv,r2inv,r3inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double grij,expm2,prefactor,t,erfc;
double switch1;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double rsq;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listouter->inum;
ilist = listouter->ilist;
numneigh = listouter->numneigh;
firstneigh = listouter->firstneigh;
double cut_in_off = cut_respa[2];
double cut_in_on = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2 - 1.0);
if (rsq > cut_in_off_sq) {
if (rsq < cut_in_on_sq) {
rsw = (r - cut_in_off)/cut_in_diff;
forcecoul += prefactor*rsw*rsw*(3.0 - 2.0*rsw);
if (factor_coul < 1.0)
forcecoul -=
(1.0-factor_coul)*prefactor*rsw*rsw*(3.0 - 2.0*rsw);
} else {
forcecoul += prefactor;
if (factor_coul < 1.0)
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq && rsq > cut_in_off_sq) {
r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
if (rsq < cut_in_on_sq) {
rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
forcelj *= rsw*rsw*(3.0 - 2.0*rsw);
}
} else forcelj = 0.0;
fpair = (forcecoul + forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
ecoul = prefactor*erfc;
if (factor_coul < 1.0) ecoul -= (1.0-factor_coul)*prefactor;
} else {
table = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ptable[itable] + fraction*dptable[itable];
prefactor = qtmp*q[j] * table;
ecoul -= (1.0-factor_coul)*prefactor;
}
}
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
evdwl = r6inv*(lj3[itype][jtype]*r6inv-lj4[itype][jtype]);
if (rsq > cut_lj_innersq) {
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (vflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
table = vtable[itable] + fraction*dvtable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ptable[itable] + fraction*dptable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq <= cut_in_off_sq) {
r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else if (rsq <= cut_in_on_sq) {
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(epsilon,n+1,n+1,"pair:epsilon");
memory->create(sigma,n+1,n+1,"pair:sigma");
memory->create(eps14,n+1,n+1,"pair:eps14");
memory->create(sigma14,n+1,n+1,"pair:sigma14");
memory->create(lj1,n+1,n+1,"pair:lj1");
memory->create(lj2,n+1,n+1,"pair:lj2");
memory->create(lj3,n+1,n+1,"pair:lj3");
memory->create(lj4,n+1,n+1,"pair:lj4");
memory->create(lj14_1,n+1,n+1,"pair:lj14_1");
memory->create(lj14_2,n+1,n+1,"pair:lj14_2");
memory->create(lj14_3,n+1,n+1,"pair:lj14_3");
memory->create(lj14_4,n+1,n+1,"pair:lj14_4");
}
/* ----------------------------------------------------------------------
global settings
unlike other pair styles,
there are no individual pair settings that these override
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::settings(int narg, char **arg)
{
if (narg != 2 && narg != 3) error->all(FLERR,"Illegal pair_style command");
cut_lj_inner = force->numeric(FLERR,arg[0]);
cut_lj = force->numeric(FLERR,arg[1]);
if (narg == 2) cut_coul = cut_lj;
else cut_coul = force->numeric(FLERR,arg[2]);
-
- // indicates pair_style being used for dihedral_charmm
-
- dihedflag = 1;
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::coeff(int narg, char **arg)
{
if (narg != 4 && narg != 6) error->all(FLERR,"Illegal pair_coeff command");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
double epsilon_one = force->numeric(FLERR,arg[2]);
double sigma_one = force->numeric(FLERR,arg[3]);
double eps14_one = epsilon_one;
double sigma14_one = sigma_one;
if (narg == 6) {
eps14_one = force->numeric(FLERR,arg[4]);
sigma14_one = force->numeric(FLERR,arg[5]);
}
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
epsilon[i][j] = epsilon_one;
sigma[i][j] = sigma_one;
eps14[i][j] = eps14_one;
sigma14[i][j] = sigma14_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::init_style()
{
if (!atom->q_flag)
error->all(FLERR,
"Pair style lj/charmmfsw/coul/long requires atom attribute q");
// request regular or rRESPA neighbor lists
int irequest;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
int respa = 0;
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
if (respa == 0) irequest = neighbor->request(this,instance_me);
else if (respa == 1) {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
} else {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 2;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respamiddle = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
}
} else irequest = neighbor->request(this,instance_me);
// require cut_lj_inner < cut_lj
if (cut_lj_inner >= cut_lj)
error->all(FLERR,"Pair inner cutoff >= Pair outer cutoff");
cut_lj_innersq = cut_lj_inner * cut_lj_inner;
cut_ljsq = cut_lj * cut_lj;
cut_ljinv = 1.0/cut_lj;
cut_lj_innerinv = 1.0/cut_lj_inner;
cut_lj3 = cut_lj * cut_lj * cut_lj;
cut_lj3inv = cut_ljinv * cut_ljinv * cut_ljinv;
cut_lj_inner3inv = cut_lj_innerinv * cut_lj_innerinv * cut_lj_innerinv;
cut_lj_inner3 = cut_lj_inner * cut_lj_inner * cut_lj_inner;
cut_lj6 = cut_ljsq * cut_ljsq * cut_ljsq;
cut_lj6inv = cut_lj3inv * cut_lj3inv;
cut_lj_inner6inv = cut_lj_inner3inv * cut_lj_inner3inv;
cut_lj_inner6 = cut_lj_innersq * cut_lj_innersq * cut_lj_innersq;
cut_coulsq = cut_coul * cut_coul;
cut_bothsq = MAX(cut_ljsq,cut_coulsq);
denom_lj = (cut_ljsq-cut_lj_innersq) * (cut_ljsq-cut_lj_innersq) *
(cut_ljsq-cut_lj_innersq);
denom_lj12 = 1.0/(cut_lj6 - cut_lj_inner6);
denom_lj6 = 1.0/(cut_lj3 - cut_lj_inner3);
// set & error check interior rRESPA cutoffs
if (strstr(update->integrate_style,"respa") &&
((Respa *) update->integrate)->level_inner >= 0) {
cut_respa = ((Respa *) update->integrate)->cutoff;
if (MIN(cut_lj,cut_coul) < cut_respa[3])
error->all(FLERR,"Pair cutoff < Respa interior cutoff");
if (cut_lj_inner < cut_respa[1])
error->all(FLERR,"Pair inner cutoff < Respa interior cutoff");
} else cut_respa = NULL;
// ensure use of KSpace long-range solver, set g_ewald
if (force->kspace == NULL)
error->all(FLERR,"Pair style requires a KSpace style");
g_ewald = force->kspace->g_ewald;
// setup force tables
if (ncoultablebits) init_tables(cut_coul,cut_respa);
}
/* ----------------------------------------------------------------------
neighbor callback to inform pair style of neighbor list to use
regular or rRESPA
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::init_list(int id, NeighList *ptr)
{
if (id == 0) list = ptr;
else if (id == 1) listinner = ptr;
else if (id == 2) listmiddle = ptr;
else if (id == 3) listouter = ptr;
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairLJCharmmfswCoulLong::init_one(int i, int j)
{
if (setflag[i][j] == 0) {
epsilon[i][j] = mix_energy(epsilon[i][i],epsilon[j][j],
sigma[i][i],sigma[j][j]);
sigma[i][j] = mix_distance(sigma[i][i],sigma[j][j]);
eps14[i][j] = mix_energy(eps14[i][i],eps14[j][j],
sigma14[i][i],sigma14[j][j]);
sigma14[i][j] = mix_distance(sigma14[i][i],sigma14[j][j]);
}
double cut = MAX(cut_lj,cut_coul);
lj1[i][j] = 48.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj2[i][j] = 24.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj3[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj4[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj14_1[i][j] = 48.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_2[i][j] = 24.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj14_3[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_4[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj1[j][i] = lj1[i][j];
lj2[j][i] = lj2[i][j];
lj3[j][i] = lj3[i][j];
lj4[j][i] = lj4[i][j];
lj14_1[j][i] = lj14_1[i][j];
lj14_2[j][i] = lj14_2[i][j];
lj14_3[j][i] = lj14_3[i][j];
lj14_4[j][i] = lj14_4[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&epsilon[i][j],sizeof(double),1,fp);
fwrite(&sigma[i][j],sizeof(double),1,fp);
fwrite(&eps14[i][j],sizeof(double),1,fp);
fwrite(&sigma14[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&epsilon[i][j],sizeof(double),1,fp);
fread(&sigma[i][j],sizeof(double),1,fp);
fread(&eps14[i][j],sizeof(double),1,fp);
fread(&sigma14[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&epsilon[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&eps14[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma14[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_restart_settings(FILE *fp)
{
fwrite(&cut_lj_inner,sizeof(double),1,fp);
fwrite(&cut_lj,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
fwrite(&ncoultablebits,sizeof(int),1,fp);
fwrite(&tabinner,sizeof(double),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_lj_inner,sizeof(double),1,fp);
fread(&cut_lj,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
fread(&ncoultablebits,sizeof(int),1,fp);
fread(&tabinner,sizeof(double),1,fp);
}
MPI_Bcast(&cut_lj_inner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_lj,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
MPI_Bcast(&ncoultablebits,1,MPI_INT,0,world);
MPI_Bcast(&tabinner,1,MPI_DOUBLE,0,world);
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g %g %g\n",
i,epsilon[i][i],sigma[i][i],eps14[i][i],sigma14[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g %g\n",i,j,
epsilon[i][j],sigma[i][j],eps14[i][j],sigma14[i][j]);
}
/* ---------------------------------------------------------------------- */
double PairLJCharmmfswCoulLong::single(int i, int j, int itype, int jtype,
double rsq,
double factor_coul, double factor_lj,
double &fforce)
{
double r,rinv,r2inv,r3inv,r6inv,grij,expm2,t,erfc,prefactor;
double switch1,fraction,table,forcecoul,forcelj,phicoul,philj,philj12,philj6;
int itable;
r = sqrt(rsq);
rinv = 1.0/r;
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = force->qqrd2e * atom->q[i]*atom->q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = atom->q[i]*atom->q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = atom->q[i]*atom->q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r3inv = rinv*rinv*rinv;
r6inv = r3inv*r3inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fforce = (forcecoul + factor_lj*forcelj) * r2inv;
double eng = 0.0;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq)
phicoul = prefactor*erfc;
else {
table = etable[itable] + fraction*detable[itable];
phicoul = atom->q[i]*atom->q[j] * table;
}
if (factor_coul < 1.0) phicoul -= (1.0-factor_coul)*prefactor;
eng += phicoul;
}
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
philj12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
philj6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
philj = philj12 + philj6;
} else {
philj12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
philj6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
philj = philj12 + philj6;
}
eng += factor_lj*philj;
}
return eng;
}
/* ---------------------------------------------------------------------- */
void *PairLJCharmmfswCoulLong::extract(const char *str, int &dim)
{
dim = 2;
if (strcmp(str,"lj14_1") == 0) return (void *) lj14_1;
if (strcmp(str,"lj14_2") == 0) return (void *) lj14_2;
if (strcmp(str,"lj14_3") == 0) return (void *) lj14_3;
if (strcmp(str,"lj14_4") == 0) return (void *) lj14_4;
dim = 0;
if (strcmp(str,"implicit") == 0) return (void *) &implicit;
// info extracted by dihedral_charmmfsw
if (strcmp(str,"cut_coul") == 0) return (void *) &cut_coul;
if (strcmp(str,"cut_lj_inner") == 0) return (void *) &cut_lj_inner;
if (strcmp(str,"cut_lj") == 0) return (void *) &cut_lj;
if (strcmp(str,"dihedflag") == 0) return (void *) &dihedflag;
return NULL;
}
diff --git a/src/KSPACE/pppm_disp.cpp b/src/KSPACE/pppm_disp.cpp
index 5d6c2042b..b31d42a81 100644
--- a/src/KSPACE/pppm_disp.cpp
+++ b/src/KSPACE/pppm_disp.cpp
@@ -1,8256 +1,8256 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Rolf Isele-Holder (Aachen University)
Paul Crozier (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "pppm_disp.h"
#include "math_const.h"
#include "atom.h"
#include "comm.h"
#include "gridcomm.h"
#include "neighbor.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "domain.h"
#include "fft3d_wrap.h"
#include "remap_wrap.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
#define MAXORDER 7
#define OFFSET 16384
#define SMALL 0.00001
#define LARGE 10000.0
#define EPS_HOC 1.0e-7
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER};
enum{REVERSE_RHO, REVERSE_RHO_G, REVERSE_RHO_A, REVERSE_RHO_NONE};
enum{FORWARD_IK, FORWARD_AD, FORWARD_IK_PERATOM, FORWARD_AD_PERATOM,
FORWARD_IK_G, FORWARD_AD_G, FORWARD_IK_PERATOM_G, FORWARD_AD_PERATOM_G,
FORWARD_IK_A, FORWARD_AD_A, FORWARD_IK_PERATOM_A, FORWARD_AD_PERATOM_A,
FORWARD_IK_NONE, FORWARD_AD_NONE, FORWARD_IK_PERATOM_NONE, FORWARD_AD_PERATOM_NONE};
#ifdef FFT_SINGLE
#define ZEROF 0.0f
#define ONEF 1.0f
#else
#define ZEROF 0.0
#define ONEF 1.0
#endif
/* ---------------------------------------------------------------------- */
PPPMDisp::PPPMDisp(LAMMPS *lmp, int narg, char **arg) : KSpace(lmp, narg, arg),
factors(NULL), csumi(NULL), cii(NULL), B(NULL), density_brick(NULL), vdx_brick(NULL),
vdy_brick(NULL), vdz_brick(NULL), density_fft(NULL), u_brick(NULL), v0_brick(NULL),
v1_brick(NULL), v2_brick(NULL), v3_brick(NULL), v4_brick(NULL), v5_brick(NULL),
density_brick_g(NULL), vdx_brick_g(NULL), vdy_brick_g(NULL), vdz_brick_g(NULL),
density_fft_g(NULL), u_brick_g(NULL), v0_brick_g(NULL), v1_brick_g(NULL), v2_brick_g(NULL),
v3_brick_g(NULL), v4_brick_g(NULL), v5_brick_g(NULL), density_brick_a0(NULL),
vdx_brick_a0(NULL), vdy_brick_a0(NULL), vdz_brick_a0(NULL), density_fft_a0(NULL),
u_brick_a0(NULL), v0_brick_a0(NULL), v1_brick_a0(NULL), v2_brick_a0(NULL),
v3_brick_a0(NULL), v4_brick_a0(NULL), v5_brick_a0(NULL), density_brick_a1(NULL),
vdx_brick_a1(NULL), vdy_brick_a1(NULL), vdz_brick_a1(NULL), density_fft_a1(NULL),
u_brick_a1(NULL), v0_brick_a1(NULL), v1_brick_a1(NULL), v2_brick_a1(NULL),
v3_brick_a1(NULL), v4_brick_a1(NULL), v5_brick_a1(NULL), density_brick_a2(NULL),
vdx_brick_a2(NULL), vdy_brick_a2(NULL), vdz_brick_a2(NULL), density_fft_a2(NULL),
u_brick_a2(NULL), v0_brick_a2(NULL), v1_brick_a2(NULL), v2_brick_a2(NULL),
v3_brick_a2(NULL), v4_brick_a2(NULL), v5_brick_a2(NULL), density_brick_a3(NULL),
vdx_brick_a3(NULL), vdy_brick_a3(NULL), vdz_brick_a3(NULL), density_fft_a3(NULL),
u_brick_a3(NULL), v0_brick_a3(NULL), v1_brick_a3(NULL), v2_brick_a3(NULL),
v3_brick_a3(NULL), v4_brick_a3(NULL), v5_brick_a3(NULL), density_brick_a4(NULL),
vdx_brick_a4(NULL), vdy_brick_a4(NULL), vdz_brick_a4(NULL), density_fft_a4(NULL),
u_brick_a4(NULL), v0_brick_a4(NULL), v1_brick_a4(NULL), v2_brick_a4(NULL),
v3_brick_a4(NULL), v4_brick_a4(NULL), v5_brick_a4(NULL), density_brick_a5(NULL),
vdx_brick_a5(NULL), vdy_brick_a5(NULL), vdz_brick_a5(NULL), density_fft_a5(NULL),
u_brick_a5(NULL), v0_brick_a5(NULL), v1_brick_a5(NULL), v2_brick_a5(NULL),
v3_brick_a5(NULL), v4_brick_a5(NULL), v5_brick_a5(NULL), density_brick_a6(NULL),
vdx_brick_a6(NULL), vdy_brick_a6(NULL), vdz_brick_a6(NULL), density_fft_a6(NULL),
u_brick_a6(NULL), v0_brick_a6(NULL), v1_brick_a6(NULL), v2_brick_a6(NULL),
v3_brick_a6(NULL), v4_brick_a6(NULL), v5_brick_a6(NULL), density_brick_none(NULL),
vdx_brick_none(NULL), vdy_brick_none(NULL), vdz_brick_none(NULL),
density_fft_none(NULL), u_brick_none(NULL), v0_brick_none(NULL), v1_brick_none(NULL),
v2_brick_none(NULL), v3_brick_none(NULL), v4_brick_none(NULL), v5_brick_none(NULL),
greensfn(NULL), vg(NULL), vg2(NULL), greensfn_6(NULL), vg_6(NULL), vg2_6(NULL),
fkx(NULL), fky(NULL), fkz(NULL), fkx2(NULL), fky2(NULL), fkz2(NULL), fkx_6(NULL),
fky_6(NULL), fkz_6(NULL), fkx2_6(NULL), fky2_6(NULL), fkz2_6(NULL), gf_b(NULL),
gf_b_6(NULL), sf_precoeff1(NULL), sf_precoeff2(NULL), sf_precoeff3(NULL),
sf_precoeff4(NULL), sf_precoeff5(NULL), sf_precoeff6(NULL), sf_precoeff1_6(NULL),
sf_precoeff2_6(NULL), sf_precoeff3_6(NULL), sf_precoeff4_6(NULL), sf_precoeff5_6(NULL),
sf_precoeff6_6(NULL), rho1d(NULL), rho_coeff(NULL), drho1d(NULL), drho_coeff(NULL),
rho1d_6(NULL), rho_coeff_6(NULL), drho1d_6(NULL), drho_coeff_6(NULL), work1(NULL),
work2(NULL), work1_6(NULL), work2_6(NULL), fft1(NULL), fft2(NULL), fft1_6(NULL),
fft2_6(NULL), remap(NULL), remap_6(NULL), cg(NULL), cg_peratom(NULL), cg_6(NULL),
cg_peratom_6(NULL), part2grid(NULL), part2grid_6(NULL), boxlo(NULL)
{
if (narg < 1) error->all(FLERR,"Illegal kspace_style pppm/disp command");
triclinic_support = 0;
pppmflag = dispersionflag = 1;
accuracy_relative = fabs(force->numeric(FLERR,arg[0]));
nfactors = 3;
factors = new int[nfactors];
factors[0] = 2;
factors[1] = 3;
factors[2] = 5;
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
csumflag = 0;
B = NULL;
cii = NULL;
csumi = NULL;
peratom_allocate_flag = 0;
density_brick = vdx_brick = vdy_brick = vdz_brick = NULL;
density_fft = NULL;
u_brick = v0_brick = v1_brick = v2_brick = v3_brick =
v4_brick = v5_brick = NULL;
density_brick_g = vdx_brick_g = vdy_brick_g = vdz_brick_g = NULL;
density_fft_g = NULL;
u_brick_g = v0_brick_g = v1_brick_g = v2_brick_g = v3_brick_g =
v4_brick_g = v5_brick_g = NULL;
density_brick_a0 = vdx_brick_a0 = vdy_brick_a0 = vdz_brick_a0 = NULL;
density_fft_a0 = NULL;
u_brick_a0 = v0_brick_a0 = v1_brick_a0 = v2_brick_a0 = v3_brick_a0 =
v4_brick_a0 = v5_brick_a0 = NULL;
density_brick_a1 = vdx_brick_a1 = vdy_brick_a1 = vdz_brick_a1 = NULL;
density_fft_a1 = NULL;
u_brick_a1 = v0_brick_a1 = v1_brick_a1 = v2_brick_a1 = v3_brick_a1 =
v4_brick_a1 = v5_brick_a1 = NULL;
density_brick_a2 = vdx_brick_a2 = vdy_brick_a2 = vdz_brick_a2 = NULL;
density_fft_a2 = NULL;
u_brick_a2 = v0_brick_a2 = v1_brick_a2 = v2_brick_a2 = v3_brick_a2 =
v4_brick_a2 = v5_brick_a2 = NULL;
density_brick_a3 = vdx_brick_a3 = vdy_brick_a3 = vdz_brick_a3 = NULL;
density_fft_a3 = NULL;
u_brick_a3 = v0_brick_a3 = v1_brick_a3 = v2_brick_a3 = v3_brick_a3 =
v4_brick_a3 = v5_brick_a3 = NULL;
density_brick_a4 = vdx_brick_a4 = vdy_brick_a4 = vdz_brick_a4 = NULL;
density_fft_a4 = NULL;
u_brick_a4 = v0_brick_a4 = v1_brick_a4 = v2_brick_a4 = v3_brick_a4 =
v4_brick_a4 = v5_brick_a4 = NULL;
density_brick_a5 = vdx_brick_a5 = vdy_brick_a5 = vdz_brick_a5 = NULL;
density_fft_a5 = NULL;
u_brick_a5 = v0_brick_a5 = v1_brick_a5 = v2_brick_a5 = v3_brick_a5 =
v4_brick_a5 = v5_brick_a5 = NULL;
density_brick_a6 = vdx_brick_a6 = vdy_brick_a6 = vdz_brick_a6 = NULL;
density_fft_a6 = NULL;
u_brick_a6 = v0_brick_a6 = v1_brick_a6 = v2_brick_a6 = v3_brick_a6 =
v4_brick_a6 = v5_brick_a6 = NULL;
density_brick_none = vdx_brick_none = vdy_brick_none = vdz_brick_none = NULL;
density_fft_none = NULL;
u_brick_none = v0_brick_none = v1_brick_none = v2_brick_none = v3_brick_none =
v4_brick_none = v5_brick_none = NULL;
greensfn = NULL;
greensfn_6 = NULL;
work1 = work2 = NULL;
work1_6 = work2_6 = NULL;
vg = NULL;
vg2 = NULL;
vg_6 = NULL;
vg2_6 = NULL;
fkx = fky = fkz = NULL;
fkx2 = fky2 = fkz2 = NULL;
fkx_6 = fky_6 = fkz_6 = NULL;
fkx2_6 = fky2_6 = fkz2_6 = NULL;
sf_precoeff1 = sf_precoeff2 = sf_precoeff3 = sf_precoeff4 =
sf_precoeff5 = sf_precoeff6 = NULL;
sf_precoeff1_6 = sf_precoeff2_6 = sf_precoeff3_6 = sf_precoeff4_6 =
sf_precoeff5_6 = sf_precoeff6_6 = NULL;
gf_b = NULL;
gf_b_6 = NULL;
rho1d = rho_coeff = NULL;
drho1d = drho_coeff = NULL;
rho1d_6 = rho_coeff_6 = NULL;
drho1d_6 = drho_coeff_6 = NULL;
fft1 = fft2 = NULL;
fft1_6 = fft2_6 = NULL;
remap = NULL;
remap_6 = NULL;
nmax = 0;
part2grid = NULL;
part2grid_6 = NULL;
cg = NULL;
cg_peratom = NULL;
cg_6 = NULL;
cg_peratom_6 = NULL;
memset(function, 0, EWALD_FUNCS*sizeof(int));
}
/* ----------------------------------------------------------------------
free all memory
------------------------------------------------------------------------- */
PPPMDisp::~PPPMDisp()
{
delete [] factors;
delete [] B;
B = NULL;
delete [] cii;
cii = NULL;
delete [] csumi;
csumi = NULL;
deallocate();
deallocate_peratom();
memory->destroy(part2grid);
memory->destroy(part2grid_6);
part2grid = part2grid_6 = NULL;
}
/* ----------------------------------------------------------------------
called once before run
------------------------------------------------------------------------- */
void PPPMDisp::init()
{
if (me == 0) {
if (screen) fprintf(screen,"PPPMDisp initialization ...\n");
if (logfile) fprintf(logfile,"PPPMDisp initialization ...\n");
}
triclinic_check();
if (domain->dimension == 2)
error->all(FLERR,"Cannot use PPPMDisp with 2d simulation");
if (comm->style != 0)
error->universe_all(FLERR,"PPPMDisp can only currently be used with "
"comm_style brick");
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with PPPMDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab PPPMDisp");
}
if (order > MAXORDER || order_6 > MAXORDER) {
char str[128];
sprintf(str,"PPPMDisp coulomb order cannot be greater than %d",MAXORDER);
error->all(FLERR,str);
}
// free all arrays previously allocated
deallocate();
deallocate_peratom();
// check whether cutoff and pair style are set
triclinic = domain->triclinic;
pair_check();
int tmp;
Pair *pair = force->pair;
int *ptr = pair ? (int *) pair->extract("ewald_order",tmp) : NULL;
double *p_cutoff = pair ? (double *) pair->extract("cut_coul",tmp) : NULL;
double *p_cutoff_lj = pair ? (double *) pair->extract("cut_LJ",tmp) : NULL;
if (!(ptr||p_cutoff||p_cutoff_lj))
error->all(FLERR,"KSpace style is incompatible with Pair style");
cutoff = *p_cutoff;
cutoff_lj = *p_cutoff_lj;
double tmp2;
MPI_Allreduce(&cutoff, &tmp2,1,MPI_DOUBLE,MPI_SUM,world);
// check which types of potentials have to be calculated
int ewald_order = ptr ? *((int *) ptr) : 1<<1;
int ewald_mix = ptr ? *((int *) pair->extract("ewald_mix",tmp)) : GEOMETRIC;
memset(function, 0, EWALD_FUNCS*sizeof(int));
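// note (derived from the decoding below): ewald_order bit 1 -> 1/r (coulomb)
// -> function[0]; bit 6 -> 1/r^6 (dispersion) -> function[1] for geometric
// mixing, function[2] for arithmetic mixing, function[3] if no mixing rule applies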
for (int i=0; i<=EWALD_MAXORDER; ++i) // transcribe order
if (ewald_order&(1<<i)) { // from pair_style
int k=0;
char str[128];
switch (i) {
case 1:
k = 0; break;
case 6:
if ((ewald_mix==GEOMETRIC || ewald_mix==SIXTHPOWER ||
mixflag == 1) && mixflag!= 2) { k = 1; break; }
else if (ewald_mix==ARITHMETIC && mixflag!=2) { k = 2; break; }
else if (mixflag == 2) { k = 3; break; }
default:
sprintf(str, "Unsupported order in kspace_style "
"pppm/disp, pair_style %s", force->pair_style);
error->all(FLERR,str);
}
function[k] = 1;
}
// warn if function[0] is not set but the charge attribute is set
if (!function[0] && atom->q_flag && me == 0) {
char str[128];
sprintf(str, "Charges are set, but coulombic solver is not used");
error->warning(FLERR, str);
}
// show error message if pppm/disp is not used correctly
if (function[1] || function[2] || function[3]) {
if (!gridflag_6 && !gewaldflag_6 && accuracy_real_6 < 0
&& accuracy_kspace_6 < 0 && !auto_disp_flag) {
error->all(FLERR, "PPPMDisp used but no parameters set, "
"for further information please see the pppm/disp "
"documentation");
}
}
// compute qsum & qsqsum, if function[0] is set, warn if not charge-neutral
scale = 1.0;
qqrd2e = force->qqrd2e;
natoms_original = atom->natoms;
if (function[0]) qsum_qsq();
// if kspace is TIP4P, extract TIP4P params from pair style
// bond/angle are not yet init(), so ensure equilibrium request is valid
qdist = 0.0;
if (tip4pflag) {
int itmp;
double *p_qdist = (double *) force->pair->extract("qdist",itmp);
int *p_typeO = (int *) force->pair->extract("typeO",itmp);
int *p_typeH = (int *) force->pair->extract("typeH",itmp);
int *p_typeA = (int *) force->pair->extract("typeA",itmp);
int *p_typeB = (int *) force->pair->extract("typeB",itmp);
if (!p_qdist || !p_typeO || !p_typeH || !p_typeA || !p_typeB)
error->all(FLERR,"KSpace style is incompatible with Pair style");
qdist = *p_qdist;
typeO = *p_typeO;
typeH = *p_typeH;
int typeA = *p_typeA;
int typeB = *p_typeB;
if (force->angle == NULL || force->bond == NULL)
error->all(FLERR,"Bond and angle potentials must be defined for TIP4P");
if (typeA < 1 || typeA > atom->nangletypes ||
force->angle->setflag[typeA] == 0)
error->all(FLERR,"Bad TIP4P angle type for PPPMDisp/TIP4P");
if (typeB < 1 || typeB > atom->nbondtypes ||
force->bond->setflag[typeB] == 0)
error->all(FLERR,"Bad TIP4P bond type for PPPMDisp/TIP4P");
double theta = force->angle->equilibrium_angle(typeA);
double blen = force->bond->equilibrium_distance(typeB);
alpha = qdist / (cos(0.5*theta) * blen);
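// alpha is the fraction of the distance from O to the midpoint of the two H
// sites at which the TIP4P virtual charge sits: that midpoint lies
// cos(theta/2)*blen from O, and the virtual site is qdist from O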
}
+ //if g_ewald and g_ewald_6 have not been specified, set some initial value
+ // to avoid problems when calculating the energies!
+
+ if (!gewaldflag) g_ewald = 1;
+ if (!gewaldflag_6) g_ewald_6 = 1;
+
// initialize the pair style to get the coefficients
neighrequest_flag = 0;
pair->init();
neighrequest_flag = 1;
init_coeffs();
- //if g_ewald and g_ewald_6 have not been specified, set some initial value
- // to avoid problems when calculating the energies!
-
- if (!gewaldflag) g_ewald = 1;
- if (!gewaldflag_6) g_ewald_6 = 1;
-
// set accuracy (force units) from accuracy_relative or accuracy_absolute
if (accuracy_absolute >= 0.0) accuracy = accuracy_absolute;
else accuracy = accuracy_relative * two_charge_force;
int (*procneigh)[2] = comm->procneigh;
int iteration = 0;
if (function[0]) {
GridComm *cgtmp = NULL;
while (order >= minorder) {
if (iteration && me == 0)
error->warning(FLERR,"Reducing PPPMDisp Coulomb order "
"b/c stencil extends beyond neighbor processor");
iteration++;
// set grid for dispersion interaction and coulomb interactions
set_grid();
if (nx_pppm >= OFFSET || ny_pppm >= OFFSET || nz_pppm >= OFFSET)
error->all(FLERR,"PPPMDisp Coulomb grid is too large");
set_fft_parameters(nx_pppm, ny_pppm, nz_pppm,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in,
nxhi_in, nyhi_in, nzhi_in,
nxlo_out, nylo_out, nzlo_out,
nxhi_out, nyhi_out, nzhi_out,
nlower, nupper,
ngrid, nfft, nfft_both,
shift, shiftone, order);
if (overlap_allowed) break;
cgtmp = new GridComm(lmp, world,1,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,
nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
cgtmp->ghost_notify();
if (!cgtmp->ghost_overlap()) break;
delete cgtmp;
order--;
}
if (order < minorder)
error->all(FLERR,
"Coulomb PPPMDisp order has been reduced below minorder");
if (cgtmp) delete cgtmp;
// adjust g_ewald
if (!gewaldflag) adjust_gewald();
// calculate the final accuracy
double acc = final_accuracy();
// print stats
int ngrid_max,nfft_both_max;
MPI_Allreduce(&ngrid,&ngrid_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nfft_both,&nfft_both_max,1,MPI_INT,MPI_MAX,world);
if (me == 0) {
#ifdef FFT_SINGLE
const char fft_prec[] = "single";
#else
const char fft_prec[] = "double";
#endif
if (screen) {
fprintf(screen," Coulomb G vector (1/distance)= %g\n",g_ewald);
fprintf(screen," Coulomb grid = %d %d %d\n",nx_pppm,ny_pppm,nz_pppm);
fprintf(screen," Coulomb stencil order = %d\n",order);
fprintf(screen," Coulomb estimated absolute RMS force accuracy = %g\n",
acc);
fprintf(screen," Coulomb estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(screen," using %s precision FFTs\n",fft_prec);
fprintf(screen," 3d grid and FFT values/proc = %d %d\n",
ngrid_max, nfft_both_max);
}
if (logfile) {
fprintf(logfile," Coulomb G vector (1/distance) = %g\n",g_ewald);
fprintf(logfile," Coulomb grid = %d %d %d\n",nx_pppm,ny_pppm,nz_pppm);
fprintf(logfile," Coulomb stencil order = %d\n",order);
fprintf(logfile,
" Coulomb estimated absolute RMS force accuracy = %g\n",
acc);
fprintf(logfile," Coulomb estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(logfile," using %s precision FFTs\n",fft_prec);
fprintf(logfile," 3d grid and FFT values/proc = %d %d\n",
ngrid_max, nfft_both_max);
}
}
}
iteration = 0;
if (function[1] + function[2] + function[3]) {
GridComm *cgtmp = NULL;
while (order_6 >= minorder) {
if (iteration && me == 0)
error->warning(FLERR,"Reducing PPPMDisp dispersion order "
"b/c stencil extends beyond neighbor processor");
iteration++;
set_grid_6();
if (nx_pppm_6 >= OFFSET || ny_pppm_6 >= OFFSET || nz_pppm_6 >= OFFSET)
error->all(FLERR,"PPPMDisp Dispersion grid is too large");
set_fft_parameters(nx_pppm_6, ny_pppm_6, nz_pppm_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6,
nxhi_in_6, nyhi_in_6, nzhi_in_6,
nxlo_out_6, nylo_out_6, nzlo_out_6,
nxhi_out_6, nyhi_out_6, nzhi_out_6,
nlower_6, nupper_6,
ngrid_6, nfft_6, nfft_both_6,
shift_6, shiftone_6, order_6);
if (overlap_allowed) break;
cgtmp = new GridComm(lmp,world,1,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,
nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,
nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
cgtmp->ghost_notify();
if (!cgtmp->ghost_overlap()) break;
delete cgtmp;
order_6--;
}
if (order_6 < minorder)
error->all(FLERR,"Dispersion PPPMDisp order has been "
"reduced below minorder");
if (cgtmp) delete cgtmp;
// adjust g_ewald_6
if (!gewaldflag_6 && accuracy_kspace_6 == accuracy_real_6)
adjust_gewald_6();
// calculate the final accuracy
double acc, acc_real, acc_kspace;
final_accuracy_6(acc, acc_real, acc_kspace);
// print stats
int ngrid_max,nfft_both_max;
MPI_Allreduce(&ngrid_6,&ngrid_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nfft_both_6,&nfft_both_max,1,MPI_INT,MPI_MAX,world);
if (me == 0) {
#ifdef FFT_SINGLE
const char fft_prec[] = "single";
#else
const char fft_prec[] = "double";
#endif
if (screen) {
fprintf(screen," Dispersion G vector (1/distance)= %g\n",g_ewald_6);
fprintf(screen," Dispersion grid = %d %d %d\n",
nx_pppm_6,ny_pppm_6,nz_pppm_6);
fprintf(screen," Dispersion stencil order = %d\n",order_6);
fprintf(screen," Dispersion estimated absolute "
"RMS force accuracy = %g\n",acc);
fprintf(screen," Dispersion estimated absolute "
"real space RMS force accuracy = %g\n",acc_real);
fprintf(screen," Dispersion estimated absolute "
"kspace RMS force accuracy = %g\n",acc_kspace);
fprintf(screen," Dispersion estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(screen," using %s precision FFTs\n",fft_prec);
fprintf(screen," 3d grid and FFT values/proc dispersion = %d %d\n",
ngrid_max,nfft_both_max);
}
if (logfile) {
fprintf(logfile," Dispersion G vector (1/distance) = %g\n",g_ewald_6);
fprintf(logfile," Dispersion grid = %d %d %d\n",
nx_pppm_6,ny_pppm_6,nz_pppm_6);
fprintf(logfile," Dispersion stencil order = %d\n",order_6);
fprintf(logfile," Dispersion estimated absolute "
"RMS force accuracy = %g\n",acc);
fprintf(logfile," Dispersion estimated absolute "
"real space RMS force accuracy = %g\n",acc_real);
fprintf(logfile," Dispersion estimated absolute "
"kspace RMS force accuracy = %g\n",acc_kspace);
fprintf(logfile," Disperion estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(logfile," using %s precision FFTs\n",fft_prec);
fprintf(logfile," 3d grid and FFT values/proc dispersion = %d %d\n",
ngrid_max,nfft_both_max);
}
}
}
// allocate K-space dependent memory
allocate();
// pre-compute Green's function denominator expansion
// pre-compute 1d charge distribution coefficients
if (function[0]) {
compute_gf_denom(gf_b, order);
compute_rho_coeff(rho_coeff, drho_coeff, order);
cg->ghost_notify();
cg->setup();
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm, ny_pppm, nz_pppm, order,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
sf_precoeff1, sf_precoeff2, sf_precoeff3,
sf_precoeff4, sf_precoeff5, sf_precoeff6);
}
if (function[1] + function[2] + function[3]) {
compute_gf_denom(gf_b_6, order_6);
compute_rho_coeff(rho_coeff_6, drho_coeff_6, order_6);
cg_6->ghost_notify();
cg_6->setup();
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm_6, ny_pppm_6, nz_pppm_6, order_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
sf_precoeff1_6, sf_precoeff2_6, sf_precoeff3_6,
sf_precoeff4_6, sf_precoeff5_6, sf_precoeff6_6);
}
}
/* ----------------------------------------------------------------------
adjust PPPM coeffs, called initially and whenever volume has changed
------------------------------------------------------------------------- */
void PPPMDisp::setup()
{
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with PPPMDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab PPPMDisp");
}
double *prd;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
// compute fkx,fky,fkz for my FFT grid pts
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
//compute the virial coefficients and Green's functions
if (function[0]){
delxinv = nx_pppm/xprd;
delyinv = ny_pppm/yprd;
delzinv = nz_pppm/zprd_slab;
delvolinv = delxinv*delyinv*delzinv;
double per;
int i, j, k, n;
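// the index arithmetic below folds the grid index i into a signed wavevector
// index: per = i when 2*i < N and per = i - N otherwise, so fk* = 2*pi*per/L
// is the physical wavevector of FFT mode i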
for (i = nxlo_fft; i <= nxhi_fft; i++) {
per = i - nx_pppm*(2*i/nx_pppm);
fkx[i] = unitkx*per;
j = (nx_pppm - i) % nx_pppm;
per = j - nx_pppm*(2*j/nx_pppm);
fkx2[i] = unitkx*per;
}
for (i = nylo_fft; i <= nyhi_fft; i++) {
per = i - ny_pppm*(2*i/ny_pppm);
fky[i] = unitky*per;
j = (ny_pppm - i) % ny_pppm;
per = j - ny_pppm*(2*j/ny_pppm);
fky2[i] = unitky*per;
}
for (i = nzlo_fft; i <= nzhi_fft; i++) {
per = i - nz_pppm*(2*i/nz_pppm);
fkz[i] = unitkz*per;
j = (nz_pppm - i) % nz_pppm;
per = j - nz_pppm*(2*j/nz_pppm);
fkz2[i] = unitkz*per;
}
double sqk,vterm;
double gew2inv = 1/(g_ewald*g_ewald);
n = 0;
for (k = nzlo_fft; k <= nzhi_fft; k++) {
for (j = nylo_fft; j <= nyhi_fft; j++) {
for (i = nxlo_fft; i <= nxhi_fft; i++) {
sqk = fkx[i]*fkx[i] + fky[j]*fky[j] + fkz[k]*fkz[k];
if (sqk == 0.0) {
vg[n][0] = 0.0;
vg[n][1] = 0.0;
vg[n][2] = 0.0;
vg[n][3] = 0.0;
vg[n][4] = 0.0;
vg[n][5] = 0.0;
} else {
vterm = -2.0 * (1.0/sqk + 0.25*gew2inv);
vg[n][0] = 1.0 + vterm*fkx[i]*fkx[i];
vg[n][1] = 1.0 + vterm*fky[j]*fky[j];
vg[n][2] = 1.0 + vterm*fkz[k]*fkz[k];
vg[n][3] = vterm*fkx[i]*fky[j];
vg[n][4] = vterm*fkx[i]*fkz[k];
vg[n][5] = vterm*fky[j]*fkz[k];
vg2[n][0] = vterm*0.5*(fkx[i]*fky[j] + fkx2[i]*fky2[j]);
vg2[n][1] = vterm*0.5*(fkx[i]*fkz[k] + fkx2[i]*fkz2[k]);
vg2[n][2] = vterm*0.5*(fky[j]*fkz[k] + fky2[j]*fkz2[k]);
}
n++;
}
}
}
compute_gf();
if (differentiation_flag == 1) compute_sf_coeff();
}
if (function[1] + function[2] + function[3]) {
delxinv_6 = nx_pppm_6/xprd;
delyinv_6 = ny_pppm_6/yprd;
delzinv_6 = nz_pppm_6/zprd_slab;
delvolinv_6 = delxinv_6*delyinv_6*delzinv_6;
double per;
int i, j, k, n;
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
per = i - nx_pppm_6*(2*i/nx_pppm_6);
fkx_6[i] = unitkx*per;
j = (nx_pppm_6 - i) % nx_pppm_6;
per = j - nx_pppm_6*(2*j/nx_pppm_6);
fkx2_6[i] = unitkx*per;
}
for (i = nylo_fft_6; i <= nyhi_fft_6; i++) {
per = i - ny_pppm_6*(2*i/ny_pppm_6);
fky_6[i] = unitky*per;
j = (ny_pppm_6 - i) % ny_pppm_6;
per = j - ny_pppm_6*(2*j/ny_pppm_6);
fky2_6[i] = unitky*per;
}
for (i = nzlo_fft_6; i <= nzhi_fft_6; i++) {
per = i - nz_pppm_6*(2*i/nz_pppm_6);
fkz_6[i] = unitkz*per;
j = (nz_pppm_6 - i) % nz_pppm_6;
per = j - nz_pppm_6*(2*j/nz_pppm_6);
fkz2_6[i] = unitkz*per;
}
double sqk,vterm;
long double erft, expt,nom, denom;
long double b, bs, bt;
double rtpi = sqrt(MY_PI);
double gewinv = 1/g_ewald_6;
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++) {
for (j = nylo_fft_6; j <= nyhi_fft_6; j++) {
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
sqk = fkx_6[i]*fkx_6[i] + fky_6[j]*fky_6[j] + fkz_6[k]*fkz_6[k];
if (sqk == 0.0) {
vg_6[n][0] = 0.0;
vg_6[n][1] = 0.0;
vg_6[n][2] = 0.0;
vg_6[n][3] = 0.0;
vg_6[n][4] = 0.0;
vg_6[n][5] = 0.0;
} else {
b = 0.5*sqrt(sqk)*gewinv;
bs = b*b;
bt = bs*b;
erft = 2*bt*rtpi*erfc((double) b);
expt = exp(-bs);
nom = erft - 2*bs*expt;
denom = nom + expt;
if (denom == 0) vterm = 3.0/sqk;
else vterm = 3.0*nom/(sqk*denom);
vg_6[n][0] = 1.0 + vterm*fkx_6[i]*fkx_6[i];
vg_6[n][1] = 1.0 + vterm*fky_6[j]*fky_6[j];
vg_6[n][2] = 1.0 + vterm*fkz_6[k]*fkz_6[k];
vg_6[n][3] = vterm*fkx_6[i]*fky_6[j];
vg_6[n][4] = vterm*fkx_6[i]*fkz_6[k];
vg_6[n][5] = vterm*fky_6[j]*fkz_6[k];
vg2_6[n][0] = vterm*0.5*(fkx_6[i]*fky_6[j] + fkx2_6[i]*fky2_6[j]);
vg2_6[n][1] = vterm*0.5*(fkx_6[i]*fkz_6[k] + fkx2_6[i]*fkz2_6[k]);
vg2_6[n][2] = vterm*0.5*(fky_6[j]*fkz_6[k] + fky2_6[j]*fkz2_6[k]);
}
n++;
}
}
}
compute_gf_6();
if (differentiation_flag == 1) compute_sf_coeff_6();
}
}
/* ----------------------------------------------------------------------
reset local grid arrays and communication stencils
called by fix balance b/c it changed sizes of processor sub-domains
------------------------------------------------------------------------- */
void PPPMDisp::setup_grid()
{
// free all arrays previously allocated
deallocate();
deallocate_peratom();
// reset portion of global grid that each proc owns
if (function[0])
set_fft_parameters(nx_pppm, ny_pppm, nz_pppm,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in,
nxhi_in, nyhi_in, nzhi_in,
nxlo_out, nylo_out, nzlo_out,
nxhi_out, nyhi_out, nzhi_out,
nlower, nupper,
ngrid, nfft, nfft_both,
shift, shiftone, order);
if (function[1] + function[2] + function[3])
set_fft_parameters(nx_pppm_6, ny_pppm_6, nz_pppm_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6,
nxhi_in_6, nyhi_in_6, nzhi_in_6,
nxlo_out_6, nylo_out_6, nzlo_out_6,
nxhi_out_6, nyhi_out_6, nzhi_out_6,
nlower_6, nupper_6,
ngrid_6, nfft_6, nfft_both_6,
shift_6, shiftone_6, order_6);
// reallocate K-space dependent memory
// check if grid communication is now overlapping if not allowed
// don't invoke allocate_peratom(), compute() will allocate when needed
allocate();
if (function[0]) {
cg->ghost_notify();
if (overlap_allowed == 0 && cg->ghost_overlap())
error->all(FLERR,"PPPM grid stencil extends "
"beyond nearest neighbor processor");
cg->setup();
}
if (function[1] + function[2] + function[3]) {
cg_6->ghost_notify();
if (overlap_allowed == 0 && cg_6->ghost_overlap())
error->all(FLERR,"PPPM grid stencil extends "
"beyond nearest neighbor processor");
cg_6->setup();
}
// pre-compute Green's function denominator expansion
// pre-compute 1d charge distribution coefficients
if (function[0]) {
compute_gf_denom(gf_b, order);
compute_rho_coeff(rho_coeff, drho_coeff, order);
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm, ny_pppm, nz_pppm, order,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
sf_precoeff1, sf_precoeff2, sf_precoeff3,
sf_precoeff4, sf_precoeff5, sf_precoeff6);
}
if (function[1] + function[2] + function[3]) {
compute_gf_denom(gf_b_6, order_6);
compute_rho_coeff(rho_coeff_6, drho_coeff_6, order_6);
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm_6, ny_pppm_6, nz_pppm_6, order_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
sf_precoeff1_6, sf_precoeff2_6, sf_precoeff3_6,
sf_precoeff4_6, sf_precoeff5_6, sf_precoeff6_6);
}
// pre-compute volume-dependent coeffs
setup();
}
/* ----------------------------------------------------------------------
compute the PPPM long-range force, energy, virial
------------------------------------------------------------------------- */
void PPPMDisp::compute(int eflag, int vflag)
{
int i;
// convert atoms from box to lamda coords
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = evflag_atom = eflag_global = vflag_global =
eflag_atom = vflag_atom = 0;
if (evflag_atom && !peratom_allocate_flag) {
allocate_peratom();
if (function[0]) {
cg_peratom->ghost_notify();
cg_peratom->setup();
}
if (function[1] + function[2] + function[3]) {
cg_peratom_6->ghost_notify();
cg_peratom_6->setup();
}
peratom_allocate_flag = 1;
}
if (triclinic == 0) boxlo = domain->boxlo;
else {
boxlo = domain->boxlo_lamda;
domain->x2lamda(atom->nlocal);
}
// extend size of per-atom arrays if necessary
if (atom->nmax > nmax) {
if (function[0]) memory->destroy(part2grid);
if (function[1] + function[2] + function[3]) memory->destroy(part2grid_6);
nmax = atom->nmax;
if (function[0]) memory->create(part2grid,nmax,3,"pppm/disp:part2grid");
if (function[1] + function[2] + function[3])
memory->create(part2grid_6,nmax,3,"pppm/disp:part2grid_6");
}
energy = 0.0;
energy_1 = 0.0;
energy_6 = 0.0;
if (vflag) for (i = 0; i < 6; i++) virial_6[i] = virial_1[i] = 0.0;
// find grid points for all my particles
// distribute particles' charges/dispersion coefficients on the grid
// communication between processors and remapping to fft decomposition
// solution of Poisson's equation in k-space and back-transformation
// communication between processors
// calculation of forces
if (function[0]) {
//perform calculations for coulomb interactions only
particle_map_c(delxinv, delyinv, delzinv, shift, part2grid, nupper, nlower,
nxlo_out, nylo_out, nzlo_out, nxhi_out, nyhi_out, nzhi_out);
make_rho_c();
cg->reverse_comm(this,REVERSE_RHO);
brick2fft(nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
density_brick, density_fft, work1,remap);
if (differentiation_flag == 1) {
poisson_ad(work1, work2, density_fft, fft1, fft2,
nx_pppm, ny_pppm, nz_pppm, nfft,
nxlo_fft, nylo_fft, nzlo_fft, nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
energy_1, greensfn,
virial_1, vg,vg2,
u_brick, v0_brick, v1_brick, v2_brick, v3_brick, v4_brick, v5_brick);
cg->forward_comm(this,FORWARD_AD);
fieldforce_c_ad();
if (vflag_atom) cg_peratom->forward_comm(this, FORWARD_AD_PERATOM);
} else {
poisson_ik(work1, work2, density_fft, fft1, fft2,
nx_pppm, ny_pppm, nz_pppm, nfft,
nxlo_fft, nylo_fft, nzlo_fft, nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
energy_1, greensfn,
fkx, fky, fkz,fkx2, fky2, fkz2,
vdx_brick, vdy_brick, vdz_brick, virial_1, vg,vg2,
u_brick, v0_brick, v1_brick, v2_brick, v3_brick, v4_brick, v5_brick);
cg->forward_comm(this, FORWARD_IK);
fieldforce_c_ik();
if (evflag_atom) cg_peratom->forward_comm(this, FORWARD_IK_PERATOM);
}
if (evflag_atom) fieldforce_c_peratom();
}
if (function[1]) {
//perform calculations for geometric mixing
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_g();
cg_6->reverse_comm(this, REVERSE_RHO_G);
brick2fft(nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
density_brick_g, density_fft_g, work1_6,remap_6);
if (differentiation_flag == 1) {
poisson_ad(work1_6, work2_6, density_fft_g, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
virial_6, vg_6, vg2_6,
u_brick_g, v0_brick_g, v1_brick_g, v2_brick_g, v3_brick_g, v4_brick_g, v5_brick_g);
cg_6->forward_comm(this,FORWARD_AD_G);
fieldforce_g_ad();
if (vflag_atom) cg_peratom_6->forward_comm(this,FORWARD_AD_PERATOM_G);
} else {
poisson_ik(work1_6, work2_6, density_fft_g, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
fkx_6, fky_6, fkz_6,fkx2_6, fky2_6, fkz2_6,
vdx_brick_g, vdy_brick_g, vdz_brick_g, virial_6, vg_6, vg2_6,
u_brick_g, v0_brick_g, v1_brick_g, v2_brick_g, v3_brick_g, v4_brick_g, v5_brick_g);
cg_6->forward_comm(this,FORWARD_IK_G);
fieldforce_g_ik();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_G);
}
if (evflag_atom) fieldforce_g_peratom();
}
if (function[2]) {
//perform calculations for arithmetic mixing
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_a();
cg_6->reverse_comm(this, REVERSE_RHO_A);
brick2fft_a();
if ( differentiation_flag == 1) {
poisson_ad(work1_6, work2_6, density_fft_a3, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
virial_6, vg_6, vg2_6,
u_brick_a3, v0_brick_a3, v1_brick_a3, v2_brick_a3, v3_brick_a3, v4_brick_a3, v5_brick_a3);
poisson_2s_ad(density_fft_a0, density_fft_a6,
u_brick_a0, v0_brick_a0, v1_brick_a0, v2_brick_a0, v3_brick_a0, v4_brick_a0, v5_brick_a0,
u_brick_a6, v0_brick_a6, v1_brick_a6, v2_brick_a6, v3_brick_a6, v4_brick_a6, v5_brick_a6);
poisson_2s_ad(density_fft_a1, density_fft_a5,
u_brick_a1, v0_brick_a1, v1_brick_a1, v2_brick_a1, v3_brick_a1, v4_brick_a1, v5_brick_a1,
u_brick_a5, v0_brick_a5, v1_brick_a5, v2_brick_a5, v3_brick_a5, v4_brick_a5, v5_brick_a5);
poisson_2s_ad(density_fft_a2, density_fft_a4,
u_brick_a2, v0_brick_a2, v1_brick_a2, v2_brick_a2, v3_brick_a2, v4_brick_a2, v5_brick_a2,
u_brick_a4, v0_brick_a4, v1_brick_a4, v2_brick_a4, v3_brick_a4, v4_brick_a4, v5_brick_a4);
cg_6->forward_comm(this, FORWARD_AD_A);
fieldforce_a_ad();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_AD_PERATOM_A);
} else {
poisson_ik(work1_6, work2_6, density_fft_a3, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
fkx_6, fky_6, fkz_6,fkx2_6, fky2_6, fkz2_6,
vdx_brick_a3, vdy_brick_a3, vdz_brick_a3, virial_6, vg_6, vg2_6,
u_brick_a3, v0_brick_a3, v1_brick_a3, v2_brick_a3, v3_brick_a3, v4_brick_a3, v5_brick_a3);
poisson_2s_ik(density_fft_a0, density_fft_a6,
vdx_brick_a0, vdy_brick_a0, vdz_brick_a0,
vdx_brick_a6, vdy_brick_a6, vdz_brick_a6,
u_brick_a0, v0_brick_a0, v1_brick_a0, v2_brick_a0, v3_brick_a0, v4_brick_a0, v5_brick_a0,
u_brick_a6, v0_brick_a6, v1_brick_a6, v2_brick_a6, v3_brick_a6, v4_brick_a6, v5_brick_a6);
poisson_2s_ik(density_fft_a1, density_fft_a5,
vdx_brick_a1, vdy_brick_a1, vdz_brick_a1,
vdx_brick_a5, vdy_brick_a5, vdz_brick_a5,
u_brick_a1, v0_brick_a1, v1_brick_a1, v2_brick_a1, v3_brick_a1, v4_brick_a1, v5_brick_a1,
u_brick_a5, v0_brick_a5, v1_brick_a5, v2_brick_a5, v3_brick_a5, v4_brick_a5, v5_brick_a5);
poisson_2s_ik(density_fft_a2, density_fft_a4,
vdx_brick_a2, vdy_brick_a2, vdz_brick_a2,
vdx_brick_a4, vdy_brick_a4, vdz_brick_a4,
u_brick_a2, v0_brick_a2, v1_brick_a2, v2_brick_a2, v3_brick_a2, v4_brick_a2, v5_brick_a2,
u_brick_a4, v0_brick_a4, v1_brick_a4, v2_brick_a4, v3_brick_a4, v4_brick_a4, v5_brick_a4);
cg_6->forward_comm(this, FORWARD_IK_A);
fieldforce_a_ik();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_A);
}
if (evflag_atom) fieldforce_a_peratom();
}
if (function[3]) {
//perform calculations if no mixing rule applies
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_none();
cg_6->reverse_comm(this, REVERSE_RHO_NONE);
brick2fft_none();
if (differentiation_flag == 1) {
int n = 0;
for (int k = 0; k<nsplit_alloc/2; k++) {
poisson_none_ad(n,n+1,density_fft_none[n],density_fft_none[n+1],
u_brick_none[n],u_brick_none[n+1],
v0_brick_none, v1_brick_none, v2_brick_none,
v3_brick_none, v4_brick_none, v5_brick_none);
n += 2;
}
cg_6->forward_comm(this,FORWARD_AD_NONE);
fieldforce_none_ad();
if (vflag_atom) cg_peratom_6->forward_comm(this,FORWARD_AD_PERATOM_NONE);
} else {
int n = 0;
for (int k = 0; k<nsplit_alloc/2; k++) {
poisson_none_ik(n,n+1,density_fft_none[n], density_fft_none[n+1],
vdx_brick_none[n], vdy_brick_none[n], vdz_brick_none[n],
vdx_brick_none[n+1], vdy_brick_none[n+1], vdz_brick_none[n+1],
u_brick_none, v0_brick_none, v1_brick_none, v2_brick_none,
v3_brick_none, v4_brick_none, v5_brick_none);
n += 2;
}
cg_6->forward_comm(this,FORWARD_IK_NONE);
fieldforce_none_ik();
if (evflag_atom)
cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_NONE);
}
if (evflag_atom) fieldforce_none_peratom();
}
// update qsum and qsqsum, if atom count has changed and energy needed
if ((eflag_global || eflag_atom) && atom->natoms != natoms_original) {
qsum_qsq();
natoms_original = atom->natoms;
}
// sum energy across procs and add in volume-dependent term
const double qscale = force->qqrd2e * scale;
if (eflag_global) {
double energy_all;
MPI_Allreduce(&energy_1,&energy_all,1,MPI_DOUBLE,MPI_SUM,world);
energy_1 = energy_all;
MPI_Allreduce(&energy_6,&energy_all,1,MPI_DOUBLE,MPI_SUM,world);
energy_6 = energy_all;
energy_1 *= 0.5*volume;
energy_6 *= 0.5*volume;
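// subtract the usual Ewald self-energy and neutralizing-background terms:
// the coulomb part uses qsqsum and qsum; the dispersion part adds the
// analogous k=0 (csumij) and self (csum) corrections of the 1/r^6 sum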
energy_1 -= g_ewald*qsqsum/MY_PIS +
MY_PI2*qsum*qsum / (g_ewald*g_ewald*volume);
energy_6 += - MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumij +
1.0/12.0*pow(g_ewald_6,6)*csum;
energy_1 *= qscale;
}
// sum virial across procs
if (vflag_global) {
double virial_all[6];
MPI_Allreduce(virial_1,virial_all,6,MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < 6; i++) virial[i] = 0.5*qscale*volume*virial_all[i];
MPI_Allreduce(virial_6,virial_all,6,MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < 6; i++) virial[i] += 0.5*volume*virial_all[i];
if (function[1]+function[2]+function[3]){
double a = MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumij;
virial[0] -= a;
virial[1] -= a;
virial[2] -= a;
}
}
if (eflag_atom) {
if (function[0]) {
double *q = atom->q;
for (i = 0; i < atom->nlocal; i++) {
eatom[i] -= qscale*g_ewald*q[i]*q[i]/MY_PIS + qscale*MY_PI2*q[i]*qsum / (g_ewald*g_ewald*volume); //coulomb self energy correction
}
}
if (function[1] + function[2] + function[3]) {
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
eatom[i] += - MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumi[tmp] +
1.0/12.0*pow(g_ewald_6,6)*cii[tmp];
}
}
}
if (vflag_atom) {
if (function[1] + function[2] + function[3]) {
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
for (int n = 0; n < 3; n++) vatom[i][n] -= MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumi[tmp]; //dispersion self virial correction
}
}
}
// 2d slab correction
if (slabflag) slabcorr(eflag);
if (function[0]) energy += energy_1;
if (function[1] + function[2] + function[3]) energy += energy_6;
// convert atoms back from lamda to box coords
if (triclinic) domain->lamda2x(atom->nlocal);
}
/* ----------------------------------------------------------------------
initialize coefficients needed for the dispersion density on the grids
------------------------------------------------------------------------- */
void PPPMDisp::init_coeffs() // local pair coeffs
{
int tmp;
int n = atom->ntypes;
int converged;
delete [] B;
B = NULL;
if (function[3] + function[2]) { // no mixing rule or arithmetic
if (function[2] && me == 0) {
if (screen) fprintf(screen," Optimizing splitting of Dispersion coefficients\n");
if (logfile) fprintf(logfile," Optimizing splitting of Dispersion coefficients\n");
}
// allocate data for eigenvalue decomposition
double **A=NULL;
double **Q=NULL;
if ( n > 1 ) {
// get dispersion coefficients
double **b = (double **) force->pair->extract("B",tmp);
memory->create(A,n,n,"pppm/disp:A");
memory->create(Q,n,n,"pppm/disp:Q");
// fill coefficients into matrix A
for (int i = 1; i <= n; i++)
for (int j = 1; j <= n; j++)
A[i-1][j-1] = b[i][j];
// initialize Q as the identity matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Q[i][j] = 0.0;
for (int i = 0; i < n; i++)
Q[i][i] = 1.0;
// perform eigenvalue decomposition with the QR algorithm
converged = qr_alg(A,Q,n);
if (function[3] && !converged) {
error->all(FLERR,"Matrix factorization to split dispersion coefficients failed");
}
// determine number of used eigenvalues
// based on maximum allowed number or cutoff criterion
// sort eigenvalues according to their size with bubble sort
double t;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n-1-i; j++) {
if (fabs(A[j][j]) < fabs(A[j+1][j+1])) {
t = A[j][j];
A[j][j] = A[j+1][j+1];
A[j+1][j+1] = t;
for (int k = 0; k < n; k++) {
t = Q[k][j];
Q[k][j] = Q[k][j+1];
Q[k][j+1] = t;
}
}
}
}
// check which eigenvalue is the first that is smaller
// than a specified tolerance
// check how many are maximum allowed by the user
double amax = fabs(A[0][0]);
double acrit = amax*splittol;
double bmax = 0;
double err = 0;
nsplit = 0;
for (int i = 0; i < n; i++) {
if (fabs(A[i][i]) > acrit) nsplit++;
else {
bmax = fabs(A[i][i]);
break;
}
}
err = bmax/amax;
if (err > 1.0e-4) {
char str[128];
sprintf(str,"Estimated error in splitting of dispersion coeffs is %g",err);
error->warning(FLERR, str);
}
// set B
B = new double[nsplit*n+nsplit];
for (int i = 0; i< nsplit; i++) {
B[i] = A[i][i];
for (int j = 0; j < n; j++) {
B[nsplit*(j+1) + i] = Q[j][i];
}
}
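// B now holds a truncated spectral decomposition of the dispersion
// coefficient matrix: B[0..nsplit-1] are the retained eigenvalues and
// B[nsplit*(j+1)+i] is component j of eigenvector i, i.e. roughly
// b[i+1][j+1] ~ sum_k B[k] * Q[i][k] * Q[j][k]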
nsplit_alloc = nsplit;
if (nsplit%2 == 1) nsplit_alloc = nsplit + 1;
} else
nsplit = 1; // use geometric mixing
// check if the function should preferably be [1] or [2] or [3]
if (nsplit == 1) {
if ( B ) delete [] B;
function[3] = 0;
function[2] = 0;
function[1] = 1;
if (me == 0) {
if (screen) fprintf(screen," Using geometric mixing for reciprocal space\n");
if (logfile) fprintf(logfile," Using geometric mixing for reciprocal space\n");
}
}
if (function[2] && nsplit <= 6) {
if (me == 0) {
if (screen) fprintf(screen," Using %d instead of 7 structure factors\n",nsplit);
if (logfile) fprintf(logfile," Using %d instead of 7 structure factors\n",nsplit);
}
function[3] = 1;
function[2] = 0;
}
if (function[2] && (nsplit > 6)) {
if (me == 0) {
if (screen) fprintf(screen," Using 7 structure factors\n");
if (logfile) fprintf(logfile," Using 7 structure factors\n");
}
if ( B ) delete [] B;
}
if (function[3]) {
if (me == 0) {
if (screen) fprintf(screen," Using %d structure factors\n",nsplit);
if (logfile) fprintf(logfile," Using %d structure factors\n",nsplit);
}
if (nsplit > 9) error->warning(FLERR, "Simulations might be very slow because of large number of structure factors");
}
memory->destroy(A);
memory->destroy(Q);
}
if (function[1]) { // geometric 1/r^6
double **b = (double **) force->pair->extract("B",tmp);
B = new double[n+1];
B[0] = 0.0;
for (int i=1; i<=n; ++i) B[i] = sqrt(fabs(b[i][i]));
}
if (function[2]) { // arithmetic 1/r^6
//cannot use epsilon, because this has not been set yet
double **epsilon = (double **) force->pair->extract("epsilon",tmp);
//cannot use sigma, because this has not been set yet
double **sigma = (double **) force->pair->extract("sigma",tmp);
if (!(epsilon&&sigma))
error->all(FLERR,"Epsilon or sigma reference not set by pair style in PPPMDisp");
double eps_i, sigma_i, sigma_n, *bi = B = new double[7*n+7];
double c[7] = {
1.0, sqrt(6.0), sqrt(15.0), sqrt(20.0), sqrt(15.0), sqrt(6.0), 1.0};
for (int i=0; i<=n; ++i) {
eps_i = sqrt(epsilon[i][i]);
sigma_i = sigma[i][i];
sigma_n = 1.0;
for (int j=0; j<7; ++j) {
*(bi++) = sigma_n*eps_i*c[j]*0.25;
sigma_n *= sigma_i;
}
}
}
}
/* ----------------------------------------------------------------------
Eigenvalue decomposition of a real, symmetric matrix with the QR
method (includes transformation to a tridiagonal matrix + Wilkinson
shift)
------------------------------------------------------------------------- */
int PPPMDisp::qr_alg(double **A, double **Q, int n)
{
int converged = 0;
double an1, an, bn1, d, mue;
// allocate some memory for the required operations
double **A0,**Qi,**C,**D,**E;
// make a copy of A for convergence check
memory->create(A0,n,n,"pppm/disp:A0");
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
A0[i][j] = A[i][j];
// allocate an auxiliary matrix Qi
memory->create(Qi,n,n,"pppm/disp:Qi");
// allocate auxiliary matrices for the matrix multiplication
memory->create(C,n,n,"pppm/disp:C");
memory->create(D,n,n,"pppm/disp:D");
memory->create(E,n,n,"pppm/disp:E");
// transform matrix A to tridiagonal form
hessenberg(A,Q,n);
// start loop for the matrix factorization
int count = 0;
int countmax = 100000;
while (1) {
// make a Wilkinson shift
an1 = A[n-2][n-2];
an = A[n-1][n-1];
bn1 = A[n-2][n-1];
d = (an1-an)/2;
mue = an + d - copysign(1.,d)*sqrt(d*d + bn1*bn1);
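// mue is the eigenvalue of the trailing 2x2 block [[an1,bn1],[bn1,an]]
// that lies closer to an (the standard Wilkinson shift for symmetric
// tridiagonal matrices)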
for (int i = 0; i < n; i++)
A[i][i] -= mue;
// perform a QR factorization for a tridiagonal matrix A
qr_tri(Qi,A,n);
// update the matrices
mmult(A,Qi,C,n);
mmult(Q,Qi,C,n);
// backward Wilkinson shift
for (int i = 0; i < n; i++)
A[i][i] += mue;
// check the convergence
converged = check_convergence(A,Q,A0,C,D,E,n);
if (converged) break;
count = count + 1;
if (count == countmax) break;
}
// free allocated memory
memory->destroy(Qi);
memory->destroy(A0);
memory->destroy(C);
memory->destroy(D);
memory->destroy(E);
return converged;
}
/* ----------------------------------------------------------------------
Transform a matrix to Hessenberg form (for symmetric matrices, the
result will be a tridiagonal matrix)
------------------------------------------------------------------------- */
void PPPMDisp::hessenberg(double **A, double **Q, int n)
{
double r,a,b,c,s,x1,x2;
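// Givens rotations in the (i+1,j) plane eliminate A[j][i] for j > i+1;
// applying each rotation from the left and from the right preserves
// symmetry, so the resulting Hessenberg form is in fact tridiagonal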
for (int i = 0; i < n-1; i++) {
for (int j = i+2; j < n; j++) {
// compute coeffs for the rotation matrix
a = A[i+1][i];
b = A[j][i];
r = sqrt(a*a + b*b);
c = a/r;
s = b/r;
// update the entries of A with multiplication from the left
for (int k = 0; k < n; k++) {
x1 = A[i+1][k];
x2 = A[j][k];
A[i+1][k] = c*x1 + s*x2;
A[j][k] = -s*x1 + c*x2;
}
// update the entries of A and Q with a multiplication from the right
for (int k = 0; k < n; k++) {
x1 = A[k][i+1];
x2 = A[k][j];
A[k][i+1] = c*x1 + s*x2;
A[k][j] = -s*x1 + c*x2;
x1 = Q[k][i+1];
x2 = Q[k][j];
Q[k][i+1] = c*x1 + s*x2;
Q[k][j] = -s*x1 + c*x2;
}
}
}
}
/* ----------------------------------------------------------------------
QR factorization for a tridiagonal matrix; Result of the factorization
is stored in A and Qi
------------------------------------------------------------------------- */
void PPPMDisp::qr_tri(double** Qi,double** A,int n)
{
double r,a,b,c,s,x1,x2;
int j,k,k0,kmax;
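// Givens-based QR of a tridiagonal matrix: each rotation mixes rows i and
// i+1 and only touches a small band of columns (k0..kmax-1); the orthogonal
// factor is accumulated column-wise in Qi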
// make Qi a unity matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Qi[i][j] = 0.0;
for (int i = 0; i < n; i++)
Qi[i][i] = 1.0;
// loop over main diagonal and first off-diagonal of A
for (int i = 0; i < n-1; i++) {
j = i+1;
// coefficients of the rotation matrix
a = A[i][i];
b = A[j][i];
r = sqrt(a*a + b*b);
c = a/r;
s = b/r;
// update the entries of A and Q
k0 = (i-1>0)?i-1:0; //min(i-1,0);
kmax = (i+3<n)?i+3:n; //min(i+3,n);
for (k = k0; k < kmax; k++) {
x1 = A[i][k];
x2 = A[j][k];
A[i][k] = c*x1 + s*x2;
A[j][k] = -s*x1 + c*x2;
}
for (k = 0; k < n; k++) {
x1 = Qi[k][i];
x2 = Qi[k][j];
Qi[k][i] = c*x1 + s*x2;
Qi[k][j] = -s*x1 + c*x2;
}
}
}
/* ----------------------------------------------------------------------
Multiply two matrices A and B, store the result in A; C provides
some memory to store intermediate results
------------------------------------------------------------------------- */
void PPPMDisp::mmult(double** A, double** B, double** C, int n)
{
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
C[i][j] = 0.0;
// perform matrix multiplication
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < n; k++)
C[i][j] += A[i][k] * B[k][j];
// copy the result back to matrix A
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
A[i][j] = C[i][j];
}
/* ----------------------------------------------------------------------
Check if the factorization has converged by comparing all elements of the
original matrix and the new matrix
------------------------------------------------------------------------- */
int PPPMDisp::check_convergence(double** A,double** Q,double** A0,
double** C,double** D,double** E,int n)
{
double eps = 1.0e-8;
int converged = 1;
double epsmax = -1;
double Bmax = 0.0;
double diff;
// get the largest element of the original matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Bmax = (Bmax>A0[i][j])?Bmax:A0[i][j]; //max(Bmax,A0[i][j]);
double epsabs = eps*Bmax;
// reconstruct the original matrix
// store the diagonal elements in D
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
D[i][j] = 0.0;
for (int i = 0; i < n; i++)
D[i][i] = A[i][i];
// store matrix Q in E
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
E[i][j] = Q[i][j];
// E = Q*D
mmult(E,D,C,n);
// store transpose of Q in D
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
D[i][j] = Q[j][i];
// E = Q*D*Q^T
mmult(E,D,C,n);
//compare the original matrix with the reconstructed matrix
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
diff = A0[i][j] - E[i][j];
epsmax = (epsmax>fabs(diff))?epsmax:fabs(diff);//max(epsmax,fabs(diff));
}
}
if (epsmax > epsabs) converged = 0;
return converged;
}
/* ----------------------------------------------------------------------
allocate memory that depends on # of K-vectors and order
------------------------------------------------------------------------- */
void PPPMDisp::allocate()
{
int (*procneigh)[2] = comm->procneigh;
if (function[0]) {
memory->create(work1,2*nfft_both,"pppm/disp:work1");
memory->create(work2,2*nfft_both,"pppm/disp:work2");
memory->create1d_offset(fkx,nxlo_fft,nxhi_fft,"pppm/disp:fkx");
memory->create1d_offset(fky,nylo_fft,nyhi_fft,"pppm/disp:fky");
memory->create1d_offset(fkz,nzlo_fft,nzhi_fft,"pppm/disp:fkz");
memory->create1d_offset(fkx2,nxlo_fft,nxhi_fft,"pppm/disp:fkx2");
memory->create1d_offset(fky2,nylo_fft,nyhi_fft,"pppm/disp:fky2");
memory->create1d_offset(fkz2,nzlo_fft,nzhi_fft,"pppm/disp:fkz2");
memory->create(gf_b,order,"pppm/disp:gf_b");
memory->create2d_offset(rho1d,3,-order/2,order/2,"pppm/disp:rho1d");
memory->create2d_offset(rho_coeff,order,(1-order)/2,order/2,"pppm/disp:rho_coeff");
memory->create2d_offset(drho1d,3,-order/2,order/2,"pppm/disp:rho1d");
memory->create2d_offset(drho_coeff,order,(1-order)/2,order/2,"pppm/disp:drho_coeff");
memory->create(greensfn,nfft_both,"pppm/disp:greensfn");
memory->create(vg,nfft_both,6,"pppm/disp:vg");
memory->create(vg2,nfft_both,3,"pppm/disp:vg2");
memory->create3d_offset(density_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:density_brick");
if ( differentiation_flag == 1) {
memory->create3d_offset(u_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:u_brick");
memory->create(sf_precoeff1,nfft_both,"pppm/disp:sf_precoeff1");
memory->create(sf_precoeff2,nfft_both,"pppm/disp:sf_precoeff2");
memory->create(sf_precoeff3,nfft_both,"pppm/disp:sf_precoeff3");
memory->create(sf_precoeff4,nfft_both,"pppm/disp:sf_precoeff4");
memory->create(sf_precoeff5,nfft_both,"pppm/disp:sf_precoeff5");
memory->create(sf_precoeff6,nfft_both,"pppm/disp:sf_precoeff6");
} else {
memory->create3d_offset(vdx_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdx_brick");
memory->create3d_offset(vdy_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdy_brick");
memory->create3d_offset(vdz_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdz_brick");
}
memory->create(density_fft,nfft_both,"pppm/disp:density_fft");
int tmp;
fft1 = new FFT3d(lmp,world,nx_pppm,ny_pppm,nz_pppm,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
0,0,&tmp,collective_flag);
fft2 = new FFT3d(lmp,world,nx_pppm,ny_pppm,nz_pppm,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
0,0,&tmp,collective_flag);
remap = new Remap(lmp,world,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg = new GridComm(lmp,world,1,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg = new GridComm(lmp,world,3,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
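// the two integer arguments after world appear to be the number of values
// communicated per grid point in the forward (1 potential for ad, 3 field
// components for ik) and reverse (1 density) direction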
}
if (function[1]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create3d_offset(density_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_g");
if ( differentiation_flag == 1) {
memory->create3d_offset(u_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_g");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create3d_offset(vdx_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_g");
memory->create3d_offset(vdy_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_g");
memory->create3d_offset(vdz_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_g");
}
memory->create(density_fft_g,nfft_both_6,"pppm/disp:density_fft_g");
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,1,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,3,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[2]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create3d_offset(density_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a0");
memory->create3d_offset(density_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a1");
memory->create3d_offset(density_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a2");
memory->create3d_offset(density_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a3");
memory->create3d_offset(density_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a4");
memory->create3d_offset(density_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a5");
memory->create3d_offset(density_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a6");
memory->create(density_fft_a0,nfft_both_6,"pppm/disp:density_fft_a0");
memory->create(density_fft_a1,nfft_both_6,"pppm/disp:density_fft_a1");
memory->create(density_fft_a2,nfft_both_6,"pppm/disp:density_fft_a2");
memory->create(density_fft_a3,nfft_both_6,"pppm/disp:density_fft_a3");
memory->create(density_fft_a4,nfft_both_6,"pppm/disp:density_fft_a4");
memory->create(density_fft_a5,nfft_both_6,"pppm/disp:density_fft_a5");
memory->create(density_fft_a6,nfft_both_6,"pppm/disp:density_fft_a6");
if ( differentiation_flag == 1 ) {
memory->create3d_offset(u_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a0");
memory->create3d_offset(u_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a1");
memory->create3d_offset(u_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a2");
memory->create3d_offset(u_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a3");
memory->create3d_offset(u_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a4");
memory->create3d_offset(u_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a5");
memory->create3d_offset(u_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a6");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create3d_offset(vdx_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a0");
memory->create3d_offset(vdy_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a0");
memory->create3d_offset(vdz_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a0");
memory->create3d_offset(vdx_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a1");
memory->create3d_offset(vdy_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a1");
memory->create3d_offset(vdz_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a1");
memory->create3d_offset(vdx_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a2");
memory->create3d_offset(vdy_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a2");
memory->create3d_offset(vdz_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a2");
memory->create3d_offset(vdx_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a3");
memory->create3d_offset(vdy_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a3");
memory->create3d_offset(vdz_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a3");
memory->create3d_offset(vdx_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a4");
memory->create3d_offset(vdy_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a4");
memory->create3d_offset(vdz_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a4");
memory->create3d_offset(vdx_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a5");
memory->create3d_offset(vdy_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a5");
memory->create3d_offset(vdz_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a5");
memory->create3d_offset(vdx_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a6");
memory->create3d_offset(vdy_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a6");
memory->create3d_offset(vdz_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a6");
}
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,7,7,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,21,7,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[3]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create4d_offset(density_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_none");
if ( differentiation_flag == 1) {
memory->create4d_offset(u_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_none");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create4d_offset(vdx_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_none");
memory->create4d_offset(vdy_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_none");
memory->create4d_offset(vdz_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_none");
}
memory->create(density_fft_none,nsplit_alloc,nfft_both_6,"pppm/disp:density_fft_none");
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,nsplit_alloc,nsplit_alloc,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,3*nsplit_alloc,nsplit_alloc,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
}
/* ----------------------------------------------------------------------
allocate memory that depends on # of K-vectors and order
for per atom calculations
------------------------------------------------------------------------- */
void PPPMDisp::allocate_peratom()
{
int (*procneigh)[2] = comm->procneigh;
if (function[0]) {
if (differentiation_flag != 1)
memory->create3d_offset(u_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:u_brick");
memory->create3d_offset(v0_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v0_brick");
memory->create3d_offset(v1_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v1_brick");
memory->create3d_offset(v2_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v2_brick");
memory->create3d_offset(v3_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v3_brick");
memory->create3d_offset(v4_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v4_brick");
memory->create3d_offset(v5_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v5_brick");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom =
new GridComm(lmp,world,6,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom =
new GridComm(lmp,world,7,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[1]) {
if ( differentiation_flag != 1 )
memory->create3d_offset(u_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_g");
memory->create3d_offset(v0_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_g");
memory->create3d_offset(v1_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_g");
memory->create3d_offset(v2_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_g");
memory->create3d_offset(v3_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_g");
memory->create3d_offset(v4_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_g");
memory->create3d_offset(v5_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_g");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,6,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,7,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[2]) {
if ( differentiation_flag != 1 ) {
memory->create3d_offset(u_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a0");
memory->create3d_offset(u_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a1");
memory->create3d_offset(u_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a2");
memory->create3d_offset(u_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a3");
memory->create3d_offset(u_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a4");
memory->create3d_offset(u_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a5");
memory->create3d_offset(u_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a6");
}
memory->create3d_offset(v0_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a0");
memory->create3d_offset(v1_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a0");
memory->create3d_offset(v2_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a0");
memory->create3d_offset(v3_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a0");
memory->create3d_offset(v4_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a0");
memory->create3d_offset(v5_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a0");
memory->create3d_offset(v0_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a1");
memory->create3d_offset(v1_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a1");
memory->create3d_offset(v2_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a1");
memory->create3d_offset(v3_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a1");
memory->create3d_offset(v4_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a1");
memory->create3d_offset(v5_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a1");
memory->create3d_offset(v0_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a2");
memory->create3d_offset(v1_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a2");
memory->create3d_offset(v2_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a2");
memory->create3d_offset(v3_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a2");
memory->create3d_offset(v4_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a2");
memory->create3d_offset(v5_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a2");
memory->create3d_offset(v0_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a3");
memory->create3d_offset(v1_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a3");
memory->create3d_offset(v2_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a3");
memory->create3d_offset(v3_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a3");
memory->create3d_offset(v4_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a3");
memory->create3d_offset(v5_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a3");
memory->create3d_offset(v0_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a4");
memory->create3d_offset(v1_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a4");
memory->create3d_offset(v2_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a4");
memory->create3d_offset(v3_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a4");
memory->create3d_offset(v4_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a4");
memory->create3d_offset(v5_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a4");
memory->create3d_offset(v0_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a5");
memory->create3d_offset(v1_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a5");
memory->create3d_offset(v2_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a5");
memory->create3d_offset(v3_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a5");
memory->create3d_offset(v4_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a5");
memory->create3d_offset(v5_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a5");
memory->create3d_offset(v0_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a6");
memory->create3d_offset(v1_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a6");
memory->create3d_offset(v2_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a6");
memory->create3d_offset(v3_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a6");
memory->create3d_offset(v4_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a6");
memory->create3d_offset(v5_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a6");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,42,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,49,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[3]) {
if ( differentiation_flag != 1 )
memory->create4d_offset(u_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_none");
memory->create4d_offset(v0_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_none");
memory->create4d_offset(v1_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_none");
memory->create4d_offset(v2_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_none");
memory->create4d_offset(v3_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_none");
memory->create4d_offset(v4_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_none");
memory->create4d_offset(v5_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_none");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,6*nsplit_alloc,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,7*nsplit_alloc,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
}
/* ----------------------------------------------------------------------
deallocate memory that depends on # of K-vectors and order
------------------------------------------------------------------------- */
void PPPMDisp::deallocate()
{
memory->destroy3d_offset(density_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdx_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdy_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdz_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy(density_fft);
density_brick = vdx_brick = vdy_brick = vdz_brick = NULL;
density_fft = NULL;
memory->destroy3d_offset(density_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_g);
density_brick_g = vdx_brick_g = vdy_brick_g = vdz_brick_g = NULL;
density_fft_g = NULL;
memory->destroy3d_offset(density_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a0);
density_brick_a0 = vdx_brick_a0 = vdy_brick_a0 = vdz_brick_a0 = NULL;
density_fft_a0 = NULL;
memory->destroy3d_offset(density_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a1);
density_brick_a1 = vdx_brick_a1 = vdy_brick_a1 = vdz_brick_a1 = NULL;
density_fft_a1 = NULL;
memory->destroy3d_offset(density_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a2);
density_brick_a2 = vdx_brick_a2 = vdy_brick_a2 = vdz_brick_a2 = NULL;
density_fft_a2 = NULL;
memory->destroy3d_offset(density_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a3);
density_brick_a3 = vdx_brick_a3 = vdy_brick_a3 = vdz_brick_a3 = NULL;
density_fft_a3 = NULL;
memory->destroy3d_offset(density_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a4);
density_brick_a4 = vdx_brick_a4 = vdy_brick_a4 = vdz_brick_a4 = NULL;
density_fft_a4 = NULL;
memory->destroy3d_offset(density_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a5);
density_brick_a5 = vdx_brick_a5 = vdy_brick_a5 = vdz_brick_a5 = NULL;
density_fft_a5 = NULL;
memory->destroy3d_offset(density_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a6);
density_brick_a6 = vdx_brick_a6 = vdy_brick_a6 = vdz_brick_a6 = NULL;
density_fft_a6 = NULL;
memory->destroy4d_offset(density_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdx_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdy_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdz_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_none);
density_brick_none = vdx_brick_none = vdy_brick_none = vdz_brick_none = NULL;
density_fft_none = NULL;
memory->destroy(sf_precoeff1);
memory->destroy(sf_precoeff2);
memory->destroy(sf_precoeff3);
memory->destroy(sf_precoeff4);
memory->destroy(sf_precoeff5);
memory->destroy(sf_precoeff6);
sf_precoeff1 = sf_precoeff2 = sf_precoeff3 = sf_precoeff4 = sf_precoeff5 = sf_precoeff6 = NULL;
memory->destroy(sf_precoeff1_6);
memory->destroy(sf_precoeff2_6);
memory->destroy(sf_precoeff3_6);
memory->destroy(sf_precoeff4_6);
memory->destroy(sf_precoeff5_6);
memory->destroy(sf_precoeff6_6);
sf_precoeff1_6 = sf_precoeff2_6 = sf_precoeff3_6 = sf_precoeff4_6 = sf_precoeff5_6 = sf_precoeff6_6 = NULL;
memory->destroy(greensfn);
memory->destroy(greensfn_6);
memory->destroy(work1);
memory->destroy(work2);
memory->destroy(work1_6);
memory->destroy(work2_6);
memory->destroy(vg);
memory->destroy(vg2);
memory->destroy(vg_6);
memory->destroy(vg2_6);
greensfn = greensfn_6 = NULL;
work1 = work2 = work1_6 = work2_6 = NULL;
vg = vg2 = vg_6 = vg2_6 = NULL;
memory->destroy1d_offset(fkx,nxlo_fft);
memory->destroy1d_offset(fky,nylo_fft);
memory->destroy1d_offset(fkz,nzlo_fft);
fkx = fky = fkz = NULL;
memory->destroy1d_offset(fkx2,nxlo_fft);
memory->destroy1d_offset(fky2,nylo_fft);
memory->destroy1d_offset(fkz2,nzlo_fft);
fkx2 = fky2 = fkz2 = NULL;
memory->destroy1d_offset(fkx_6,nxlo_fft_6);
memory->destroy1d_offset(fky_6,nylo_fft_6);
memory->destroy1d_offset(fkz_6,nzlo_fft_6);
fkx_6 = fky_6 = fkz_6 = NULL;
memory->destroy1d_offset(fkx2_6,nxlo_fft_6);
memory->destroy1d_offset(fky2_6,nylo_fft_6);
memory->destroy1d_offset(fkz2_6,nzlo_fft_6);
fkx2_6 = fky2_6 = fkz2_6 = NULL;
memory->destroy(gf_b);
memory->destroy2d_offset(rho1d,-order/2);
memory->destroy2d_offset(rho_coeff,(1-order)/2);
memory->destroy2d_offset(drho1d,-order/2);
memory->destroy2d_offset(drho_coeff, (1-order)/2);
gf_b = NULL;
rho1d = rho_coeff = drho1d = drho_coeff = NULL;
memory->destroy(gf_b_6);
memory->destroy2d_offset(rho1d_6,-order_6/2);
memory->destroy2d_offset(rho_coeff_6,(1-order_6)/2);
memory->destroy2d_offset(drho1d_6,-order_6/2);
memory->destroy2d_offset(drho_coeff_6,(1-order_6)/2);
gf_b_6 = NULL;
rho1d_6 = rho_coeff_6 = drho1d_6 = drho_coeff_6 = NULL;
delete fft1;
delete fft2;
delete remap;
delete cg;
fft1 = fft2 = NULL;
remap = NULL;
cg = NULL;
delete fft1_6;
delete fft2_6;
delete remap_6;
delete cg_6;
fft1_6 = fft2_6 = NULL;
remap_6 = NULL;
cg_6 = NULL;
}
/* ----------------------------------------------------------------------
deallocate memory that depends on # of K-vectors and order
for per atom calculations
------------------------------------------------------------------------- */
void PPPMDisp::deallocate_peratom()
{
peratom_allocate_flag = 0;
memory->destroy3d_offset(u_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v0_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v1_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v2_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v3_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v4_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v5_brick, nzlo_out, nylo_out, nxlo_out);
u_brick = v0_brick = v1_brick = v2_brick = v3_brick = v4_brick = v5_brick = NULL;
memory->destroy3d_offset(u_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_g = v0_brick_g = v1_brick_g = v2_brick_g = v3_brick_g = v4_brick_g = v5_brick_g = NULL;
memory->destroy3d_offset(u_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a0 = v0_brick_a0 = v1_brick_a0 = v2_brick_a0 = v3_brick_a0 = v4_brick_a0 = v5_brick_a0 = NULL;
memory->destroy3d_offset(u_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a1 = v0_brick_a1 = v1_brick_a1 = v2_brick_a1 = v3_brick_a1 = v4_brick_a1 = v5_brick_a1 = NULL;
memory->destroy3d_offset(u_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a2 = v0_brick_a2 = v1_brick_a2 = v2_brick_a2 = v3_brick_a2 = v4_brick_a2 = v5_brick_a2 = NULL;
memory->destroy3d_offset(u_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a3 = v0_brick_a3 = v1_brick_a3 = v2_brick_a3 = v3_brick_a3 = v4_brick_a3 = v5_brick_a3 = NULL;
memory->destroy3d_offset(u_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a4 = v0_brick_a4 = v1_brick_a4 = v2_brick_a4 = v3_brick_a4 = v4_brick_a4 = v5_brick_a4 = NULL;
memory->destroy3d_offset(u_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a5 = v0_brick_a5 = v1_brick_a5 = v2_brick_a5 = v3_brick_a5 = v4_brick_a5 = v5_brick_a5 = NULL;
memory->destroy3d_offset(u_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a6 = v0_brick_a6 = v1_brick_a6 = v2_brick_a6 = v3_brick_a6 = v4_brick_a6 = v5_brick_a6 = NULL;
memory->destroy4d_offset(u_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v0_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v1_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v2_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v3_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v4_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v5_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_none = v0_brick_none = v1_brick_none = v2_brick_none = v3_brick_none = v4_brick_none = v5_brick_none = NULL;
delete cg_peratom;
delete cg_peratom_6;
cg_peratom = cg_peratom_6 = NULL;
}
/* ----------------------------------------------------------------------
set size of FFT grid (nx,ny,nz_pppm) and g_ewald
for Coulomb interactions
------------------------------------------------------------------------- */
void PPPMDisp::set_grid()
{
double q2 = qsqsum * force->qqrd2e;
// use xprd,yprd,zprd even if triclinic so grid size is the same
// adjust z dimension for 2d slab PPPM
// 3d PPPM just uses zprd since slab_volfactor = 1.0
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
// make initial g_ewald estimate
// based on desired accuracy and real space cutoff
// fluid-occupied volume used to estimate real-space error
// zprd used rather than zprd_slab
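// the estimate below inverts the real-space error formula also used in f():
//   err_r = 2*q2*exp(-(g*rc)^2) / sqrt(N*rc*Lx*Ly*Lz)
// setting err_r = accuracy and solving for g gives
//   g = sqrt(-ln(accuracy*sqrt(N*rc*Lx*Ly*Lz)/(2*q2))) / rc
// which is what the two statements inside the if block compute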
double h, h_x,h_y,h_z;
bigint natoms = atom->natoms;
if (!gewaldflag) {
g_ewald = accuracy*sqrt(natoms*cutoff*xprd*yprd*zprd) / (2.0*q2);
if (g_ewald >= 1.0)
error->all(FLERR,"KSpace accuracy too large to estimate G vector");
g_ewald = sqrt(-log(g_ewald)) / cutoff;
}
// set optimal nx_pppm,ny_pppm,nz_pppm based on order and accuracy
// nz_pppm uses extended zprd_slab instead of zprd
// reduce it until accuracy target is met
if (!gridflag) {
h = h_x = h_y = h_z = 4.0/g_ewald;
int count = 0;
while (1) {
// set grid dimension
nx_pppm = static_cast<int> (xprd/h_x);
ny_pppm = static_cast<int> (yprd/h_y);
nz_pppm = static_cast<int> (zprd_slab/h_z);
if (nx_pppm <= 1) nx_pppm = 2;
if (ny_pppm <= 1) ny_pppm = 2;
if (nz_pppm <= 1) nz_pppm = 2;
//set local grid dimension
int npey_fft,npez_fft;
if (nz_pppm >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_pppm,nz_pppm,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_fft = 0;
nxhi_fft = nx_pppm - 1;
nylo_fft = me_y*ny_pppm/npey_fft;
nyhi_fft = (me_y+1)*ny_pppm/npey_fft - 1;
nzlo_fft = me_z*nz_pppm/npez_fft;
nzhi_fft = (me_z+1)*nz_pppm/npez_fft - 1;
double qopt = compute_qopt();
double dfkspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
count++;
// break loop if the accuracy has been reached or too many loops have been performed
if (dfkspace <= accuracy) break;
if (count > 500) error->all(FLERR, "Could not compute grid size for Coulomb interaction");
h *= 0.95;
h_x = h_y = h_z = h;
}
}
// boost grid size until it is factorable
while (!factorable(nx_pppm)) nx_pppm++;
while (!factorable(ny_pppm)) ny_pppm++;
while (!factorable(nz_pppm)) nz_pppm++;
}
/* ----------------------------------------------------------------------
set the FFT parameters
------------------------------------------------------------------------- */
void PPPMDisp::set_fft_parameters(int& nx_p,int& ny_p,int& nz_p,
int& nxlo_f,int& nylo_f,int& nzlo_f,
int& nxhi_f,int& nyhi_f,int& nzhi_f,
int& nxlo_i,int& nylo_i,int& nzlo_i,
int& nxhi_i,int& nyhi_i,int& nzhi_i,
int& nxlo_o,int& nylo_o,int& nzlo_o,
int& nxhi_o,int& nyhi_o,int& nzhi_o,
int& nlow, int& nupp,
int& ng, int& nf, int& nfb,
double& sft,double& sftone, int& ord)
{
// global indices of PPPM grid range from 0 to N-1
// nlo_in,nhi_in = lower/upper limits of the 3d sub-brick of
// global PPPM grid that I own without ghost cells
// for slab PPPM, assign z grid as if it were not extended
nxlo_i = static_cast<int> (comm->xsplit[comm->myloc[0]] * nx_p);
nxhi_i = static_cast<int> (comm->xsplit[comm->myloc[0]+1] * nx_p) - 1;
nylo_i = static_cast<int> (comm->ysplit[comm->myloc[1]] * ny_p);
nyhi_i = static_cast<int> (comm->ysplit[comm->myloc[1]+1] * ny_p) - 1;
nzlo_i = static_cast<int>
(comm->zsplit[comm->myloc[2]] * nz_p/slab_volfactor);
nzhi_i = static_cast<int>
(comm->zsplit[comm->myloc[2]+1] * nz_p/slab_volfactor) - 1;
// nlow,nupp = stencil size for mapping particles to PPPM grid
nlow = -(ord-1)/2;
nupp = ord/2;
// sft values for particle <-> grid mapping
// add/subtract OFFSET to avoid int(-0.75) = 0 when we want it to be -1
if (ord % 2) sft = OFFSET + 0.5;
else sft = OFFSET;
if (ord % 2) sftone = 0.0;
else sftone = 0.5;
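// worked example (assuming OFFSET is the large positive integer, 16384,
// used by the other PPPM styles): for an even order, sft = OFFSET, so a
// scaled coordinate of -0.75 maps to int(-0.75 + 16384) - 16384 = -1,
// whereas a plain int(-0.75) would truncate toward zero and give 0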
// nlo_out,nhi_out = lower/upper limits of the 3d sub-brick of
// global PPPM grid that my particles can contribute charge to
// effectively nlo_in,nhi_in + ghost cells
// nlo,nhi = global coords of grid pt to "lower left" of smallest/largest
// position a particle in my box can be at
// dist[3] = particle position bound = subbox + skin/2.0 + qdist
// qdist = offset due to TIP4P fictitious charge
// convert to triclinic if necessary
// nlo_out,nhi_out = nlo,nhi + stencil size for particle mapping
// for slab PPPM, assign z grid as if it were not extended
double *prd,*sublo,*subhi;
if (triclinic == 0) {
prd = domain->prd;
boxlo = domain->boxlo;
sublo = domain->sublo;
subhi = domain->subhi;
} else {
prd = domain->prd_lamda;
boxlo = domain->boxlo_lamda;
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
}
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double dist[3];
double cuthalf = 0.5*neighbor->skin + qdist;
if (triclinic == 0) dist[0] = dist[1] = dist[2] = cuthalf;
else {
dist[0] = cuthalf/domain->prd[0];
dist[1] = cuthalf/domain->prd[1];
dist[2] = cuthalf/domain->prd[2];
}
int nlo,nhi;
nlo = static_cast<int> ((sublo[0]-dist[0]-boxlo[0]) *
nx_p/xprd + sft) - OFFSET;
nhi = static_cast<int> ((subhi[0]+dist[0]-boxlo[0]) *
nx_p/xprd + sft) - OFFSET;
nxlo_o = nlo + nlow;
nxhi_o = nhi + nupp;
nlo = static_cast<int> ((sublo[1]-dist[1]-boxlo[1]) *
ny_p/yprd + sft) - OFFSET;
nhi = static_cast<int> ((subhi[1]+dist[1]-boxlo[1]) *
ny_p/yprd + sft) - OFFSET;
nylo_o = nlo + nlow;
nyhi_o = nhi + nupp;
nlo = static_cast<int> ((sublo[2]-dist[2]-boxlo[2]) *
nz_p/zprd_slab + sft) - OFFSET;
nhi = static_cast<int> ((subhi[2]+dist[2]-boxlo[2]) *
nz_p/zprd_slab + sft) - OFFSET;
nzlo_o = nlo + nlow;
nzhi_o = nhi + nupp;
// for slab PPPM, change the grid boundary for processors at +z end
// to include the empty volume between periodically repeating slabs
// for slab PPPM, want charge data communicated from -z proc to +z proc,
// but not vice versa, also want field data communicated from +z proc to
// -z proc, but not vice versa
// this is accomplished by nzhi_i = nzhi_o on +z end (no ghost cells)
if (slabflag && (comm->myloc[2] == comm->procgrid[2]-1)) {
nzhi_i = nz_p - 1;
nzhi_o = nz_p - 1;
}
// decomposition of FFT mesh
// global indices range from 0 to N-1
// proc owns entire x-dimension, clump of columns in y,z dimensions
// npey_fft,npez_fft = # of procs in y,z dims
// if nprocs is small enough, proc can own 1 or more entire xy planes,
// else proc owns 2d sub-blocks of yz plane
// me_y,me_z = which proc (0-npe_fft-1) I am in y,z dimensions
// nlo_fft,nhi_fft = lower/upper limit of the section
// of the global FFT mesh that I own
int npey_fft,npez_fft;
if (nz_p >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_p,nz_p,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_f = 0;
nxhi_f = nx_p - 1;
nylo_f = me_y*ny_p/npey_fft;
nyhi_f = (me_y+1)*ny_p/npey_fft - 1;
nzlo_f = me_z*nz_p/npez_fft;
nzhi_f = (me_z+1)*nz_p/npez_fft - 1;
// PPPM grid for this proc, including ghosts
ng = (nxhi_o-nxlo_o+1) * (nyhi_o-nylo_o+1) *
(nzhi_o-nzlo_o+1);
// FFT arrays on this proc, without ghosts
// nfft = FFT points in FFT decomposition on this proc
// nfft_brick = FFT points in 3d brick-decomposition on this proc
// nfft_both = greater of 2 values
nf = (nxhi_f-nxlo_f+1) * (nyhi_f-nylo_f+1) *
(nzhi_f-nzlo_f+1);
int nfft_brick = (nxhi_i-nxlo_i+1) * (nyhi_i-nylo_i+1) *
(nzhi_i-nzlo_i+1);
nfb = MAX(nf,nfft_brick);
}
/* ----------------------------------------------------------------------
check if all factors of n are in list of factors
return 1 if yes, 0 if no
------------------------------------------------------------------------- */
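// example: with the usual PPPM factor list {2,3,5}, n = 90 = 2*3*3*5 is
// factorable while n = 14 = 2*7 is not (7 is not in the list); set_grid()
// and set_grid_6() bump the grid sizes up until this test passes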
int PPPMDisp::factorable(int n)
{
int i;
while (n > 1) {
for (i = 0; i < nfactors; i++) {
if (n % factors[i] == 0) {
n /= factors[i];
break;
}
}
if (i == nfactors) return 0;
}
return 1;
}
/* ----------------------------------------------------------------------
adjust g_ewald to the new grid size using a Newton solver
------------------------------------------------------------------------- */
void PPPMDisp::adjust_gewald()
{
// Use Newton solver to find g_ewald
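// f() returns (estimated real-space error) - (estimated k-space error),
// so the Newton iteration g_ewald -= f/f' seeks the g_ewald at which the
// two error estimates balance for the current grid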
double dx;
// Begin algorithm
for (int i = 0; i < LARGE; i++) {
dx = f() / derivf();
g_ewald -= dx; //Update g_ewald
if (fabs(f()) < SMALL) return;
}
// Failed to converge
char str[128];
sprintf(str, "Could not compute g_ewald");
error->all(FLERR, str);
}
/* ----------------------------------------------------------------------
Calculate f(x)
------------------------------------------------------------------------- */
double PPPMDisp::f()
{
double df_rspace, df_kspace;
double q2 = qsqsum * force->qqrd2e;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = 2.0*q2*exp(-g_ewald*g_ewald*cutoff*cutoff) /
sqrt(natoms*cutoff*xprd*yprd*zprd);
double qopt = compute_qopt();
df_kspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
return df_rspace - df_kspace;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x) using forward difference
[f(x + h) - f(x)] / h
------------------------------------------------------------------------- */
double PPPMDisp::derivf()
{
double h = 0.000001; //Derivative step-size
double df,f1,f2,g_ewald_old;
f1 = f();
g_ewald_old = g_ewald;
g_ewald += h;
f2 = f();
g_ewald = g_ewald_old;
df = (f2 - f1)/h;
return df;
}
/* ----------------------------------------------------------------------
Calculate the final estimator for the accuracy
------------------------------------------------------------------------- */
double PPPMDisp::final_accuracy()
{
double df_rspace, df_kspace;
double q2 = qsqsum * force->qqrd2e;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = 2.0*q2 * exp(-g_ewald*g_ewald*cutoff*cutoff) /
sqrt(natoms*cutoff*xprd*yprd*zprd);
double qopt = compute_qopt();
df_kspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
double acc = sqrt(df_rspace*df_rspace + df_kspace*df_kspace);
return acc;
}
/* ----------------------------------------------------------------------
Calculate the final estimator for the Dispersion accuracy
------------------------------------------------------------------------- */
void PPPMDisp::final_accuracy_6(double& acc, double& acc_real, double& acc_kspace)
{
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
acc_real = lj_rspace_error();
double qopt = compute_qopt_6();
acc_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
acc = sqrt(acc_real*acc_real + acc_kspace*acc_kspace);
return;
}
/* ----------------------------------------------------------------------
Compute qopt for Coulomb interactions
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt()
{
double qopt;
if (differentiation_flag == 1) {
qopt = compute_qopt_ad();
} else {
qopt = compute_qopt_ik();
}
double qopt_all;
MPI_Allreduce(&qopt,&qopt_all,1,MPI_DOUBLE,MPI_SUM,world);
return qopt_all;
}
/* ----------------------------------------------------------------------
Compute qopt for Dispersion interactions
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_6()
{
double qopt;
if (differentiation_flag == 1) {
qopt = compute_qopt_6_ad();
} else {
qopt = compute_qopt_6_ik();
}
double qopt_all;
MPI_Allreduce(&qopt,&qopt_all,1,MPI_DOUBLE,MPI_SUM,world);
return qopt_all;
}
/* ----------------------------------------------------------------------
Compute qopt for the ik differentiation scheme and Coulomb interaction
------------------------------------------------------------------------- */
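// rough reading of the triple loop below: for each k-vector the inner
// sums run over aliased images (nx,ny,nz in [-2,2]); sum1 collects the
// squared exact reciprocal-space kernel, sum2 its overlap with the
// aliased assignment functions (weighted by k . k_m), and sum3 the
// squared assignment function, so each k-vector contributes
// sum1 - sum2^2/(sum3^2*k^2), the residual error of the optimal
// influence function in the Hockney-Eastwood sense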
double PPPMDisp::compute_qopt_ik()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double sqk, u2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double sum1,sum2, sum3,dot1,dot2;
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm*nx);
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm*ny);
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm*nz);
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
dot1 = unitkx*kper*qx + unitky*lper*qy + unitkz*mper*qz;
dot2 = qx*qx+qy*qy+qz*qz;
u2 = pow(wx*wy*wz,2.0);
sum1 += sx*sy*sz*sx*sy*sz/dot2*4.0*4.0*MY_PI*MY_PI;
sum2 += u2*sx*sy*sz*4.0*MY_PI/dot2*dot1;
sum3 += u2;
}
}
}
sum2 *= sum2;
sum3 *= sum3*sqk;
qopt += sum1 -sum2/sum3;
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ad differentiation scheme and Coulomb interaction
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_ad()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double u2, sqk;
double sum1,sum2,sum3,sum4,dot2;
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm*nx);
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm*ny);
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm*nz);
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
dot2 = qx*qx+qy*qy+qz*qz;
u2 = pow(wx*wy*wz,2.0);
sum1 += sx*sy*sz*sx*sy*sz/dot2*4.0*4.0*MY_PI*MY_PI;
sum2 += sx*sy*sz * u2*4.0*MY_PI;
sum3 += u2;
sum4 += dot2*u2;
}
}
}
sum2 *= sum2;
qopt += sum1 - sum2/(sum3*sum4);
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ik differentiation scheme and Dispersion interaction
------------------------------------------------------------------------- */
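// same structure as compute_qopt_ik(), but for dispersion (1/r^6) the
// reciprocal-space kernel is not 4*pi/k^2; the "term" expression below
// combines exp(-k^2/(4*g_ewald_6^2)) and erfc(|k|/(2*g_ewald_6)), with
// inv2ew = 1/(2*g_ewald_6) and rtdot2 the magnitude of the aliased
// wavevector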
double PPPMDisp::compute_qopt_6_ik()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double sqk, u2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double sum1,sum2, sum3;
double dot1,dot2, rtdot2, term;
double inv2ew = 2*g_ewald_6;
inv2ew = 1.0/inv2ew;
double rtpi = sqrt(MY_PI);
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm_6*nx);
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm_6*ny);
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm_6*nz);
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
dot1 = unitkx*kper*qx + unitky*lper*qy + unitkz*mper*qz;
dot2 = qx*qx+qy*qy+qz*qz;
rtdot2 = sqrt(dot2);
term = (1-2*dot2*inv2ew*inv2ew)*sx*sy*sz +
2*dot2*rtdot2*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtdot2*inv2ew);
term *= g_ewald_6*g_ewald_6*g_ewald_6;
u2 = pow(wx*wy*wz,2.0);
sum1 += term*term*MY_PI*MY_PI*MY_PI/9.0 * dot2;
sum2 += -u2*term*MY_PI*rtpi/3.0*dot1;
sum3 += u2;
}
}
}
sum2 *= sum2;
sum3 *= sum3*sqk;
qopt += sum1 -sum2/sum3;
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ad differentiation scheme and Dispersion interaction
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_6_ad()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double u2, sqk;
double sum1,sum2,sum3,sum4;
double dot2, rtdot2, term;
double inv2ew = 2*g_ewald_6;
inv2ew = 1/inv2ew;
double rtpi = sqrt(MY_PI);
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm_6*nx);
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm_6*ny);
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm_6*nz);
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
dot2 = qx*qx+qy*qy+qz*qz;
rtdot2 = sqrt(dot2);
term = (1-2*dot2*inv2ew*inv2ew)*sx*sy*sz +
2*dot2*rtdot2*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtdot2*inv2ew);
term *= g_ewald_6*g_ewald_6*g_ewald_6;
u2 = pow(wx*wy*wz,2.0);
sum1 += term*term*MY_PI*MY_PI*MY_PI/9.0 * dot2;
sum2 += -term*MY_PI*rtpi/3.0 * u2 * dot2;
sum3 += u2;
sum4 += dot2*u2;
}
}
}
sum2 *= sum2;
qopt += sum1 - sum2/(sum3*sum4);
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
set size of FFT grid and g_ewald_6
for Dispersion interactions
------------------------------------------------------------------------- */
void PPPMDisp::set_grid_6()
{
// Calculate csum
if (!csumflag) calc_csum();
if (!gewaldflag_6) set_init_g6();
if (!gridflag_6) set_n_pppm_6();
while (!factorable(nx_pppm_6)) nx_pppm_6++;
while (!factorable(ny_pppm_6)) ny_pppm_6++;
while (!factorable(nz_pppm_6)) nz_pppm_6++;
}
/* ----------------------------------------------------------------------
Calculate the sum of the squared dispersion coefficients and other
related quantities required for the calculations
------------------------------------------------------------------------- */
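// informally: csum sums the per-atom self coefficients c_ii over all
// atoms, csumi[i] sums N_j*c_ij over types j, and csumij is the double
// sum of c_ij over all pairs of atoms; csum enters the dispersion error
// estimates (lj_rspace_error() and the df_kspace expressions)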
void PPPMDisp::calc_csum()
{
csumij = 0.0;
csum = 0.0;
int ntypes = atom->ntypes;
int i,j,k;
delete [] cii;
cii = new double[ntypes +1];
for (i = 0; i<=ntypes; i++) cii[i] = 0.0;
delete [] csumi;
csumi = new double[ntypes +1];
for (i = 0; i<=ntypes; i++) csumi[i] = 0.0;
int *neach = new int[ntypes+1];
for (i = 0; i<=ntypes; i++) neach[i] = 0;
//the following variables are needed to distinguish between arithmetic
// and geometric mixing
if (function[1]) {
for (i = 1; i <= ntypes; i++)
cii[i] = B[i]*B[i];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
csum += B[tmp]*B[tmp];
}
}
if (function[2]) {
for (i = 1; i <= ntypes; i++)
cii[i] = 64.0/20.0*B[7*i+3]*B[7*i+3];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
csum += 64.0/20.0*B[7*tmp+3]*B[7*tmp+3];
}
}
if (function[3]) {
for (i = 1; i <= ntypes; i++)
for (j = 0; j < nsplit; j++)
cii[i] += B[j]*B[nsplit*i + j]*B[nsplit*i + j];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
for (j = 0; j < nsplit; j++)
csum += B[j]*B[nsplit*tmp + j]*B[nsplit*tmp + j];
}
}
double tmp2;
MPI_Allreduce(&csum,&tmp2,1,MPI_DOUBLE,MPI_SUM,world);
csum = tmp2;
csumflag = 1;
int *neach_all = new int[ntypes+1];
MPI_Allreduce(neach,neach_all,ntypes+1,MPI_INT,MPI_SUM,world);
// compute csumij and csumi
double d1, d2;
if (function[1]){
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
csumi[i] += neach_all[j]*B[i]*B[j];
d1 = neach_all[i]*B[i];
d2 = neach_all[j]*B[j];
csumij += d1*d2;
//csumij += neach_all[i]*neach_all[j]*B[i]*B[j];
}
}
}
if (function[2]) {
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
for (k=0; k<=6; k++) {
csumi[i] += neach_all[j]*B[7*i + k]*B[7*(j+1)-k-1];
d1 = neach_all[i]*B[7*i + k];
d2 = neach_all[j]*B[7*(j+1)-k-1];
csumij += d1*d2;
//csumij += neach_all[i]*neach_all[j]*B[7*i + k]*B[7*(j+1)-k-1];
}
}
}
}
if (function[3]) {
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
for (k=0; k<nsplit; k++) {
csumi[i] += neach_all[j]*B[k]*B[nsplit*i+k]*B[nsplit*j+k];
d1 = neach_all[i]*B[nsplit*i+k];
d2 = neach_all[j]*B[nsplit*j+k];
csumij += B[k]*d1*d2;
}
}
}
}
delete [] neach;
delete [] neach_all;
}
/* ----------------------------------------------------------------------
adjust g_ewald_6 to the new grid size
------------------------------------------------------------------------- */
void PPPMDisp::adjust_gewald_6()
{
// Use Newton solver to find g_ewald_6
double dx;
// Start loop
for (int i = 0; i < LARGE; i++) {
dx = f_6() / derivf_6();
g_ewald_6 -= dx; //update g_ewald_6
if (fabs(f_6()) < SMALL) return;
}
// Failed to converge
char str[128];
sprintf(str, "Could not adjust g_ewald_6");
error->all(FLERR, str);
}
/* ----------------------------------------------------------------------
Calculate f(x) for Dispersion interaction
------------------------------------------------------------------------- */
double PPPMDisp::f_6()
{
double df_rspace, df_kspace;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = lj_rspace_error();
double qopt = compute_qopt_6();
df_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
return df_rspace - df_kspace;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x) using forward difference
[f(x + h) - f(x)] / h
------------------------------------------------------------------------- */
double PPPMDisp::derivf_6()
{
double h = 0.000001; //Derivative step-size
double df,f1,f2,g_ewald_old;
f1 = f_6();
g_ewald_old = g_ewald_6;
g_ewald_6 += h;
f2 = f_6();
g_ewald_6 = g_ewald_old;
df = (f2 - f1)/h;
return df;
}
/* ----------------------------------------------------------------------
calculate an initial value for g_ewald_6
---------------------------------------------------------------------- */
void PPPMDisp::set_init_g6()
{
// use xprd,yprd,zprd even if triclinic so grid size is the same
// adjust z dimension for 2d slab PPPM
// 3d PPPM just uses zprd since slab_volfactor = 1.0
// make initial g_ewald estimate
// based on desired error and real space cutoff
// compute initial value for df_real with g_ewald_6 = 1/cutoff_lj
// if df_real > 0, repeatedly multiply g_ewald_6 by 2 until df_real < 0
// else, repeatedly divide g_ewald_6 by 2 until df_real > 0
// then perform bisection between the last two bracketing values of g_ewald_6
double df_real;
double g_ewald_old;
double gmin, gmax;
// check if there is a user defined accuracy
double acc_rspace = accuracy;
if (accuracy_real_6 > 0) acc_rspace = accuracy_real_6;
g_ewald_old = g_ewald_6 = 1.0/cutoff_lj;
df_real = lj_rspace_error() - acc_rspace;
int counter = 0;
if (df_real > 0) {
while (df_real > 0 && counter < LARGE) {
counter++;
g_ewald_old = g_ewald_6;
g_ewald_6 *= 2;
df_real = lj_rspace_error() - acc_rspace;
}
}
if (df_real < 0) {
while (df_real < 0 && counter < LARGE) {
counter++;
g_ewald_old = g_ewald_6;
g_ewald_6 *= 0.5;
df_real = lj_rspace_error() - acc_rspace;
}
}
if (counter >= LARGE-1) error->all(FLERR,"Cannot compute initial g_ewald_disp");
gmin = MIN(g_ewald_6, g_ewald_old);
gmax = MAX(g_ewald_6, g_ewald_old);
g_ewald_6 = gmin + 0.5*(gmax-gmin);
counter = 0;
while (gmax-gmin > SMALL && counter < LARGE) {
counter++;
df_real = lj_rspace_error() -acc_rspace;
if (df_real < 0) gmax = g_ewald_6;
else gmin = g_ewald_6;
g_ewald_6 = gmin + 0.5*(gmax-gmin);
}
if (counter >= LARGE-1) error->all(FLERR,"Cannot compute initial g_ewald_disp");
}
/* ----------------------------------------------------------------------
calculate nx_pppm, ny_pppm, nz_pppm for dispersion interaction
---------------------------------------------------------------------- */
void PPPMDisp::set_n_pppm_6()
{
bigint natoms = atom->natoms;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double h, h_x,h_y,h_z;
double acc_kspace = accuracy;
if (accuracy_kspace_6 > 0.0) acc_kspace = accuracy_kspace_6;
// initial value for the grid spacing
h = h_x = h_y = h_z = 4.0/g_ewald_6;
// decrease the grid spacing until the required precision is obtained
int count = 0;
while(1) {
// set grid dimension
nx_pppm_6 = static_cast<int> (xprd/h_x);
ny_pppm_6 = static_cast<int> (yprd/h_y);
nz_pppm_6 = static_cast<int> (zprd_slab/h_z);
if (nx_pppm_6 <= 1) nx_pppm_6 = 2;
if (ny_pppm_6 <= 1) ny_pppm_6 = 2;
if (nz_pppm_6 <= 1) nz_pppm_6 = 2;
//set local grid dimension
int npey_fft,npez_fft;
if (nz_pppm_6 >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_pppm_6,nz_pppm_6,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_fft_6 = 0;
nxhi_fft_6 = nx_pppm_6 - 1;
nylo_fft_6 = me_y*ny_pppm_6/npey_fft;
nyhi_fft_6 = (me_y+1)*ny_pppm_6/npey_fft - 1;
nzlo_fft_6 = me_z*nz_pppm_6/npez_fft;
nzhi_fft_6 = (me_z+1)*nz_pppm_6/npez_fft - 1;
double qopt = compute_qopt_6();
double df_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
count++;
// break loop if the accuracy has been reached or too many loops have been performed
if (df_kspace <= acc_kspace) break;
if (count > 500) error->all(FLERR, "Could not compute grid size for Dispersion");
h *= 0.95;
h_x = h_y = h_z = h;
}
}
/* ----------------------------------------------------------------------
calculate the real space error for dispersion interactions
---------------------------------------------------------------------- */
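// with x = g_ewald_6*cutoff_lj, the estimate computed below reads
//   err_r = csum * sqrt(pi) * g_ewald_6^5 * exp(-x^2)
//           * (1 + 3/x^2 + 6/x^4 + 6/x^6)
//           / sqrt(natoms * cutoff_lj * xprd*yprd*zprd_slab)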
double PPPMDisp::lj_rspace_error()
{
bigint natoms = atom->natoms;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
double deltaf;
double rgs = (cutoff_lj*g_ewald_6);
rgs *= rgs;
double rgs_inv = 1.0/rgs;
deltaf = csum/sqrt(natoms*xprd*yprd*zprd_slab*cutoff_lj)*sqrt(MY_PI)*pow(g_ewald_6, 5)*
exp(-rgs)*(1+rgs_inv*(3+rgs_inv*(6+rgs_inv*6)));
return deltaf;
}
/* ----------------------------------------------------------------------
Compute the modified (Hockney-Eastwood) Coulomb Green's function
---------------------------------------------------------------------- */
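// the value stored in greensfn[] below can be read as
//   G(k) = (4*pi/k^2) * exp(-k^2/(4*g_ewald^2)) * W(k)^2 / D(k)
// where W(k) = wx*wy*wz is the Fourier-transformed assignment function
// (each factor is squared inside the loops) and D(k) = gf_denom(snx2,sny2,snz2,...)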
void PPPMDisp::compute_gf()
{
int k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int kper,lper,mper;
double snx,sny,snz,snx2,sny2,snz2;
double sqk;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double numerator,denominator;
n = 0;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
qz = unitkz*mper;
snz = sin(0.5*qz*zprd_slab/nz_pppm);
snz2 = snz*snz;
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
wz *= wz;
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
qy = unitky*lper;
sny = sin(0.5*qy*yprd/ny_pppm);
sny2 = sny*sny;
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
wy *= wy;
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
qx = unitkx*kper;
snx = sin(0.5*qx*xprd/nx_pppm);
snx2 = snx*snx;
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
wx *= wx;
sqk = pow(qx,2.0) + pow(qy,2.0) + pow(qz,2.0);
if (sqk != 0.0) {
numerator = 4.0*MY_PI/sqk;
denominator = gf_denom(snx2,sny2,snz2, gf_b, order);
greensfn[n++] = numerator*sx*sy*sz*wx*wy*wz/denominator;
} else greensfn[n++] = 0.0;
}
}
}
}
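// reading off the loop above, each k-space point receives
//
//   G(k) = (4*pi/k^2) * exp(-k^2/(4*g_ewald^2))
//          * prod_a [ sin(k_a*h_a/2)/(k_a*h_a/2) ]^(2*order) / D(k)
//
// where h_a is the grid spacing in direction a and D(k) is the
// gf_denom() polynomial in sin^2(k_a*h_a/2); the sinc^(2*order)
// numerator and D(k) together compensate for the smearing of the
// order-th order charge assignment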
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
(generic helper, used for both the Coulomb and dispersion grids)
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_precoeff(int nxp, int nyp, int nzp, int ord,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
double *sf_pre1, double *sf_pre2, double *sf_pre3,
double *sf_pre4, double *sf_pre5, double *sf_pre6)
{
int i,k,l,m,n;
double *prd;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz;
double wx0[5],wy0[5],wz0[5],wx1[5],wy1[5],wz1[5],wx2[5],wy2[5],wz2[5];
double qx0,qy0,qz0,qx1,qy1,qz1,qx2,qy2,qz2;
double u0,u1,u2,u3,u4,u5,u6;
double sum1,sum2,sum3,sum4,sum5,sum6;
int nb = 2;
n = 0;
for (m = nzlo_ft; m <= nzhi_ft; m++) {
mper = m - nzp*(2*m/nzp);
for (l = nylo_ft; l <= nyhi_ft; l++) {
lper = l - nyp*(2*l/nyp);
for (k = nxlo_ft; k <= nxhi_ft; k++) {
kper = k - nxp*(2*k/nxp);
sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = 0.0;
for (i = -nb; i <= nb; i++) {
qx0 = unitkx*(kper+nxp*i);
qx1 = unitkx*(kper+nxp*(i+1));
qx2 = unitkx*(kper+nxp*(i+2));
wx0[i+2] = 1.0;
wx1[i+2] = 1.0;
wx2[i+2] = 1.0;
argx = 0.5*qx0*xprd/nxp;
if (argx != 0.0) wx0[i+2] = pow(sin(argx)/argx,ord);
argx = 0.5*qx1*xprd/nxp;
if (argx != 0.0) wx1[i+2] = pow(sin(argx)/argx,ord);
argx = 0.5*qx2*xprd/nxp;
if (argx != 0.0) wx2[i+2] = pow(sin(argx)/argx,ord);
qy0 = unitky*(lper+nyp*i);
qy1 = unitky*(lper+nyp*(i+1));
qy2 = unitky*(lper+nyp*(i+2));
wy0[i+2] = 1.0;
wy1[i+2] = 1.0;
wy2[i+2] = 1.0;
argy = 0.5*qy0*yprd/nyp;
if (argy != 0.0) wy0[i+2] = pow(sin(argy)/argy,ord);
argy = 0.5*qy1*yprd/nyp;
if (argy != 0.0) wy1[i+2] = pow(sin(argy)/argy,ord);
argy = 0.5*qy2*yprd/nyp;
if (argy != 0.0) wy2[i+2] = pow(sin(argy)/argy,ord);
qz0 = unitkz*(mper+nzp*i);
qz1 = unitkz*(mper+nzp*(i+1));
qz2 = unitkz*(mper+nzp*(i+2));
wz0[i+2] = 1.0;
wz1[i+2] = 1.0;
wz2[i+2] = 1.0;
argz = 0.5*qz0*zprd_slab/nzp;
if (argz != 0.0) wz0[i+2] = pow(sin(argz)/argz,ord);
argz = 0.5*qz1*zprd_slab/nzp;
if (argz != 0.0) wz1[i+2] = pow(sin(argz)/argz,ord);
argz = 0.5*qz2*zprd_slab/nzp;
if (argz != 0.0) wz2[i+2] = pow(sin(argz)/argz,ord);
}
for (nx = 0; nx <= 4; nx++) {
for (ny = 0; ny <= 4; ny++) {
for (nz = 0; nz <= 4; nz++) {
u0 = wx0[nx]*wy0[ny]*wz0[nz];
u1 = wx1[nx]*wy0[ny]*wz0[nz];
u2 = wx2[nx]*wy0[ny]*wz0[nz];
u3 = wx0[nx]*wy1[ny]*wz0[nz];
u4 = wx0[nx]*wy2[ny]*wz0[nz];
u5 = wx0[nx]*wy0[ny]*wz1[nz];
u6 = wx0[nx]*wy0[ny]*wz2[nz];
sum1 += u0*u1;
sum2 += u0*u2;
sum3 += u0*u3;
sum4 += u0*u4;
sum5 += u0*u5;
sum6 += u0*u6;
}
}
}
// store values
sf_pre1[n] = sum1;
sf_pre2[n] = sum2;
sf_pre3[n] = sum3;
sf_pre4[n] = sum4;
sf_pre5[n] = sum5;
sf_pre6[n++] = sum6;
}
}
}
}
/* ----------------------------------------------------------------------
compute the modified (Hockney-Eastwood) dispersion Green's function
---------------------------------------------------------------------- */
void PPPMDisp::compute_gf_6()
{
double *prd;
int k,l,m,n;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int kper,lper,mper;
double sqk;
double snx,sny,snz,snx2,sny2,snz2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz;
double qx,qy,qz;
double rtsqk, term;
double numerator,denominator;
double inv2ew = 2*g_ewald_6;
inv2ew = 1/inv2ew;
double rtpi = sqrt(MY_PI);
numerator = -MY_PI*rtpi*g_ewald_6*g_ewald_6*g_ewald_6/(3.0);
n = 0;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
qz = unitkz*mper;
snz = sin(0.5*unitkz*mper*zprd_slab/nz_pppm_6);
snz2 = snz*snz;
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
wz *= wz;
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
qy = unitky*lper;
sny = sin(0.5*unitky*lper*yprd/ny_pppm_6);
sny2 = sny*sny;
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
wy *= wy;
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
qx = unitkx*kper;
snx = sin(0.5*unitkx*kper*xprd/nx_pppm_6);
snx2 = snx*snx;
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
wx *= wx;
sqk = pow(qx,2.0) + pow(qy,2.0) + pow(qz,2.0);
if (sqk != 0.0) {
denominator = gf_denom(snx2,sny2,snz2, gf_b_6, order_6);
rtsqk = sqrt(sqk);
term = (1-2*sqk*inv2ew*inv2ew)*sx*sy*sz +
2*sqk*rtsqk*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtsqk*inv2ew);
greensfn_6[n++] = numerator*term*wx*wy*wz/denominator;
} else greensfn_6[n++] = 0.0;
}
}
}
}
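// for reference, the dispersion kernel assembled above is (g6 = g_ewald_6)
//
//   G6(k) = -(pi^(3/2) * g6^3 / 3)
//           * [ (1 - k^2/(2*g6^2)) * exp(-k^2/(4*g6^2))
//               + sqrt(pi)*k^3/(4*g6^3) * erfc(k/(2*g6)) ]
//           * prod_a [ sin(k_a*h_a/2)/(k_a*h_a/2) ]^(2*order_6) / D(k)
//
// the r^-6 analogue of the Coulomb kernel in compute_gf(), with D(k)
// again supplied by gf_denom()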
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
and Coulomb interaction
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_coeff()
{
int i,k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
for (i = 0; i <= 5; i++) sf_coeff[i] = 0.0;
n = 0;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
for (l = nylo_fft; l <= nyhi_fft; l++) {
for (k = nxlo_fft; k <= nxhi_fft; k++) {
sf_coeff[0] += sf_precoeff1[n]*greensfn[n];
sf_coeff[1] += sf_precoeff2[n]*greensfn[n];
sf_coeff[2] += sf_precoeff3[n]*greensfn[n];
sf_coeff[3] += sf_precoeff4[n]*greensfn[n];
sf_coeff[4] += sf_precoeff5[n]*greensfn[n];
sf_coeff[5] += sf_precoeff6[n]*greensfn[n];
++n;
}
}
}
// Compute the coefficients for the self-force correction
double prex, prey, prez;
prex = prey = prez = MY_PI/volume;
prex *= nx_pppm/xprd;
prey *= ny_pppm/yprd;
prez *= nz_pppm/zprd_slab;
sf_coeff[0] *= prex;
sf_coeff[1] *= prex*2;
sf_coeff[2] *= prey;
sf_coeff[3] *= prey*2;
sf_coeff[4] *= prez;
sf_coeff[5] *= prez*2;
// communicate values with other procs
double tmp[6];
MPI_Allreduce(sf_coeff,tmp,6,MPI_DOUBLE,MPI_SUM,world);
for (n = 0; n < 6; n++) sf_coeff[n] = tmp[n];
}
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
and Dispersion interaction
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_coeff_6()
{
int i,k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
for (i = 0; i <= 5; i++) sf_coeff_6[i] = 0.0;
n = 0;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
sf_coeff_6[0] += sf_precoeff1_6[n]*greensfn_6[n];
sf_coeff_6[1] += sf_precoeff2_6[n]*greensfn_6[n];
sf_coeff_6[2] += sf_precoeff3_6[n]*greensfn_6[n];
sf_coeff_6[3] += sf_precoeff4_6[n]*greensfn_6[n];
sf_coeff_6[4] += sf_precoeff5_6[n]*greensfn_6[n];
sf_coeff_6[5] += sf_precoeff6_6[n]*greensfn_6[n];
++n;
}
}
}
// perform multiplication with prefactors
double prex, prey, prez;
prex = prey = prez = MY_PI/volume;
prex *= nx_pppm_6/xprd;
prey *= ny_pppm_6/yprd;
prez *= nz_pppm_6/zprd_slab;
sf_coeff_6[0] *= prex;
sf_coeff_6[1] *= prex*2;
sf_coeff_6[2] *= prey;
sf_coeff_6[3] *= prey*2;
sf_coeff_6[4] *= prez;
sf_coeff_6[5] *= prez*2;
// communicate values with other procs
double tmp[6];
MPI_Allreduce(sf_coeff_6,tmp,6,MPI_DOUBLE,MPI_SUM,world);
for (n = 0; n < 6; n++) sf_coeff_6[n] = tmp[n];
}
/* ----------------------------------------------------------------------
denominator for Hockney-Eastwood Green's function
with x,y,z = sin(kx*deltax/2), etc:

S(n,k) = Sum_{j=-inf..inf} W(k + pi*j)^2
       = Sum_{l=0..n-1} b(l) * (z*z)^l
       = -(z*z)^n / (2n-1)! * (d/dx)^(2n-1) cot(x)   at z = sin(x)

gf_b = denominator expansion coeffs
------------------------------------------------------------------------- */
double PPPMDisp::gf_denom(double x, double y, double z, double *g_b, int ord)
{
double sx,sy,sz;
sz = sy = sx = 0.0;
for (int l = ord-1; l >= 0; l--) {
sx = g_b[l] + sx*x;
sy = g_b[l] + sy*y;
sz = g_b[l] + sz*z;
}
double s = sx*sy*sz;
return s*s;
}
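// gf_denom() evaluates the same expansion in each dimension by Horner's
// rule and returns the squared product; a minimal standalone sketch of
// the one-dimensional evaluation (horner() is illustrative only, not a
// member of this class):
//
//   // p(z) = b[0] + b[1]*z + ... + b[ord-1]*z^(ord-1),
//   // evaluated at z = sin^2(k*delta/2)
//   double horner(const double *b, int ord, double z) {
//     double s = 0.0;
//     for (int l = ord-1; l >= 0; l--) s = b[l] + s*z;
//     return s;
//   }
//
// the loop above interleaves three such evaluations (x, y, z) so the
// coefficient array is traversed only once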
/* ----------------------------------------------------------------------
pre-compute Green's function denominator expansion coeffs, Gamma(2n)
------------------------------------------------------------------------- */
void PPPMDisp::compute_gf_denom(double* gf, int ord)
{
int k,l,m;
for (l = 1; l < ord; l++) gf[l] = 0.0;
gf[0] = 1.0;
for (m = 1; m < ord; m++) {
for (l = m; l > 0; l--)
gf[l] = 4.0 * (gf[l]*(l-m)*(l-m-0.5)-gf[l-1]*(l-m-1)*(l-m-1));
gf[0] = 4.0 * (gf[0]*(l-m)*(l-m-0.5));
}
bigint ifact = 1;
for (k = 1; k < 2*ord; k++) ifact *= k;
double gaminv = 1.0/ifact;
for (l = 0; l < ord; l++) gf[l] *= gaminv;
}
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for coulomb interaction or dispersion interaction with geometric
mixing
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft(int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
FFT_SCALAR*** dbrick, FFT_SCALAR* dfft, FFT_SCALAR* work,
LAMMPS_NS::Remap* rmp)
{
int n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
n = 0;
for (iz = nzlo_i; iz <= nzhi_i; iz++)
for (iy = nylo_i; iy <= nyhi_i; iy++)
for (ix = nxlo_i; ix <= nxhi_i; ix++)
dfft[n++] = dbrick[iz][iy][ix];
rmp->perform(dfft,dfft,work);
}
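// layout note: after the copy above, the owned grid point (ix,iy,iz)
// sits at flat index
//
//   n = (iz-nzlo_i)*ny_in*nx_in + (iy-nylo_i)*nx_in + (ix-nxlo_i)
//
// with nx_in = nxhi_i-nxlo_i+1 and ny_in = nyhi_i-nylo_i+1, i.e. x
// varies fastest; the Remap then rearranges this brick-ordered data
// into the decomposition the parallel FFTs expect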
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for dispersion with arithmetic mixing rule
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft_a()
{
int n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
n = 0;
for (iz = nzlo_in_6; iz <= nzhi_in_6; iz++)
for (iy = nylo_in_6; iy <= nyhi_in_6; iy++)
for (ix = nxlo_in_6; ix <= nxhi_in_6; ix++) {
density_fft_a0[n] = density_brick_a0[iz][iy][ix];
density_fft_a1[n] = density_brick_a1[iz][iy][ix];
density_fft_a2[n] = density_brick_a2[iz][iy][ix];
density_fft_a3[n] = density_brick_a3[iz][iy][ix];
density_fft_a4[n] = density_brick_a4[iz][iy][ix];
density_fft_a5[n] = density_brick_a5[iz][iy][ix];
density_fft_a6[n++] = density_brick_a6[iz][iy][ix];
}
remap_6->perform(density_fft_a0,density_fft_a0,work1_6);
remap_6->perform(density_fft_a1,density_fft_a1,work1_6);
remap_6->perform(density_fft_a2,density_fft_a2,work1_6);
remap_6->perform(density_fft_a3,density_fft_a3,work1_6);
remap_6->perform(density_fft_a4,density_fft_a4,work1_6);
remap_6->perform(density_fft_a5,density_fft_a5,work1_6);
remap_6->perform(density_fft_a6,density_fft_a6,work1_6);
}
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for dispersion with special case
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft_none()
{
int k,n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
for (k = 0; k<nsplit_alloc; k++) {
n = 0;
for (iz = nzlo_in_6; iz <= nzhi_in_6; iz++)
for (iy = nylo_in_6; iy <= nyhi_in_6; iy++)
for (ix = nxlo_in_6; ix <= nxhi_in_6; ix++)
density_fft_none[k][n++] = density_brick_none[k][iz][iy][ix];
}
for (k=0; k<nsplit_alloc; k++)
remap_6->perform(density_fft_none[k],density_fft_none[k],work1_6);
}
/* ----------------------------------------------------------------------
find center grid pt for each of my particles
check that full stencil for the particle will fit in my 3d brick
store central grid pt indices in part2grid array
------------------------------------------------------------------------- */
void PPPMDisp::particle_map(double delx, double dely, double delz,
double sft, int** p2g, int nup, int nlow,
int nxlo, int nylo, int nzlo,
int nxhi, int nyhi, int nzhi)
{
int nx,ny,nz;
double **x = atom->x;
int nlocal = atom->nlocal;
if (!ISFINITE(boxlo[0]) || !ISFINITE(boxlo[1]) || !ISFINITE(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");
int flag = 0;
for (int i = 0; i < nlocal; i++) {
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// current particle coord can be outside global and local box
// add/subtract OFFSET to avoid int(-0.75) = 0 when want it to be -1
nx = static_cast<int> ((x[i][0]-boxlo[0])*delx+sft) - OFFSET;
ny = static_cast<int> ((x[i][1]-boxlo[1])*dely+sft) - OFFSET;
nz = static_cast<int> ((x[i][2]-boxlo[2])*delz+sft) - OFFSET;
p2g[i][0] = nx;
p2g[i][1] = ny;
p2g[i][2] = nz;
// check that entire stencil around nx,ny,nz will fit in my 3d brick
if (nx+nlow < nxlo || nx+nup > nxhi ||
ny+nlow < nylo || ny+nup > nyhi ||
nz+nlow < nzlo || nz+nup > nzhi)
flag = 1;
}
if (flag) error->one(FLERR,"Out of range atoms - cannot compute PPPMDisp");
}
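// worked example of the OFFSET trick used above: plain int truncation
// rounds toward zero, so static_cast<int>(-0.75) yields 0 where the
// stencil logic needs -1.  assuming OFFSET is a large power of two
// (16384 in the PPPM sources), a coordinate that maps to -0.75 becomes
//
//   static_cast<int>(16384 - 0.75) - 16384 = 16383 - 16384 = -1
//
// adding OFFSET shifts the value into a range where truncation and
// floor() agree, and subtracting it afterwards restores the signed index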
void PPPMDisp::particle_map_c(double delx, double dely, double delz,
double sft, int** p2g, int nup, int nlow,
int nxlo, int nylo, int nzlo,
int nxhi, int nyhi, int nzhi)
{
particle_map(delx, dely, delz, sft, p2g, nup, nlow,
nxlo, nylo, nzlo, nxhi, nyhi, nzhi);
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = charge "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_c()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
// clear 3d density array
memset(&(density_brick[nzlo_out][nylo_out][nxlo_out]),0,
ngrid*sizeof(FFT_SCALAR));
// loop over my charges, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
double *q = atom->q;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
z0 = delvolinv * q[i];
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
y0 = z0*rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
x0 = y0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
density_brick[mz][my][mx] += x0*rho1d[0][l];
}
}
}
}
}
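// in formula form, the loops above add to each grid point
//
//   density_brick[nz+n][ny+m][nx+l] +=
//       (q_i / V_cell) * rho1d[0][l] * rho1d[1][m] * rho1d[2][n]
//
// for l,m,n in [nlower,nupper], where V_cell = 1/delvolinv is the grid
// cell volume and rho1d holds the 1d charge-assignment weights; the
// z0/y0/x0 intermediates simply factor the triple product so each
// partial product is formed once per loop level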
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- geometric mixing
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_g()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
// clear 3d density array
memset(&(density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my charges, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6 * B[type];
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
density_brick_g[mz][my][mx] += x0*rho1d_6[0][l];
}
}
}
}
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- arithmetic mixing
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_a()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0,w;
// clear 3d density array
memset(&(density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my particles, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
// do the following for all 7 arithmetic-mixing grids (a0 ... a6)
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
w = x0*rho1d_6[0][l];
density_brick_a0[mz][my][mx] += w*B[7*type];
density_brick_a1[mz][my][mx] += w*B[7*type+1];
density_brick_a2[mz][my][mx] += w*B[7*type+2];
density_brick_a3[mz][my][mx] += w*B[7*type+3];
density_brick_a4[mz][my][mx] += w*B[7*type+4];
density_brick_a5[mz][my][mx] += w*B[7*type+5];
density_brick_a6[mz][my][mx] += w*B[7*type+6];
}
}
}
}
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- case when mixing rules don't apply
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_none()
{
int k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0,w;
// clear 3d density array
for (k = 0; k < nsplit_alloc; k++)
memset(&(density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my particles, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
// do the following for all nsplit grids
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
w = x0*rho1d_6[0][l];
for (k = 0; k < nsplit; k++)
density_brick_none[k][mz][my][mx] += w*B[nsplit*type + k];
}
}
}
}
}
/* ----------------------------------------------------------------------
FFT-based Poisson solver for ik differentiation
------------------------------------------------------------------------- */
void PPPMDisp::poisson_ik(FFT_SCALAR* wk1, FFT_SCALAR* wk2,
FFT_SCALAR* dfft, LAMMPS_NS::FFT3d* ft1,LAMMPS_NS::FFT3d* ft2,
int nx_p, int ny_p, int nz_p, int nft,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
double& egy, double* gfn,
double* kx, double* ky, double* kz,
double* kx2, double* ky2, double* kz2,
FFT_SCALAR*** vx_brick, FFT_SCALAR*** vy_brick, FFT_SCALAR*** vz_brick,
double* vir, double** vcoeff, double** vcoeff2,
FFT_SCALAR*** u_pa, FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
int i,j,k,n;
double eng;
// transform charge/dispersion density (r -> k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] = dfft[i];
wk1[n++] = ZEROF;
}
ft1->compute(wk1,wk1,1);
// if requested, compute energy and virial contribution
double scaleinv = 1.0/(nx_p*ny_p*nz_p);
double s2 = scaleinv*scaleinv;
if (eflag_global || vflag_global) {
if (vflag_global) {
n = 0;
for (i = 0; i < nft; i++) {
eng = s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
for (j = 0; j < 6; j++) vir[j] += eng*vcoeff[i][j];
if (eflag_global) egy += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nft; i++) {
egy +=
s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
n += 2;
}
}
}
// scale by 1/total-grid-pts to get rho(k)
// multiply by Green's function to get V(k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] *= scaleinv * gfn[i];
wk1[n++] *= scaleinv * gfn[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x & y direction gradient
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = 0.5*(kx[i]-kx2[i])*wk1[n+1] + 0.5*(ky[j]-ky2[j])*wk1[n];
wk2[n+1] = -0.5*(kx[i]-kx2[i])*wk1[n] + 0.5*(ky[j]-ky2[j])*wk1[n+1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vx_brick[k][j][i] = wk2[n++];
vy_brick[k][j][i] = wk2[n++];
}
if (!eflag_atom) {
// z direction gradient only
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = kz[k]*wk1[n+1];
wk2[n+1] = -kz[k]*wk1[n];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vz_brick[k][j][i] = wk2[n];
n += 2;
}
}
else {
// z direction gradient & per-atom energy
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = 0.5*(kz[k]-kz2[k])*wk1[n+1] - wk1[n+1];
wk2[n+1] = -0.5*(kz[k]-kz2[k])*wk1[n] + wk1[n];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vz_brick[k][j][i] = wk2[n++];
u_pa[k][j][i] = wk2[n++];
}
}
if (vflag_atom) poisson_peratom(wk1, wk2, ft2, vcoeff, vcoeff2, nft,
nxlo_i, nylo_i, nzlo_i, nxhi_i, nyhi_i, nzhi_i,
v0_pa, v1_pa, v2_pa, v3_pa, v4_pa, v5_pa);
}
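// note on the combined x/y pass above: both gradient fields are real in
// r-space, so they can share one backward FFT.  writing Ex(k) and Ey(k)
// for their transforms, the loop packs wk2(k) = Ex(k) + i*Ey(k); by
// linearity the backward transform returns Ex(r) + i*Ey(r), so the real
// slots are copied into vx_brick and the imaginary slots into vy_brick.
// the same packing idea (two real fields per complex transform) is what
// lets the poisson_2s_* and poisson_none_* routines below handle a pair
// of dispersion densities with a single forward FFT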
/* ----------------------------------------------------------------------
FFT-based Poisson solver for ad differentiation
------------------------------------------------------------------------- */
void PPPMDisp::poisson_ad(FFT_SCALAR* wk1, FFT_SCALAR* wk2,
FFT_SCALAR* dfft, LAMMPS_NS::FFT3d* ft1,LAMMPS_NS::FFT3d* ft2,
int nx_p, int ny_p, int nz_p, int nft,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
double& egy, double* gfn,
double* vir, double** vcoeff, double** vcoeff2,
FFT_SCALAR*** u_pa, FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
int i,j,k,n;
double eng;
// transform charge/dispersion density (r -> k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] = dfft[i];
wk1[n++] = ZEROF;
}
ft1->compute(wk1,wk1,1);
// if requested, compute energy and virial contribution
double scaleinv = 1.0/(nx_p*ny_p*nz_p);
double s2 = scaleinv*scaleinv;
if (eflag_global || vflag_global) {
if (vflag_global) {
n = 0;
for (i = 0; i < nft; i++) {
eng = s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
for (j = 0; j < 6; j++) vir[j] += eng*vcoeff[i][j];
if (eflag_global) egy += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nft; i++) {
egy +=
s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
n += 2;
}
}
}
// scale by 1/total-grid-pts to get rho(k)
// multiply by Green's function to get V(k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] *= scaleinv * gfn[i];
wk1[n++] *= scaleinv * gfn[i];
}
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = wk1[n];
wk2[n+1] = wk1[n+1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
u_pa[k][j][i] = wk2[n++];
n++;
}
if (vflag_atom) poisson_peratom(wk1, wk2, ft2, vcoeff, vcoeff2, nft,
nxlo_i, nylo_i, nzlo_i, nxhi_i, nyhi_i, nzhi_i,
v0_pa, v1_pa, v2_pa, v3_pa, v4_pa, v5_pa);
}
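// unlike poisson_ik(), the ad (analytic differentiation) path above only
// brings the potential u back onto the grid; no gradient transforms are
// done here.  the forces are assembled later in the fieldforce_*_ad()
// routines by differentiating the assignment weights (compute_drho1d)
// and then subtracting the sf_coeff self-force terms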
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_peratom(FFT_SCALAR* wk1, FFT_SCALAR* wk2, LAMMPS_NS::FFT3d* ft2,
double** vcoeff, double** vcoeff2, int nft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
//v0 & v1 term
int n, i, j, k;
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff[i][0] - wk1[n+1]*vcoeff[i][1];
wk2[n+1] = wk1[n+1]*vcoeff[i][0] + wk1[n]*vcoeff[i][1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v0_pa[k][j][i] = wk2[n++];
v1_pa[k][j][i] = wk2[n++];
}
//v2 & v3 term
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff[i][2] - wk1[n+1]*vcoeff2[i][0];
wk2[n+1] = wk1[n+1]*vcoeff[i][2] + wk1[n]*vcoeff2[i][0];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v2_pa[k][j][i] = wk2[n++];
v3_pa[k][j][i] = wk2[n++];
}
//v4 & v5 term
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff2[i][1] - wk1[n+1]*vcoeff2[i][2];
wk2[n+1] = wk1[n+1]*vcoeff2[i][1] + wk1[n]*vcoeff2[i][2];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v4_pa[k][j][i] = wk2[n++];
v5_pa[k][j][i] = wk2[n++];
}
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_ik(FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** vxbrick_1, FFT_SCALAR*** vybrick_1, FFT_SCALAR*** vzbrick_1,
FFT_SCALAR*** vxbrick_2, FFT_SCALAR*** vybrick_2, FFT_SCALAR*** vzbrick_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** u_pa_2, FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = 2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vxbrick_1[k][j][i] = work2_6[n++];
vxbrick_2[k][j][i] = work2_6[n++];
}
// y direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fky_6[j]-fky2_6[j])*work1_6[n+1];
work2_6[n+1] = -0.5*(fky_6[j]-fky2_6[j])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vybrick_1[k][j][i] = work2_6[n++];
vybrick_2[k][j][i] = work2_6[n++];
}
// z direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vzbrick_1[k][j][i] = work2_6[n++];
vzbrick_2[k][j][i] = work2_6[n++];
}
//Per-atom energy
if (eflag_atom) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = work2_6[n++];
u_pa_2[k][j][i] = work2_6[n++];
}
}
if (vflag_atom) poisson_2s_peratom(v0_pa_1, v1_pa_1, v2_pa_1, v3_pa_1, v4_pa_1, v5_pa_1,
v0_pa_2, v1_pa_2, v2_pa_2, v3_pa_2, v4_pa_2, v5_pa_2);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ik scheme, 'none' mixing rule (per-type split densities)
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_ik(int n1, int n2,FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** vxbrick_1, FFT_SCALAR*** vybrick_1, FFT_SCALAR*** vzbrick_1,
FFT_SCALAR*** vxbrick_2, FFT_SCALAR*** vybrick_2, FFT_SCALAR*** vzbrick_2,
FFT_SCALAR**** u_pa, FFT_SCALAR**** v0_pa, FFT_SCALAR**** v1_pa, FFT_SCALAR**** v2_pa,
FFT_SCALAR**** v3_pa, FFT_SCALAR**** v4_pa, FFT_SCALAR**** v5_pa)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vxbrick_1[k][j][i] = B[n1]*work2_6[n++];
vxbrick_2[k][j][i] = B[n2]*work2_6[n++];
}
// y direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fky_6[j]-fky2_6[j])*work1_6[n+1];
work2_6[n+1] = -0.5*(fky_6[j]-fky2_6[j])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vybrick_1[k][j][i] = B[n1]*work2_6[n++];
vybrick_2[k][j][i] = B[n2]*work2_6[n++];
}
// z direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vzbrick_1[k][j][i] = B[n1]*work2_6[n++];
vzbrick_2[k][j][i] = B[n2]*work2_6[n++];
}
//Per-atom energy
if (eflag_atom) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa[n1][k][j][i] = B[n1]*work2_6[n++];
u_pa[n2][k][j][i] = B[n2]*work2_6[n++];
}
}
if (vflag_atom) poisson_none_peratom(n1,n2,
v0_pa[n1], v1_pa[n1], v2_pa[n1], v3_pa[n1], v4_pa[n1], v5_pa[n1],
v0_pa[n2], v1_pa[n2], v2_pa[n2], v3_pa[n2], v4_pa[n2], v5_pa[n2]);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_ad(FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** u_pa_2, FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = 2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = work2_6[n++];
u_pa_2[k][j][i] = work2_6[n++];
}
if (vflag_atom) poisson_2s_peratom(v0_pa_1, v1_pa_1, v2_pa_1, v3_pa_1, v4_pa_1, v5_pa_1,
v0_pa_2, v1_pa_2, v2_pa_2, v3_pa_2, v4_pa_2, v5_pa_2);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ad scheme, 'none' mixing rule (per-type split densities)
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_ad(int n1, int n2, FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** u_pa_2,
FFT_SCALAR**** v0_pa, FFT_SCALAR**** v1_pa, FFT_SCALAR**** v2_pa,
FFT_SCALAR**** v3_pa, FFT_SCALAR**** v4_pa, FFT_SCALAR**** v5_pa)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = B[n1]*work2_6[n++];
u_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
if (vflag_atom) poisson_none_peratom(n1,n2,
v0_pa[n1], v1_pa[n1], v2_pa[n1], v3_pa[n1], v4_pa[n1], v5_pa[n1],
v0_pa[n2], v1_pa[n2], v2_pa[n2], v3_pa[n2], v4_pa[n2], v5_pa[n2]);
}
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_peratom(FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
//Compute first virial term v0
int n, i, j, k;
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v0_pa_1[k][j][i] = work2_6[n++];
v0_pa_2[k][j][i] = work2_6[n++];
}
//Compute second virial term v1
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v1_pa_1[k][j][i] = work2_6[n++];
v1_pa_2[k][j][i] = work2_6[n++];
}
//Compute third virial term v2
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v2_pa_1[k][j][i] = work2_6[n++];
v2_pa_2[k][j][i] = work2_6[n++];
}
//Compute fourth virial term v3
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v3_pa_1[k][j][i] = work2_6[n++];
v3_pa_2[k][j][i] = work2_6[n++];
}
//Compute fifth virial term v4
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v4_pa_1[k][j][i] = work2_6[n++];
v4_pa_2[k][j][i] = work2_6[n++];
}
//Compute last virial term v5
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v5_pa_1[k][j][i] = work2_6[n++];
v5_pa_2[k][j][i] = work2_6[n++];
}
}
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_peratom(int n1, int n2,
FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
//Compute first virial term v0
int n, i, j, k;
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v0_pa_1[k][j][i] = B[n1]*work2_6[n++];
v0_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute second virial term v1
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v1_pa_1[k][j][i] = B[n1]*work2_6[n++];
v1_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute third virial term v2
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v2_pa_1[k][j][i] = B[n1]*work2_6[n++];
v2_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute fourth virial term v3
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v3_pa_1[k][j][i] = B[n1]*work2_6[n++];
v3_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute fifth virial term v4
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v4_pa_1[k][j][i] = B[n1]*work2_6[n++];
v4_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute last virial term v5
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v5_pa_1[k][j][i] = B[n1]*work2_6[n++];
v5_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
for ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx,eky,ekz;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
double **f = atom->f;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
ekx = eky = ekz = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
z0 = rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
y0 = z0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
x0 = y0*rho1d[0][l];
ekx -= x0*vdx_brick[mz][my][mx];
eky -= x0*vdy_brick[mz][my][mx];
ekz -= x0*vdz_brick[mz][my][mx];
}
}
}
// convert E-field to force
const double qfactor = force->qqrd2e * scale * q[i];
f[i][0] += qfactor*ekx;
f[i][1] += qfactor*eky;
if (slabflag != 2) f[i][2] += qfactor*ekz;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz;
FFT_SCALAR ekx,eky,ekz;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm/xprd;
double hy_inv = ny_pppm/yprd;
double hz_inv = nz_pppm/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
double **f = atom->f;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
compute_drho1d(dx,dy,dz, order, drho_coeff, drho1d);
ekx = eky = ekz = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
for (m = nlower; m <= nupper; m++) {
my = m+ny;
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
ekx += drho1d[0][l]*rho1d[1][m]*rho1d[2][n]*u_brick[mz][my][mx];
eky += rho1d[0][l]*drho1d[1][m]*rho1d[2][n]*u_brick[mz][my][mx];
ekz += rho1d[0][l]*rho1d[1][m]*drho1d[2][n]*u_brick[mz][my][mx];
}
}
}
ekx *= hx_inv;
eky *= hy_inv;
ekz *= hz_inv;
// convert E-field to force and subtract self forces
const double qfactor = force->qqrd2e * scale;
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
sf = sf_coeff[0]*sin(2*MY_PI*s1);
sf += sf_coeff[1]*sin(4*MY_PI*s1);
sf *= 2*q[i]*q[i];
f[i][0] += qfactor*(ekx*q[i] - sf);
sf = sf_coeff[2]*sin(2*MY_PI*s2);
sf += sf_coeff[3]*sin(4*MY_PI*s2);
sf *= 2*q[i]*q[i];
f[i][1] += qfactor*(eky*q[i] - sf);
sf = sf_coeff[4]*sin(2*MY_PI*s3);
sf += sf_coeff[5]*sin(4*MY_PI*s3);
sf *= 2*q[i]*q[i];
if (slabflag != 2) f[i][2] += qfactor*(ekz*q[i] - sf);
}
}
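// written out, the x-component applied above for atom i is
//
//   f_x += qqrd2e*scale * ( q_i*ekx
//          - 2*q_i^2 * ( sf_coeff[0]*sin(2*pi*x_i/h_x)
//                      + sf_coeff[1]*sin(4*pi*x_i/h_x) ) )
//
// with ekx = (1/h_x) * sum_{l,m,n} drho1d[0][l]*rho1d[1][m]*rho1d[2][n]
// * u_brick[...]; the sine terms remove the spurious force an atom exerts
// on itself through the grid in the ad scheme (y and z are analogous)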
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa,v0,v1,v2,v3,v4,v5;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
u_pa = v0 = v1 = v2 = v3 = v4 = v5 = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
z0 = rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
y0 = z0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
x0 = y0*rho1d[0][l];
if (eflag_atom) u_pa += x0*u_brick[mz][my][mx];
if (vflag_atom) {
v0 += x0*v0_brick[mz][my][mx];
v1 += x0*v1_brick[mz][my][mx];
v2 += x0*v2_brick[mz][my][mx];
v3 += x0*v3_brick[mz][my][mx];
v4 += x0*v4_brick[mz][my][mx];
v5 += x0*v5_brick[mz][my][mx];
}
}
}
}
// accumulate per-atom energy and virial contributions
const double qfactor = 0.5*force->qqrd2e * scale * q[i];
if (eflag_atom) eatom[i] += u_pa*qfactor;
if (vflag_atom) {
vatom[i][0] += v0*qfactor;
vatom[i][1] += v1*qfactor;
vatom[i][2] += v2*qfactor;
vatom[i][3] += v3*qfactor;
vatom[i][4] += v4*qfactor;
vatom[i][5] += v5*qfactor;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx,eky,ekz;
// loop over my particles, interpolate dispersion field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
ekx = eky = ekz = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
ekx -= x0*vdx_brick_g[mz][my][mx];
eky -= x0*vdy_brick_g[mz][my][mx];
ekz -= x0*vdz_brick_g[mz][my][mx];
}
}
}
// convert dispersion field to force
type = atom->type[i];
lj = B[type];
f[i][0] += lj*ekx;
f[i][1] += lj*eky;
if (slabflag != 2) f[i][2] += lj*ekz;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz;
FFT_SCALAR ekx,eky,ekz;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
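// ad scheme: per-particle fields are obtained by analytically differentiating the assignment weights; hx_inv, hy_inv, hz_inv convert stencil-space derivatives to real-space gradients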
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
ekx = eky = ekz = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
ekx += drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n]*u_brick_g[mz][my][mx];
eky += rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n]*u_brick_g[mz][my][mx];
ekz += rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n]*u_brick_g[mz][my][mx];
}
}
}
ekx *= hx_inv;
eky *= hy_inv;
ekz *= hz_inv;
// convert E-field to force
type = atom->type[i];
lj = B[type];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
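// subtract the self-force correction of the ad scheme: the sf_coeff_6 sine terms cancel the spurious force a particle's own smeared density exerts on it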
sf = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf *= 2*lj*lj;
f[i][0] += ekx*lj - sf;
sf = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf *= 2*lj*lj;
f[i][1] += eky*lj - sf;
sf = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf += sf_coeff_6[5]*sin(4*MY_PI*s3);
sf *= 2*lj*lj;
if (slabflag != 2) f[i][2] += ekz*lj - sf;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa,v0,v1,v2,v3,v4,v5;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
u_pa = v0 = v1 = v2 = v3 = v4 = v5 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) u_pa += x0*u_brick_g[mz][my][mx];
if (vflag_atom) {
v0 += x0*v0_brick_g[mz][my][mx];
v1 += x0*v1_brick_g[mz][my][mx];
v2 += x0*v2_brick_g[mz][my][mx];
v3 += x0*v3_brick_g[mz][my][mx];
v4 += x0*v4_brick_g[mz][my][mx];
v5 += x0*v5_brick_g[mz][my][mx];
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
lj = B[type]*0.5;
if (eflag_atom) eatom[i] += u_pa*lj;
if (vflag_atom) {
vatom[i][0] += v0*lj;
vatom[i][1] += v1*lj;
vatom[i][2] += v2*lj;
vatom[i][3] += v3*lj;
vatom[i][4] += v4*lj;
vatom[i][5] += v5*lj;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule and ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx0, eky0, ekz0, ekx1, eky1, ekz1, ekx2, eky2, ekz2;
FFT_SCALAR ekx3, eky3, ekz3, ekx4, eky4, ekz4, ekx5, eky5, ekz5;
FFT_SCALAR ekx6, eky6, ekz6;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
ekx0 = eky0 = ekz0 = ZEROF;
ekx1 = eky1 = ekz1 = ZEROF;
ekx2 = eky2 = ekz2 = ZEROF;
ekx3 = eky3 = ekz3 = ZEROF;
ekx4 = eky4 = ekz4 = ZEROF;
ekx5 = eky5 = ekz5 = ZEROF;
ekx6 = eky6 = ekz6 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
ekx0 -= x0*vdx_brick_a0[mz][my][mx];
eky0 -= x0*vdy_brick_a0[mz][my][mx];
ekz0 -= x0*vdz_brick_a0[mz][my][mx];
ekx1 -= x0*vdx_brick_a1[mz][my][mx];
eky1 -= x0*vdy_brick_a1[mz][my][mx];
ekz1 -= x0*vdz_brick_a1[mz][my][mx];
ekx2 -= x0*vdx_brick_a2[mz][my][mx];
eky2 -= x0*vdy_brick_a2[mz][my][mx];
ekz2 -= x0*vdz_brick_a2[mz][my][mx];
ekx3 -= x0*vdx_brick_a3[mz][my][mx];
eky3 -= x0*vdy_brick_a3[mz][my][mx];
ekz3 -= x0*vdz_brick_a3[mz][my][mx];
ekx4 -= x0*vdx_brick_a4[mz][my][mx];
eky4 -= x0*vdy_brick_a4[mz][my][mx];
ekz4 -= x0*vdz_brick_a4[mz][my][mx];
ekx5 -= x0*vdx_brick_a5[mz][my][mx];
eky5 -= x0*vdy_brick_a5[mz][my][mx];
ekz5 -= x0*vdz_brick_a5[mz][my][mx];
ekx6 -= x0*vdx_brick_a6[mz][my][mx];
eky6 -= x0*vdy_brick_a6[mz][my][mx];
ekz6 -= x0*vdz_brick_a6[mz][my][mx];
}
}
}
// convert D-field to force
type = atom->type[i];
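// arithmetic mixing expands the dispersion coefficient into 7 terms; B holds the 7 per-type factors, paired with the 7 grids (a0..a6) in reverse order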
lj0 = B[7*type+6];
lj1 = B[7*type+5];
lj2 = B[7*type+4];
lj3 = B[7*type+3];
lj4 = B[7*type+2];
lj5 = B[7*type+1];
lj6 = B[7*type];
f[i][0] += lj0*ekx0 + lj1*ekx1 + lj2*ekx2 + lj3*ekx3 + lj4*ekx4 + lj5*ekx5 + lj6*ekx6;
f[i][1] += lj0*eky0 + lj1*eky1 + lj2*eky2 + lj3*eky3 + lj4*eky4 + lj5*eky5 + lj6*eky6;
if (slabflag != 2) f[i][2] += lj0*ekz0 + lj1*ekz1 + lj2*ekz2 + lj3*ekz3 + lj4*ekz4 + lj5*ekz5 + lj6*ekz6;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule for the ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx0, eky0, ekz0, ekx1, eky1, ekz1, ekx2, eky2, ekz2;
FFT_SCALAR ekx3, eky3, ekz3, ekx4, eky4, ekz4, ekx5, eky5, ekz5;
FFT_SCALAR ekx6, eky6, ekz6;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
ekx0 = eky0 = ekz0 = ZEROF;
ekx1 = eky1 = ekz1 = ZEROF;
ekx2 = eky2 = ekz2 = ZEROF;
ekx3 = eky3 = ekz3 = ZEROF;
ekx4 = eky4 = ekz4 = ZEROF;
ekx5 = eky5 = ekz5 = ZEROF;
ekx6 = eky6 = ekz6 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n];
y0 = rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n];
z0 = rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n];
ekx0 += x0*u_brick_a0[mz][my][mx];
eky0 += y0*u_brick_a0[mz][my][mx];
ekz0 += z0*u_brick_a0[mz][my][mx];
ekx1 += x0*u_brick_a1[mz][my][mx];
eky1 += y0*u_brick_a1[mz][my][mx];
ekz1 += z0*u_brick_a1[mz][my][mx];
ekx2 += x0*u_brick_a2[mz][my][mx];
eky2 += y0*u_brick_a2[mz][my][mx];
ekz2 += z0*u_brick_a2[mz][my][mx];
ekx3 += x0*u_brick_a3[mz][my][mx];
eky3 += y0*u_brick_a3[mz][my][mx];
ekz3 += z0*u_brick_a3[mz][my][mx];
ekx4 += x0*u_brick_a4[mz][my][mx];
eky4 += y0*u_brick_a4[mz][my][mx];
ekz4 += z0*u_brick_a4[mz][my][mx];
ekx5 += x0*u_brick_a5[mz][my][mx];
eky5 += y0*u_brick_a5[mz][my][mx];
ekz5 += z0*u_brick_a5[mz][my][mx];
ekx6 += x0*u_brick_a6[mz][my][mx];
eky6 += y0*u_brick_a6[mz][my][mx];
ekz6 += z0*u_brick_a6[mz][my][mx];
}
}
}
ekx0 *= hx_inv;
eky0 *= hy_inv;
ekz0 *= hz_inv;
ekx1 *= hx_inv;
eky1 *= hy_inv;
ekz1 *= hz_inv;
ekx2 *= hx_inv;
eky2 *= hy_inv;
ekz2 *= hz_inv;
ekx3 *= hx_inv;
eky3 *= hy_inv;
ekz3 *= hz_inv;
ekx4 *= hx_inv;
eky4 *= hy_inv;
ekz4 *= hz_inv;
ekx5 *= hx_inv;
eky5 *= hy_inv;
ekz5 *= hz_inv;
ekx6 *= hx_inv;
eky6 *= hy_inv;
ekz6 *= hz_inv;
// convert D-field to force
type = atom->type[i];
lj0 = B[7*type+6];
lj1 = B[7*type+5];
lj2 = B[7*type+4];
lj3 = B[7*type+3];
lj4 = B[7*type+2];
lj5 = B[7*type+1];
lj6 = B[7*type];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
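// self-force prefactor below equals 2*sum_k lj_k*lj_(6-k), i.e. all cross products of the 7-term split for this atom type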
sf = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
f[i][0] += lj0*ekx0 + lj1*ekx1 + lj2*ekx2 + lj3*ekx3 + lj4*ekx4 + lj5*ekx5 + lj6*ekx6 - sf;
sf = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
f[i][1] += lj0*eky0 + lj1*eky1 + lj2*eky2 + lj3*eky3 + lj4*eky4 + lj5*eky5 + lj6*eky6 - sf;
sf = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf += sf_coeff_6[5]*sin(4*MY_PI*s3);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
if (slabflag != 2) f[i][2] += lj0*ekz0 + lj1*ekz1 + lj2*ekz2 + lj3*ekz3 + lj4*ekz4 + lj5*ekz5 + lj6*ekz6 - sf;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa0,v00,v10,v20,v30,v40,v50;
FFT_SCALAR u_pa1,v01,v11,v21,v31,v41,v51;
FFT_SCALAR u_pa2,v02,v12,v22,v32,v42,v52;
FFT_SCALAR u_pa3,v03,v13,v23,v33,v43,v53;
FFT_SCALAR u_pa4,v04,v14,v24,v34,v44,v54;
FFT_SCALAR u_pa5,v05,v15,v25,v35,v45,v55;
FFT_SCALAR u_pa6,v06,v16,v26,v36,v46,v56;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
u_pa0 = v00 = v10 = v20 = v30 = v40 = v50 = ZEROF;
u_pa1 = v01 = v11 = v21 = v31 = v41 = v51 = ZEROF;
u_pa2 = v02 = v12 = v22 = v32 = v42 = v52 = ZEROF;
u_pa3 = v03 = v13 = v23 = v33 = v43 = v53 = ZEROF;
u_pa4 = v04 = v14 = v24 = v34 = v44 = v54 = ZEROF;
u_pa5 = v05 = v15 = v25 = v35 = v45 = v55 = ZEROF;
u_pa6 = v06 = v16 = v26 = v36 = v46 = v56 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) {
u_pa0 += x0*u_brick_a0[mz][my][mx];
u_pa1 += x0*u_brick_a1[mz][my][mx];
u_pa2 += x0*u_brick_a2[mz][my][mx];
u_pa3 += x0*u_brick_a3[mz][my][mx];
u_pa4 += x0*u_brick_a4[mz][my][mx];
u_pa5 += x0*u_brick_a5[mz][my][mx];
u_pa6 += x0*u_brick_a6[mz][my][mx];
}
if (vflag_atom) {
v00 += x0*v0_brick_a0[mz][my][mx];
v10 += x0*v1_brick_a0[mz][my][mx];
v20 += x0*v2_brick_a0[mz][my][mx];
v30 += x0*v3_brick_a0[mz][my][mx];
v40 += x0*v4_brick_a0[mz][my][mx];
v50 += x0*v5_brick_a0[mz][my][mx];
v01 += x0*v0_brick_a1[mz][my][mx];
v11 += x0*v1_brick_a1[mz][my][mx];
v21 += x0*v2_brick_a1[mz][my][mx];
v31 += x0*v3_brick_a1[mz][my][mx];
v41 += x0*v4_brick_a1[mz][my][mx];
v51 += x0*v5_brick_a1[mz][my][mx];
v02 += x0*v0_brick_a2[mz][my][mx];
v12 += x0*v1_brick_a2[mz][my][mx];
v22 += x0*v2_brick_a2[mz][my][mx];
v32 += x0*v3_brick_a2[mz][my][mx];
v42 += x0*v4_brick_a2[mz][my][mx];
v52 += x0*v5_brick_a2[mz][my][mx];
v03 += x0*v0_brick_a3[mz][my][mx];
v13 += x0*v1_brick_a3[mz][my][mx];
v23 += x0*v2_brick_a3[mz][my][mx];
v33 += x0*v3_brick_a3[mz][my][mx];
v43 += x0*v4_brick_a3[mz][my][mx];
v53 += x0*v5_brick_a3[mz][my][mx];
v04 += x0*v0_brick_a4[mz][my][mx];
v14 += x0*v1_brick_a4[mz][my][mx];
v24 += x0*v2_brick_a4[mz][my][mx];
v34 += x0*v3_brick_a4[mz][my][mx];
v44 += x0*v4_brick_a4[mz][my][mx];
v54 += x0*v5_brick_a4[mz][my][mx];
v05 += x0*v0_brick_a5[mz][my][mx];
v15 += x0*v1_brick_a5[mz][my][mx];
v25 += x0*v2_brick_a5[mz][my][mx];
v35 += x0*v3_brick_a5[mz][my][mx];
v45 += x0*v4_brick_a5[mz][my][mx];
v55 += x0*v5_brick_a5[mz][my][mx];
v06 += x0*v0_brick_a6[mz][my][mx];
v16 += x0*v1_brick_a6[mz][my][mx];
v26 += x0*v2_brick_a6[mz][my][mx];
v36 += x0*v3_brick_a6[mz][my][mx];
v46 += x0*v4_brick_a6[mz][my][mx];
v56 += x0*v5_brick_a6[mz][my][mx];
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
lj0 = B[7*type+6]*0.5;
lj1 = B[7*type+5]*0.5;
lj2 = B[7*type+4]*0.5;
lj3 = B[7*type+3]*0.5;
lj4 = B[7*type+2]*0.5;
lj5 = B[7*type+1]*0.5;
lj6 = B[7*type]*0.5;
if (eflag_atom)
eatom[i] += u_pa0*lj0 + u_pa1*lj1 + u_pa2*lj2 +
u_pa3*lj3 + u_pa4*lj4 + u_pa5*lj5 + u_pa6*lj6;
if (vflag_atom) {
vatom[i][0] += v00*lj0 + v01*lj1 + v02*lj2 + v03*lj3 +
v04*lj4 + v05*lj5 + v06*lj6;
vatom[i][1] += v10*lj0 + v11*lj1 + v12*lj2 + v13*lj3 +
v14*lj4 + v15*lj5 + v16*lj6;
vatom[i][2] += v20*lj0 + v21*lj1 + v22*lj2 + v23*lj3 +
v24*lj4 + v25*lj5 + v26*lj6;
vatom[i][3] += v30*lj0 + v31*lj1 + v32*lj2 + v33*lj3 +
v34*lj4 + v35*lj5 + v36*lj6;
vatom[i][4] += v40*lj0 + v41*lj1 + v42*lj2 + v43*lj3 +
v44*lj4 + v45*lj5 + v46*lj6;
vatom[i][5] += v50*lj0 + v51*lj1 + v52*lj2 + v53*lj3 +
v54*lj4 + v55*lj5 + v56*lj6;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule and ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_ik()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *ekx, *eky, *ekz;
ekx = new FFT_SCALAR[nsplit];
eky = new FFT_SCALAR[nsplit];
ekz = new FFT_SCALAR[nsplit];
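// with no mixing rule the dispersion sum is treated as nsplit independent terms, each with its own grids; keep one field accumulator per term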
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
for (k = 0; k < nsplit; k++)
ekx[k] = eky[k] = ekz[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
for (k = 0; k < nsplit; k++) {
ekx[k] -= x0*vdx_brick_none[k][mz][my][mx];
eky[k] -= x0*vdy_brick_none[k][mz][my][mx];
ekz[k] -= x0*vdz_brick_none[k][mz][my][mx];
}
}
}
}
// convert D-field to force
type = atom->type[i];
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k];
f[i][0] += lj*ekx[k];
f[i][1] += lj*eky[k];
if (slabflag != 2) f[i][2] += lj*ekz[k];
}
}
delete [] ekx;
delete [] eky;
delete [] ekz;
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule and the ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_ad()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *ekx, *eky, *ekz;
ekx = new FFT_SCALAR[nsplit];
eky = new FFT_SCALAR[nsplit];
ekz = new FFT_SCALAR[nsplit];
double s1,s2,s3;
double sf1,sf2,sf3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
for (k = 0; k < nsplit; k++)
ekx[k] = eky[k] = ekz[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n];
y0 = rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n];
z0 = rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n];
for (k = 0; k < nsplit; k++) {
ekx[k] += x0*u_brick_none[k][mz][my][mx];
eky[k] += y0*u_brick_none[k][mz][my][mx];
ekz[k] += z0*u_brick_none[k][mz][my][mx];
}
}
}
}
for (k = 0; k < nsplit; k++) {
ekx[k] *= hx_inv;
eky[k] *= hy_inv;
ekz[k] *= hz_inv;
}
// convert D-field to force
type = atom->type[i];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
sf1 = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf1 += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf2 = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf2 += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf3 = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf3 += sf_coeff_6[5]*sin(4*MY_PI*s3);
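// apply the ad-scheme self-force correction per splitting term, analogous to the correction in the geometric/arithmetic ad routines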
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k];
sf = sf1*B[k]*2*lj*lj;
f[i][0] += lj*ekx[k] - sf;
sf = sf2*B[k]*2*lj*lj;
f[i][1] += lj*eky[k] - sf;
sf = sf3*B[k]*2*lj*lj;
if (slabflag != 2) f[i][2] += lj*ekz[k] - sf;
}
}
delete [] ekx;
delete [] eky;
delete [] ekz;
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_peratom()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *u_pa,*v0,*v1,*v2,*v3,*v4,*v5;
u_pa = new FFT_SCALAR[nsplit];
v0 = new FFT_SCALAR[nsplit];
v1 = new FFT_SCALAR[nsplit];
v2 = new FFT_SCALAR[nsplit];
v3 = new FFT_SCALAR[nsplit];
v4 = new FFT_SCALAR[nsplit];
v5 = new FFT_SCALAR[nsplit];
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
for (k = 0; k < nsplit; k++)
u_pa[k] = v0[k] = v1[k] = v2[k] = v3[k] = v4[k] = v5[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) {
for (k = 0; k < nsplit; k++)
u_pa[k] += x0*u_brick_none[k][mz][my][mx];
}
if (vflag_atom) {
for (k = 0; k < nsplit; k++) {
v0[k] += x0*v0_brick_none[k][mz][my][mx];
v1[k] += x0*v1_brick_none[k][mz][my][mx];
v2[k] += x0*v2_brick_none[k][mz][my][mx];
v3[k] += x0*v3_brick_none[k][mz][my][mx];
v4[k] += x0*v4_brick_none[k][mz][my][mx];
v5[k] += x0*v5_brick_none[k][mz][my][mx];
}
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k]*0.5;
if (eflag_atom) {
eatom[i] += u_pa[k]*lj;
}
if (vflag_atom) {
vatom[i][0] += v0[k]*lj;
vatom[i][1] += v1[k]*lj;
vatom[i][2] += v2[k]*lj;
vatom[i][3] += v3[k]*lj;
vatom[i][4] += v4[k]*lj;
vatom[i][5] += v5[k]*lj;
}
}
}
delete [] u_pa;
delete [] v0;
delete [] v1;
delete [] v2;
delete [] v3;
delete [] v4;
delete [] v5;
}
/* ----------------------------------------------------------------------
pack values to buf to send to another proc
------------------------------------------------------------------------- */
void PPPMDisp::pack_forward(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
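// forward communication: flag selects which set of grid bricks to copy; values at the ghost-cell indices in list are packed into buf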
switch (flag) {
// Coulomb interactions
case FORWARD_IK: {
FFT_SCALAR *xsrc = &vdx_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *ysrc = &vdy_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *zsrc = &vdz_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
break;
}
case FORWARD_AD: {
FFT_SCALAR *src = &u_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
break;
}
case FORWARD_IK_PERATOM: {
FFT_SCALAR *esrc = &u_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM: {
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
break;
}
// Dispersion interactions, geometric mixing
case FORWARD_IK_G: {
FFT_SCALAR *xsrc = &vdx_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc = &vdy_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc = &vdz_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
break;
}
case FORWARD_AD_G: {
FFT_SCALAR *src = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
break;
}
case FORWARD_IK_PERATOM_G: {
FFT_SCALAR *esrc = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM_G: {
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
break;
}
// Dispersion interactions, arithmetic mixing
case FORWARD_IK_A: {
FFT_SCALAR *xsrc0 = &vdx_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc0 = &vdy_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc0 = &vdz_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc1 = &vdx_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc1 = &vdy_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc1 = &vdz_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc2 = &vdx_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc2 = &vdy_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc2 = &vdz_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc3 = &vdx_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc3 = &vdy_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc3 = &vdz_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc4 = &vdx_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc4 = &vdy_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc4 = &vdz_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc5 = &vdx_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc5 = &vdy_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc5 = &vdz_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc6 = &vdx_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc6 = &vdy_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc6 = &vdz_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc0[list[i]];
buf[n++] = ysrc0[list[i]];
buf[n++] = zsrc0[list[i]];
buf[n++] = xsrc1[list[i]];
buf[n++] = ysrc1[list[i]];
buf[n++] = zsrc1[list[i]];
buf[n++] = xsrc2[list[i]];
buf[n++] = ysrc2[list[i]];
buf[n++] = zsrc2[list[i]];
buf[n++] = xsrc3[list[i]];
buf[n++] = ysrc3[list[i]];
buf[n++] = zsrc3[list[i]];
buf[n++] = xsrc4[list[i]];
buf[n++] = ysrc4[list[i]];
buf[n++] = zsrc4[list[i]];
buf[n++] = xsrc5[list[i]];
buf[n++] = ysrc5[list[i]];
buf[n++] = zsrc5[list[i]];
buf[n++] = xsrc6[list[i]];
buf[n++] = ysrc6[list[i]];
buf[n++] = zsrc6[list[i]];
}
break;
}
case FORWARD_AD_A: {
FFT_SCALAR *src0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src0[list[i]];
buf[n++] = src1[list[i]];
buf[n++] = src2[list[i]];
buf[n++] = src3[list[i]];
buf[n++] = src4[list[i]];
buf[n++] = src5[list[i]];
buf[n++] = src6[list[i]];
}
break;
}
case FORWARD_IK_PERATOM_A: {
FFT_SCALAR *esrc0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) {
buf[n++] = esrc0[list[i]];
buf[n++] = esrc1[list[i]];
buf[n++] = esrc2[list[i]];
buf[n++] = esrc3[list[i]];
buf[n++] = esrc4[list[i]];
buf[n++] = esrc5[list[i]];
buf[n++] = esrc6[list[i]];
}
if (vflag_atom) {
buf[n++] = v0src0[list[i]];
buf[n++] = v1src0[list[i]];
buf[n++] = v2src0[list[i]];
buf[n++] = v3src0[list[i]];
buf[n++] = v4src0[list[i]];
buf[n++] = v5src0[list[i]];
buf[n++] = v0src1[list[i]];
buf[n++] = v1src1[list[i]];
buf[n++] = v2src1[list[i]];
buf[n++] = v3src1[list[i]];
buf[n++] = v4src1[list[i]];
buf[n++] = v5src1[list[i]];
buf[n++] = v0src2[list[i]];
buf[n++] = v1src2[list[i]];
buf[n++] = v2src2[list[i]];
buf[n++] = v3src2[list[i]];
buf[n++] = v4src2[list[i]];
buf[n++] = v5src2[list[i]];
buf[n++] = v0src3[list[i]];
buf[n++] = v1src3[list[i]];
buf[n++] = v2src3[list[i]];
buf[n++] = v3src3[list[i]];
buf[n++] = v4src3[list[i]];
buf[n++] = v5src3[list[i]];
buf[n++] = v0src4[list[i]];
buf[n++] = v1src4[list[i]];
buf[n++] = v2src4[list[i]];
buf[n++] = v3src4[list[i]];
buf[n++] = v4src4[list[i]];
buf[n++] = v5src4[list[i]];
buf[n++] = v0src5[list[i]];
buf[n++] = v1src5[list[i]];
buf[n++] = v2src5[list[i]];
buf[n++] = v3src5[list[i]];
buf[n++] = v4src5[list[i]];
buf[n++] = v5src5[list[i]];
buf[n++] = v0src6[list[i]];
buf[n++] = v1src6[list[i]];
buf[n++] = v2src6[list[i]];
buf[n++] = v3src6[list[i]];
buf[n++] = v4src6[list[i]];
buf[n++] = v5src6[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM_A: {
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src0[list[i]];
buf[n++] = v1src0[list[i]];
buf[n++] = v2src0[list[i]];
buf[n++] = v3src0[list[i]];
buf[n++] = v4src0[list[i]];
buf[n++] = v5src0[list[i]];
buf[n++] = v0src1[list[i]];
buf[n++] = v1src1[list[i]];
buf[n++] = v2src1[list[i]];
buf[n++] = v3src1[list[i]];
buf[n++] = v4src1[list[i]];
buf[n++] = v5src1[list[i]];
buf[n++] = v0src2[list[i]];
buf[n++] = v1src2[list[i]];
buf[n++] = v2src2[list[i]];
buf[n++] = v3src2[list[i]];
buf[n++] = v4src2[list[i]];
buf[n++] = v5src2[list[i]];
buf[n++] = v0src3[list[i]];
buf[n++] = v1src3[list[i]];
buf[n++] = v2src3[list[i]];
buf[n++] = v3src3[list[i]];
buf[n++] = v4src3[list[i]];
buf[n++] = v5src3[list[i]];
buf[n++] = v0src4[list[i]];
buf[n++] = v1src4[list[i]];
buf[n++] = v2src4[list[i]];
buf[n++] = v3src4[list[i]];
buf[n++] = v4src4[list[i]];
buf[n++] = v5src4[list[i]];
buf[n++] = v0src5[list[i]];
buf[n++] = v1src5[list[i]];
buf[n++] = v2src5[list[i]];
buf[n++] = v3src5[list[i]];
buf[n++] = v4src5[list[i]];
buf[n++] = v5src5[list[i]];
buf[n++] = v0src6[list[i]];
buf[n++] = v1src6[list[i]];
buf[n++] = v2src6[list[i]];
buf[n++] = v3src6[list[i]];
buf[n++] = v4src6[list[i]];
buf[n++] = v5src6[list[i]];
}
break;
}
// Dispersion interactions, no mixing
case FORWARD_IK_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *xsrc = &vdx_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc = &vdy_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc = &vdz_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
}
break;
}
case FORWARD_AD_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *src = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[n++] = src[list[i]];
}
break;
}
case FORWARD_IK_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *esrc = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
}
break;
}
case FORWARD_AD_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
}
}
/* ----------------------------------------------------------------------
unpack another proc's own values from buf and set own ghost values
------------------------------------------------------------------------- */
void PPPMDisp::unpack_forward(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
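// mirror of pack_forward(): values received from a neighbor proc are scattered into this proc's ghost grid cells in the same order they were packed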
switch (flag) {
// Coulomb interactions
case FORWARD_IK: {
FFT_SCALAR *xdest = &vdx_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *ydest = &vdy_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *zdest = &vdz_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD: {
FFT_SCALAR *dest = &u_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
break;
}
case FORWARD_IK_PERATOM: {
FFT_SCALAR *esrc = &u_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM: {
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, geometric mixing
case FORWARD_IK_G: {
FFT_SCALAR *xdest = &vdx_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest = &vdy_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest = &vdz_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD_G: {
FFT_SCALAR *dest = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
break;
}
case FORWARD_IK_PERATOM_G: {
FFT_SCALAR *esrc = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM_G: {
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, arithmetic mixing
case FORWARD_IK_A: {
FFT_SCALAR *xdest0 = &vdx_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest0 = &vdy_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest0 = &vdz_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest1 = &vdx_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest1 = &vdy_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest1 = &vdz_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest2 = &vdx_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest2 = &vdy_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest2 = &vdz_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest3 = &vdx_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest3 = &vdy_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest3 = &vdz_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest4 = &vdx_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest4 = &vdy_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest4 = &vdz_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest5 = &vdx_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest5 = &vdy_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest5 = &vdz_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest6 = &vdx_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest6 = &vdy_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest6 = &vdz_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest0[list[i]] = buf[n++];
ydest0[list[i]] = buf[n++];
zdest0[list[i]] = buf[n++];
xdest1[list[i]] = buf[n++];
ydest1[list[i]] = buf[n++];
zdest1[list[i]] = buf[n++];
xdest2[list[i]] = buf[n++];
ydest2[list[i]] = buf[n++];
zdest2[list[i]] = buf[n++];
xdest3[list[i]] = buf[n++];
ydest3[list[i]] = buf[n++];
zdest3[list[i]] = buf[n++];
xdest4[list[i]] = buf[n++];
ydest4[list[i]] = buf[n++];
zdest4[list[i]] = buf[n++];
xdest5[list[i]] = buf[n++];
ydest5[list[i]] = buf[n++];
zdest5[list[i]] = buf[n++];
xdest6[list[i]] = buf[n++];
ydest6[list[i]] = buf[n++];
zdest6[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD_A: {
FFT_SCALAR *dest0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
dest0[list[i]] = buf[n++];
dest1[list[i]] = buf[n++];
dest2[list[i]] = buf[n++];
dest3[list[i]] = buf[n++];
dest4[list[i]] = buf[n++];
dest5[list[i]] = buf[n++];
dest6[list[i]] = buf[n++];
}
break;
}
case FORWARD_IK_PERATOM_A: {
FFT_SCALAR *esrc0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) {
esrc0[list[i]] = buf[n++];
esrc1[list[i]] = buf[n++];
esrc2[list[i]] = buf[n++];
esrc3[list[i]] = buf[n++];
esrc4[list[i]] = buf[n++];
esrc5[list[i]] = buf[n++];
esrc6[list[i]] = buf[n++];
}
if (vflag_atom) {
v0src0[list[i]] = buf[n++];
v1src0[list[i]] = buf[n++];
v2src0[list[i]] = buf[n++];
v3src0[list[i]] = buf[n++];
v4src0[list[i]] = buf[n++];
v5src0[list[i]] = buf[n++];
v0src1[list[i]] = buf[n++];
v1src1[list[i]] = buf[n++];
v2src1[list[i]] = buf[n++];
v3src1[list[i]] = buf[n++];
v4src1[list[i]] = buf[n++];
v5src1[list[i]] = buf[n++];
v0src2[list[i]] = buf[n++];
v1src2[list[i]] = buf[n++];
v2src2[list[i]] = buf[n++];
v3src2[list[i]] = buf[n++];
v4src2[list[i]] = buf[n++];
v5src2[list[i]] = buf[n++];
v0src3[list[i]] = buf[n++];
v1src3[list[i]] = buf[n++];
v2src3[list[i]] = buf[n++];
v3src3[list[i]] = buf[n++];
v4src3[list[i]] = buf[n++];
v5src3[list[i]] = buf[n++];
v0src4[list[i]] = buf[n++];
v1src4[list[i]] = buf[n++];
v2src4[list[i]] = buf[n++];
v3src4[list[i]] = buf[n++];
v4src4[list[i]] = buf[n++];
v5src4[list[i]] = buf[n++];
v0src5[list[i]] = buf[n++];
v1src5[list[i]] = buf[n++];
v2src5[list[i]] = buf[n++];
v3src5[list[i]] = buf[n++];
v4src5[list[i]] = buf[n++];
v5src5[list[i]] = buf[n++];
v0src6[list[i]] = buf[n++];
v1src6[list[i]] = buf[n++];
v2src6[list[i]] = buf[n++];
v3src6[list[i]] = buf[n++];
v4src6[list[i]] = buf[n++];
v5src6[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM_A: {
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src0[list[i]] = buf[n++];
v1src0[list[i]] = buf[n++];
v2src0[list[i]] = buf[n++];
v3src0[list[i]] = buf[n++];
v4src0[list[i]] = buf[n++];
v5src0[list[i]] = buf[n++];
v0src1[list[i]] = buf[n++];
v1src1[list[i]] = buf[n++];
v2src1[list[i]] = buf[n++];
v3src1[list[i]] = buf[n++];
v4src1[list[i]] = buf[n++];
v5src1[list[i]] = buf[n++];
v0src2[list[i]] = buf[n++];
v1src2[list[i]] = buf[n++];
v2src2[list[i]] = buf[n++];
v3src2[list[i]] = buf[n++];
v4src2[list[i]] = buf[n++];
v5src2[list[i]] = buf[n++];
v0src3[list[i]] = buf[n++];
v1src3[list[i]] = buf[n++];
v2src3[list[i]] = buf[n++];
v3src3[list[i]] = buf[n++];
v4src3[list[i]] = buf[n++];
v5src3[list[i]] = buf[n++];
v0src4[list[i]] = buf[n++];
v1src4[list[i]] = buf[n++];
v2src4[list[i]] = buf[n++];
v3src4[list[i]] = buf[n++];
v4src4[list[i]] = buf[n++];
v5src4[list[i]] = buf[n++];
v0src5[list[i]] = buf[n++];
v1src5[list[i]] = buf[n++];
v2src5[list[i]] = buf[n++];
v3src5[list[i]] = buf[n++];
v4src5[list[i]] = buf[n++];
v5src5[list[i]] = buf[n++];
v0src6[list[i]] = buf[n++];
v1src6[list[i]] = buf[n++];
v2src6[list[i]] = buf[n++];
v3src6[list[i]] = buf[n++];
v4src6[list[i]] = buf[n++];
v5src6[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, no mixing
case FORWARD_IK_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *xdest = &vdx_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest = &vdy_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest = &vdz_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *dest = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
}
break;
}
case FORWARD_IK_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *esrc = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
}
break;
}
case FORWARD_AD_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
}
}
/* ----------------------------------------------------------------------
pack ghost values into buf to send to another proc
------------------------------------------------------------------------- */
void PPPMDisp::pack_reverse(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
//Coulomb interactions
if (flag == REVERSE_RHO) {
FFT_SCALAR *src = &density_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
//Dispersion interactions, geometric mixing
} else if (flag == REVERSE_RHO_G) {
FFT_SCALAR *src = &density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
//Dispersion interactions, arithmetic mixing
} else if (flag == REVERSE_RHO_A) {
FFT_SCALAR *src0 = &density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src1 = &density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src2 = &density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src3 = &density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src4 = &density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src5 = &density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src6 = &density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src0[list[i]];
buf[n++] = src1[list[i]];
buf[n++] = src2[list[i]];
buf[n++] = src3[list[i]];
buf[n++] = src4[list[i]];
buf[n++] = src5[list[i]];
buf[n++] = src6[list[i]];
}
//Dispersion interactions, no mixing
} else if (flag == REVERSE_RHO_NONE) {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *src = &density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src[list[i]];
}
}
}
}
/* ----------------------------------------------------------------------
unpack another proc's ghost values from buf and add to own values
------------------------------------------------------------------------- */
void PPPMDisp::unpack_reverse(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
//Coulomb interactions
if (flag == REVERSE_RHO) {
FFT_SCALAR *dest = &density_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[i];
//Dispersion interactions, geometric mixing
} else if (flag == REVERSE_RHO_G) {
FFT_SCALAR *dest = &density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[i];
//Dispersion interactions, arithmetic mixing
} else if (flag == REVERSE_RHO_A) {
FFT_SCALAR *dest0 = &density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest1 = &density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest2 = &density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest3 = &density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest4 = &density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest5 = &density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest6 = &density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
dest0[list[i]] += buf[n++];
dest1[list[i]] += buf[n++];
dest2[list[i]] += buf[n++];
dest3[list[i]] += buf[n++];
dest4[list[i]] += buf[n++];
dest5[list[i]] += buf[n++];
dest6[list[i]] += buf[n++];
}
//Dispersion interactions, no mixing
} else if (flag == REVERSE_RHO_NONE) {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *dest = &density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[n++];
}
}
}
/* ----------------------------------------------------------------------
map nprocs to NX by NY grid as PX by PY procs - return optimal px,py
------------------------------------------------------------------------- */
void PPPMDisp::procs2grid2d(int nprocs, int nx, int ny, int *px, int *py)
{
// loop through all possible factorizations of nprocs
// surf = surface area of largest proc sub-domain
// innermost if test minimizes surface area and surface/volume ratio
int bestsurf = 2 * (nx + ny);
int bestboxx = 0;
int bestboxy = 0;
int boxx,boxy,surf,ipx,ipy;
ipx = 1;
while (ipx <= nprocs) {
if (nprocs % ipx == 0) {
ipy = nprocs/ipx;
boxx = nx/ipx;
if (nx % ipx) boxx++;
boxy = ny/ipy;
if (ny % ipy) boxy++;
surf = boxx + boxy;
if (surf < bestsurf ||
(surf == bestsurf && boxx*boxy > bestboxx*bestboxy)) {
bestsurf = surf;
bestboxx = boxx;
bestboxy = boxy;
*px = ipx;
*py = ipy;
}
}
ipx++;
}
}
/* ----------------------------------------------------------------------
charge assignment into rho1d
dx,dy,dz = distance of particle from "lower left" grid point
------------------------------------------------------------------------- */
void PPPMDisp::compute_rho1d(const FFT_SCALAR &dx, const FFT_SCALAR &dy,
const FFT_SCALAR &dz, int ord,
FFT_SCALAR **rho_c, FFT_SCALAR **r1d)
{
int k,l;
FFT_SCALAR r1,r2,r3;
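// for each of the 'ord' grid points the particle contributes to (index k),
// evaluate the degree-(ord-1) assignment polynomial at dx, dy, dz via
// Horner's rule over the packed coefficients rho_c[l][k]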
for (k = (1-ord)/2; k <= ord/2; k++) {
r1 = r2 = r3 = ZEROF;
for (l = ord-1; l >= 0; l--) {
r1 = rho_c[l][k] + r1*dx;
r2 = rho_c[l][k] + r2*dy;
r3 = rho_c[l][k] + r3*dz;
}
r1d[0][k] = r1;
r1d[1][k] = r2;
r1d[2][k] = r3;
}
}
/* ----------------------------------------------------------------------
charge assignment into drho1d
dx,dy,dz = distance of particle from "lower left" grid point
------------------------------------------------------------------------- */
void PPPMDisp::compute_drho1d(const FFT_SCALAR &dx, const FFT_SCALAR &dy,
const FFT_SCALAR &dz, int ord,
FFT_SCALAR **drho_c, FFT_SCALAR **dr1d)
{
int k,l;
FFT_SCALAR r1,r2,r3;
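// same Horner evaluation as compute_rho1d(), but using the derivative
// coefficients; the derivative polynomial is one degree lower, so the
// inner loop starts at l = ord-2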
for (k = (1-ord)/2; k <= ord/2; k++) {
r1 = r2 = r3 = ZEROF;
for (l = ord-2; l >= 0; l--) {
r1 = drho_c[l][k] + r1*dx;
r2 = drho_c[l][k] + r2*dy;
r3 = drho_c[l][k] + r3*dz;
}
dr1d[0][k] = r1;
dr1d[1][k] = r2;
dr1d[2][k] = r3;
}
}
/* ----------------------------------------------------------------------
generate coefficients for the weight function of order n
Wn(x) = Sum over k = -(n-1), -(n-1)+2, ..., (n-1)-2, (n-1) of wn(k,x)
(k runs over every other integer: odd k if n is even, even k if n is odd)
wn(k,x) = Sum from l=0 to n-1 of a(l,k)*(x-k/2)^l   if |x-k/2| < 1/2
        = 0                                          otherwise
the a coefficients are packed into the array rho_coeff to eliminate zeros:
rho_coeff(l,(k+mod(n+1,2))/2) = a(l,k)
------------------------------------------------------------------------- */
void PPPMDisp::compute_rho_coeff(FFT_SCALAR **coeff , FFT_SCALAR **dcoeff,
int ord)
{
int j,k,l,m;
FFT_SCALAR s;
FFT_SCALAR **a;
memory->create2d_offset(a,ord,-ord,ord,"pppm/disp:a");
for (k = -ord; k <= ord; k++)
for (l = 0; l < ord; l++)
a[l][k] = 0.0;
a[0][0] = 1.0;
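// build the a(l,k) coefficients recursively: pass j derives the order j+1
// coefficients from the order j ones via a(l+1,k) = (a(l,k+1)-a(l,k-1))/(l+1),
// then sets the constant term a(0,k) from the accumulated sum s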
for (j = 1; j < ord; j++) {
for (k = -j; k <= j; k += 2) {
s = 0.0;
for (l = 0; l < j; l++) {
a[l+1][k] = (a[l][k+1]-a[l][k-1]) / (l+1);
#ifdef FFT_SINGLE
s += powf(0.5,(float) l+1) *
(a[l][k-1] + powf(-1.0,(float) l) * a[l][k+1]) / (l+1);
#else
s += pow(0.5,(double) l+1) *
(a[l][k-1] + pow(-1.0,(double) l) * a[l][k+1]) / (l+1);
#endif
}
a[0][k] = s;
}
}
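// pack the nonzero a(l,k) (every other k) into coeff, and the derivative
// coefficients l*a(l,k) into dcoeff, using a shifted index m that starts
// at (1-ord)/2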
m = (1-ord)/2;
for (k = -(ord-1); k < ord; k += 2) {
for (l = 0; l < ord; l++)
coeff[l][m] = a[l][k];
for (l = 1; l < ord; l++)
dcoeff[l-1][m] = l*a[l][k];
m++;
}
memory->destroy2d_offset(a,-ord);
}
/* ----------------------------------------------------------------------
Slab-geometry correction term to dampen inter-slab interactions between
periodically repeating slabs. Yields good approximation to 2D Ewald if
adequate empty space is left between repeating slabs (J. Chem. Phys.
111, 3155). Slabs defined here to be parallel to the xy plane. Also
extended to non-neutral systems (J. Chem. Phys. 131, 094107).
------------------------------------------------------------------------- */
void PPPMDisp::slabcorr(int eflag)
{
// compute local contribution to global dipole moment
double *q = atom->q;
double **x = atom->x;
double zprd = domain->zprd;
int nlocal = atom->nlocal;
double dipole = 0.0;
for (int i = 0; i < nlocal; i++) dipole += q[i]*x[i][2];
// sum local contributions to get global dipole moment
double dipole_all;
MPI_Allreduce(&dipole,&dipole_all,1,MPI_DOUBLE,MPI_SUM,world);
// need to make non-neutral systems and/or
// per-atom energy translationally invariant
double dipole_r2 = 0.0;
if (eflag_atom || fabs(qsum) > SMALL) {
for (int i = 0; i < nlocal; i++)
dipole_r2 += q[i]*x[i][2]*x[i][2];
// sum local contributions
double tmp;
MPI_Allreduce(&dipole_r2,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
dipole_r2 = tmp;
}
// compute corrections
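// energy correction: E_slab = 2*pi/V * (M_z^2 - qsum*Sum(q_i*z_i^2)
//                                       - qsum^2*Lz^2/12)
// where M_z = dipole_all and Sum(q_i*z_i^2) = dipole_r2; scaled by qqrd2e*scale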
const double e_slabcorr = MY_2PI*(dipole_all*dipole_all -
qsum*dipole_r2 - qsum*qsum*zprd*zprd/12.0)/volume;
const double qscale = force->qqrd2e * scale;
if (eflag_global) energy_1 += qscale * e_slabcorr;
// per-atom energy
if (eflag_atom) {
double efact = qscale * MY_2PI/volume;
for (int i = 0; i < nlocal; i++)
eatom[i] += efact * q[i]*(x[i][2]*dipole_all - 0.5*(dipole_r2 +
qsum*x[i][2]*x[i][2]) - qsum*zprd*zprd/12.0);
}
// add on force corrections
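// per-atom force correction: F_z,i = -4*pi*q_i/V * (M_z - qsum*z_i),
// scaled by qqrd2e*scale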
double ffact = qscale * (-4.0*MY_PI/volume);
double **f = atom->f;
for (int i = 0; i < nlocal; i++) f[i][2] += ffact * q[i]*(dipole_all - qsum*x[i][2]);
}
/* ----------------------------------------------------------------------
perform and time the 1d FFTs required for N timesteps
------------------------------------------------------------------------- */
int PPPMDisp::timing_1d(int n, double &time1d)
{
double time1,time2;
int mixing = 1;
if (function[2]) mixing = 4;
if (function[3]) mixing = nsplit_alloc/2;
if (function[0]) for (int i = 0; i < 2*nfft_both; i++) work1[i] = ZEROF;
if (function[1] + function[2] + function[3])
for (int i = 0; i < 2*nfft_both_6; i++) work1_6[i] = ZEROF;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[0]) {
for (int i = 0; i < n; i++) {
fft1->timing1d(work1,nfft_both,1);
fft2->timing1d(work1,nfft_both,-1);
if (differentiation_flag != 1){
fft2->timing1d(work1,nfft_both,-1);
fft2->timing1d(work1,nfft_both,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time1d = time2 - time1;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[1] + function[2] + function[3]) {
for (int i = 0; i < n; i++) {
fft1_6->timing1d(work1_6,nfft_both_6,1);
fft2_6->timing1d(work1_6,nfft_both_6,-1);
if (differentiation_flag != 1){
fft2_6->timing1d(work1_6,nfft_both_6,-1);
fft2_6->timing1d(work1_6,nfft_both_6,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time1d += (time2 - time1)*mixing;
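// return value = number of 1d FFTs per timestep:
// 2 (one forward, one back) with ad differentiation,
// 4 with ik differentiation (extra back transforms for the field components)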
if (differentiation_flag) return 2;
return 4;
}
/* ----------------------------------------------------------------------
perform and time the 3d FFTs required for N timesteps
------------------------------------------------------------------------- */
int PPPMDisp::timing_3d(int n, double &time3d)
{
double time1,time2;
int mixing = 1;
if (function[2]) mixing = 4;
if (function[3]) mixing = nsplit_alloc/2;
if (function[0]) for (int i = 0; i < 2*nfft_both; i++) work1[i] = ZEROF;
if (function[1] + function[2] + function[3])
for (int i = 0; i < 2*nfft_both_6; i++) work1_6[i] = ZEROF;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[0]) {
for (int i = 0; i < n; i++) {
fft1->compute(work1,work1,1);
fft2->compute(work1,work1,-1);
if (differentiation_flag != 1) {
fft2->compute(work1,work1,-1);
fft2->compute(work1,work1,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time3d = time2 - time1;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[1] + function[2] + function[3]) {
for (int i = 0; i < n; i++) {
fft1_6->compute(work1_6,work1_6,1);
fft2_6->compute(work1_6,work1_6,-1);
if (differentiation_flag != 1) {
fft2_6->compute(work1_6,work1_6,-1);
fft2_6->compute(work1_6,work1_6,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time3d += (time2 - time1) * mixing;
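// as in timing_1d(): return 2 FFTs per timestep for ad differentiation, 4 for ik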
if (differentiation_flag) return 2;
return 4;
}
/* ----------------------------------------------------------------------
memory usage of local arrays
------------------------------------------------------------------------- */
double PPPMDisp::memory_usage()
{
double bytes = nmax*3 * sizeof(double);
int mixing = 1;
int diff = 3; //depends on differentiation
int per = 7; //depends on per atom calculations
if (differentiation_flag) {
diff = 1;
per = 6;
}
if (!evflag_atom) per = 0;
if (function[2]) mixing = 7;
if (function[3]) mixing = nsplit_alloc;
if (function[0]) {
int nbrick = (nxhi_out-nxlo_out+1) * (nyhi_out-nylo_out+1) *
(nzhi_out-nzlo_out+1);
bytes += (1 + diff + per) * nbrick * sizeof(FFT_SCALAR); //brick memory
bytes += 6 * nfft_both * sizeof(double); // vg
bytes += nfft_both * sizeof(double); // greensfn
bytes += nfft_both * 3 * sizeof(FFT_SCALAR); // density_FFT, work1, work2
if (cg) bytes += cg->memory_usage();
}
if (function[1] + function[2] + function[3]) {
int nbrick = (nxhi_out_6-nxlo_out_6+1) * (nyhi_out_6-nylo_out_6+1) *
(nzhi_out_6-nzlo_out_6+1);
bytes += (1 + diff + per ) * nbrick * sizeof(FFT_SCALAR) * mixing; // density_brick + vd_brick + per atom bricks
bytes += 6 * nfft_both_6 * sizeof(double); // vg
bytes += nfft_both_6 * sizeof(double); // greensfn
bytes += nfft_both_6 * (mixing + 2) * sizeof(FFT_SCALAR); // density_FFT, work1, work2
if (cg_6) bytes += cg_6->memory_usage();
}
return bytes;
}
diff --git a/src/MC/fix_gcmc.cpp b/src/MC/fix_gcmc.cpp
index cba5a0a17..73758e362 100644
--- a/src/MC/fix_gcmc.cpp
+++ b/src/MC/fix_gcmc.cpp
@@ -1,2475 +1,2476 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier, Aidan Thompson (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "fix_gcmc.h"
#include "atom.h"
#include "atom_vec.h"
#include "atom_vec_hybrid.h"
#include "molecule.h"
#include "update.h"
#include "modify.h"
#include "fix.h"
#include "comm.h"
#include "compute.h"
#include "group.h"
#include "domain.h"
#include "region.h"
#include "random_park.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "math_extra.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
#include "thermo.h"
#include "output.h"
#include "neighbor.h"
#include <iostream>
using namespace std;
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// large energy value used to signal overlap
#define MAXENERGYSIGNAL 1.0e100
// MAXENERGYTEST must be much lower than MAXENERGYSIGNAL, so that a total
// energy that starts at MAXENERGYSIGNAL still exceeds it after negative
// energy contributions are added
#define MAXENERGYTEST 1.0e50
enum{ATOM,MOLECULE};
/* ---------------------------------------------------------------------- */
FixGCMC::FixGCMC(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg),
idregion(NULL), full_flag(0), ngroups(0), groupstrings(NULL), ngrouptypes(0), grouptypestrings(NULL),
grouptypebits(NULL), grouptypes(NULL), local_gas_list(NULL), atom_coord(NULL), random_equal(NULL), random_unequal(NULL),
coords(NULL), imageflags(NULL), fixrigid(NULL), fixshake(NULL), idrigid(NULL), idshake(NULL)
{
if (narg < 11) error->all(FLERR,"Illegal fix gcmc command");
if (atom->molecular == 2)
error->all(FLERR,"Fix gcmc does not (yet) work with atom_style template");
dynamic_group_allow = 1;
vector_flag = 1;
size_vector = 8;
global_freq = 1;
extvector = 0;
restart_global = 1;
time_depend = 1;
// required args
nevery = force->inumeric(FLERR,arg[3]);
nexchanges = force->inumeric(FLERR,arg[4]);
nmcmoves = force->inumeric(FLERR,arg[5]);
ngcmc_type = force->inumeric(FLERR,arg[6]);
seed = force->inumeric(FLERR,arg[7]);
reservoir_temperature = force->numeric(FLERR,arg[8]);
chemical_potential = force->numeric(FLERR,arg[9]);
displace = force->numeric(FLERR,arg[10]);
if (nevery <= 0) error->all(FLERR,"Illegal fix gcmc command");
if (nexchanges < 0) error->all(FLERR,"Illegal fix gcmc command");
if (nmcmoves < 0) error->all(FLERR,"Illegal fix gcmc command");
if (seed <= 0) error->all(FLERR,"Illegal fix gcmc command");
if (reservoir_temperature < 0.0)
error->all(FLERR,"Illegal fix gcmc command");
if (displace < 0.0) error->all(FLERR,"Illegal fix gcmc command");
// read options from end of input line
options(narg-11,&arg[11]);
// random number generator, same for all procs
random_equal = new RanPark(lmp,seed);
// random number generator, not the same for all procs
random_unequal = new RanPark(lmp,seed);
// error checks on region and its extent being inside simulation box
region_xlo = region_xhi = region_ylo = region_yhi =
region_zlo = region_zhi = 0.0;
if (regionflag) {
if (domain->regions[iregion]->bboxflag == 0)
error->all(FLERR,"Fix gcmc region does not support a bounding box");
if (domain->regions[iregion]->dynamic_check())
error->all(FLERR,"Fix gcmc region cannot be dynamic");
region_xlo = domain->regions[iregion]->extent_xlo;
region_xhi = domain->regions[iregion]->extent_xhi;
region_ylo = domain->regions[iregion]->extent_ylo;
region_yhi = domain->regions[iregion]->extent_yhi;
region_zlo = domain->regions[iregion]->extent_zlo;
region_zhi = domain->regions[iregion]->extent_zhi;
if (region_xlo < domain->boxlo[0] || region_xhi > domain->boxhi[0] ||
region_ylo < domain->boxlo[1] || region_yhi > domain->boxhi[1] ||
region_zlo < domain->boxlo[2] || region_zhi > domain->boxhi[2])
error->all(FLERR,"Fix gcmc region extends outside simulation box");
// estimate region volume using MC trials
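// sample random points in the region's bounding box; the fraction that falls
// inside the region, times the bounding-box volume, estimates the region volume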
double coord[3];
int inside = 0;
int attempts = 10000000;
for (int i = 0; i < attempts; i++) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
if (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) != 0)
inside++;
}
double max_region_volume = (region_xhi - region_xlo)*
(region_yhi - region_ylo)*(region_zhi - region_zlo);
region_volume = max_region_volume*static_cast<double> (inside)/
static_cast<double> (attempts);
}
// error check and further setup for mode = MOLECULE
if (mode == MOLECULE) {
if (onemols[imol]->xflag == 0)
error->all(FLERR,"Fix gcmc molecule must have coordinates");
if (onemols[imol]->typeflag == 0)
error->all(FLERR,"Fix gcmc molecule must have atom types");
if (ngcmc_type != 0)
error->all(FLERR,"Atom type must be zero in fix gcmc mol command");
if (onemols[imol]->qflag == 1 && atom->q == NULL)
error->all(FLERR,"Fix gcmc molecule has charges, but atom style does not");
if (atom->molecular == 2 && onemols != atom->avec->onemols)
error->all(FLERR,"Fix gcmc molecule template ID must be same "
"as atom_style template ID");
onemols[imol]->check_attributes(0);
}
if (charge_flag && atom->q == NULL)
error->all(FLERR,"Fix gcmc atom has charge, but atom style does not");
if (rigidflag && mode == ATOM)
error->all(FLERR,"Cannot use fix gcmc rigid and not molecule");
if (shakeflag && mode == ATOM)
error->all(FLERR,"Cannot use fix gcmc shake and not molecule");
if (rigidflag && shakeflag)
error->all(FLERR,"Cannot use fix gcmc rigid and shake");
// setup of coords and imageflags array
if (mode == ATOM) natoms_per_molecule = 1;
else natoms_per_molecule = onemols[imol]->natoms;
memory->create(coords,natoms_per_molecule,3,"gcmc:coords");
memory->create(imageflags,natoms_per_molecule,"gcmc:imageflags");
memory->create(atom_coord,natoms_per_molecule,3,"gcmc:atom_coord");
// compute the number of MC cycles that occur nevery timesteps
ncycles = nexchanges + nmcmoves;
// set up reneighboring
force_reneighbor = 1;
next_reneighbor = update->ntimestep + 1;
// zero out counters
ntranslation_attempts = 0.0;
ntranslation_successes = 0.0;
nrotation_attempts = 0.0;
nrotation_successes = 0.0;
ndeletion_attempts = 0.0;
ndeletion_successes = 0.0;
ninsertion_attempts = 0.0;
ninsertion_successes = 0.0;
gcmc_nmax = 0;
local_gas_list = NULL;
}
/* ----------------------------------------------------------------------
parse optional parameters at end of input line
------------------------------------------------------------------------- */
void FixGCMC::options(int narg, char **arg)
{
if (narg < 0) error->all(FLERR,"Illegal fix gcmc command");
// defaults
mode = ATOM;
max_rotation_angle = 10*MY_PI/180;
regionflag = 0;
iregion = -1;
region_volume = 0;
max_region_attempts = 1000;
molecule_group = 0;
molecule_group_bit = 0;
molecule_group_inversebit = 0;
exclusion_group = 0;
exclusion_group_bit = 0;
pressure_flag = false;
pressure = 0.0;
fugacity_coeff = 1.0;
rigidflag = 0;
shakeflag = 0;
charge = 0.0;
charge_flag = false;
full_flag = false;
ngroups = 0;
int ngroupsmax = 0;
groupstrings = NULL;
ngrouptypes = 0;
int ngrouptypesmax = 0;
grouptypestrings = NULL;
grouptypes = NULL;
grouptypebits = NULL;
energy_intra = 0.0;
tfac_insert = 1.0;
- overlap_cutoff = 0.0;
+ overlap_cutoffsq = 0.0;
overlap_flag = 0;
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"mol") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
imol = atom->find_molecule(arg[iarg+1]);
if (imol == -1)
error->all(FLERR,"Molecule template ID for fix gcmc does not exist");
if (atom->molecules[imol]->nset > 1 && comm->me == 0)
error->warning(FLERR,"Molecule template for "
"fix gcmc has multiple molecules");
mode = MOLECULE;
onemols = atom->molecules;
nmol = onemols[imol]->nset;
iarg += 2;
} else if (strcmp(arg[iarg],"region") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
iregion = domain->find_region(arg[iarg+1]);
if (iregion == -1)
error->all(FLERR,"Region ID for fix gcmc does not exist");
int n = strlen(arg[iarg+1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[iarg+1]);
regionflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"maxangle") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
max_rotation_angle = force->numeric(FLERR,arg[iarg+1]);
max_rotation_angle *= MY_PI/180;
iarg += 2;
} else if (strcmp(arg[iarg],"pressure") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
pressure = force->numeric(FLERR,arg[iarg+1]);
pressure_flag = true;
iarg += 2;
} else if (strcmp(arg[iarg],"fugacity_coeff") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
fugacity_coeff = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"charge") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
charge = force->numeric(FLERR,arg[iarg+1]);
charge_flag = true;
iarg += 2;
} else if (strcmp(arg[iarg],"rigid") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idrigid;
idrigid = new char[n];
strcpy(idrigid,arg[iarg+1]);
rigidflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"shake") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idshake;
idshake = new char[n];
strcpy(idshake,arg[iarg+1]);
shakeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"full_energy") == 0) {
full_flag = true;
iarg += 1;
} else if (strcmp(arg[iarg],"group") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
if (ngroups >= ngroupsmax) {
ngroupsmax = ngroups+1;
groupstrings = (char **)
memory->srealloc(groupstrings,
ngroupsmax*sizeof(char *),
"fix_gcmc:groupstrings");
}
int n = strlen(arg[iarg+1]) + 1;
groupstrings[ngroups] = new char[n];
strcpy(groupstrings[ngroups],arg[iarg+1]);
ngroups++;
iarg += 2;
} else if (strcmp(arg[iarg],"grouptype") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix gcmc command");
if (ngrouptypes >= ngrouptypesmax) {
ngrouptypesmax = ngrouptypes+1;
grouptypes = (int*) memory->srealloc(grouptypes,ngrouptypesmax*sizeof(int),
"fix_gcmc:grouptypes");
grouptypestrings = (char**)
memory->srealloc(grouptypestrings,
ngrouptypesmax*sizeof(char *),
"fix_gcmc:grouptypestrings");
}
grouptypes[ngrouptypes] = atoi(arg[iarg+1]);
int n = strlen(arg[iarg+2]) + 1;
grouptypestrings[ngrouptypes] = new char[n];
strcpy(grouptypestrings[ngrouptypes],arg[iarg+2]);
ngrouptypes++;
iarg += 3;
} else if (strcmp(arg[iarg],"intra_energy") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
energy_intra = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"tfac_insert") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
tfac_insert = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"overlap_cutoff") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
- overlap_cutoff = force->numeric(FLERR,arg[iarg+1]);
+ double rtmp = force->numeric(FLERR,arg[iarg+1]);
+ overlap_cutoffsq = rtmp*rtmp;
overlap_flag = 1;
iarg += 2;
} else error->all(FLERR,"Illegal fix gcmc command");
}
}
/* ---------------------------------------------------------------------- */
FixGCMC::~FixGCMC()
{
if (regionflag) delete [] idregion;
delete random_equal;
delete random_unequal;
memory->destroy(local_gas_list);
memory->destroy(atom_coord);
memory->destroy(coords);
memory->destroy(imageflags);
delete [] idrigid;
delete [] idshake;
if (ngroups > 0) {
for (int igroup = 0; igroup < ngroups; igroup++)
delete [] groupstrings[igroup];
memory->sfree(groupstrings);
}
if (ngrouptypes > 0) {
memory->destroy(grouptypes);
memory->destroy(grouptypebits);
for (int igroup = 0; igroup < ngrouptypes; igroup++)
delete [] grouptypestrings[igroup];
memory->sfree(grouptypestrings);
}
if (full_flag && group) {
int igroupall = group->find("all");
neighbor->exclusion_group_group_delete(exclusion_group,igroupall);
}
}
/* ---------------------------------------------------------------------- */
int FixGCMC::setmask()
{
int mask = 0;
mask |= PRE_EXCHANGE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixGCMC::init()
{
triclinic = domain->triclinic;
// decide whether to switch to the full_energy option
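// per-atom energy evaluation via Pair::single() cannot be used with kspace,
// a missing pair style, pair styles without single(), hybrid or eam pair
// styles, or tail corrections; in those cases recompute the full system energy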
if (!full_flag) {
if ((force->kspace) ||
(force->pair == NULL) ||
(force->pair->single_enable == 0) ||
(force->pair_match("hybrid",0)) ||
(force->pair_match("eam",0)) ||
(force->pair->tail_flag)
) {
full_flag = true;
if (comm->me == 0)
error->warning(FLERR,"Fix gcmc using full_energy option");
}
}
if (full_flag) {
char *id_pe = (char *) "thermo_pe";
int ipe = modify->find_compute(id_pe);
c_pe = modify->compute[ipe];
}
int *type = atom->type;
if (mode == ATOM) {
if (ngcmc_type <= 0 || ngcmc_type > atom->ntypes)
error->all(FLERR,"Invalid atom type in fix gcmc command");
}
// if mode == ATOM, warn if any deletable atom has a mol ID
if ((mode == ATOM) && atom->molecule_flag) {
tagint *molecule = atom->molecule;
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if (type[i] == ngcmc_type)
if (molecule[i]) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->all(FLERR,
"Fix gcmc cannot exchange individual atoms belonging to a molecule");
}
// if mode == MOLECULE, check for unset mol IDs
if (mode == MOLECULE) {
tagint *molecule = atom->molecule;
int *mask = atom->mask;
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if (mask[i] == groupbit)
if (molecule[i] == 0) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->all(FLERR,
"All mol IDs should be set for fix gcmc group atoms");
}
if (((mode == MOLECULE) && (atom->molecule_flag == 0)) ||
((mode == MOLECULE) && (!atom->tag_enable || !atom->map_style)))
error->all(FLERR,
"Fix gcmc molecule command requires that "
"atoms have molecule attributes");
// if rigidflag defined, check for rigid/small fix
// its molecule template must be same as this one
fixrigid = NULL;
if (rigidflag) {
int ifix = modify->find_fix(idrigid);
if (ifix < 0) error->all(FLERR,"Fix gcmc rigid fix does not exist");
fixrigid = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixrigid->extract("onemol",tmp))
error->all(FLERR,
"Fix gcmc and fix rigid/small not using "
"same molecule template ID");
}
// if shakeflag defined, check for SHAKE fix
// its molecule template must be same as this one
fixshake = NULL;
if (shakeflag) {
int ifix = modify->find_fix(idshake);
if (ifix < 0) error->all(FLERR,"Fix gcmc shake fix does not exist");
fixshake = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixshake->extract("onemol",tmp))
error->all(FLERR,"Fix gcmc and fix shake not using "
"same molecule template ID");
}
if (domain->dimension == 2)
error->all(FLERR,"Cannot use fix gcmc in a 2d simulation");
// create a new group for interaction exclusions
// used for attempted atom or molecule deletions
// skip if already exists from previous init()
if (full_flag && !exclusion_group_bit) {
char **group_arg = new char*[4];
// create unique group name for atoms to be excluded
int len = strlen(id) + 30;
group_arg[0] = new char[len];
sprintf(group_arg[0],"FixGCMC:gcmc_exclusion_group:%s",id);
group_arg[1] = (char *) "subtract";
group_arg[2] = (char *) "all";
group_arg[3] = (char *) "all";
group->assign(4,group_arg);
exclusion_group = group->find(group_arg[0]);
if (exclusion_group == -1)
error->all(FLERR,"Could not find fix gcmc exclusion group ID");
exclusion_group_bit = group->bitmask[exclusion_group];
// neighbor list exclusion setup
// turn off interactions between group all and the exclusion group
int narg = 4;
char **arg = new char*[narg];
arg[0] = (char *) "exclude";
arg[1] = (char *) "group";
arg[2] = group_arg[0];
arg[3] = (char *) "all";
neighbor->modify_params(narg,arg);
delete [] group_arg[0];
delete [] group_arg;
delete [] arg;
}
// create a new group for temporary use with selected molecules
if (mode == MOLECULE) {
char **group_arg = new char*[3];
// create unique group name for atoms to be rotated
int len = strlen(id) + 30;
group_arg[0] = new char[len];
sprintf(group_arg[0],"FixGCMC:rotation_gas_atoms:%s",id);
group_arg[1] = (char *) "molecule";
char digits[12];
sprintf(digits,"%d",-1);
group_arg[2] = digits;
group->assign(3,group_arg);
molecule_group = group->find(group_arg[0]);
if (molecule_group == -1)
error->all(FLERR,"Could not find fix gcmc rotation group ID");
molecule_group_bit = group->bitmask[molecule_group];
molecule_group_inversebit = molecule_group_bit ^ ~0;
delete [] group_arg[0];
delete [] group_arg;
}
// get all of the needed molecule data if mode == MOLECULE,
// otherwise just get the gas mass
if (mode == MOLECULE) {
onemols[imol]->compute_mass();
onemols[imol]->compute_com();
gas_mass = onemols[imol]->masstotal;
for (int i = 0; i < onemols[imol]->natoms; i++) {
onemols[imol]->x[i][0] -= onemols[imol]->com[0];
onemols[imol]->x[i][1] -= onemols[imol]->com[1];
onemols[imol]->x[i][2] -= onemols[imol]->com[2];
}
} else gas_mass = atom->mass[ngcmc_type];
if (gas_mass <= 0.0)
error->all(FLERR,"Illegal fix gcmc gas mass <= 0");
// check that no deletable atoms are in atom->firstgroup
// deleting such an atom would not leave firstgroup atoms first
if (atom->firstgroup >= 0) {
int *mask = atom->mask;
int firstgroupbit = group->bitmask[atom->firstgroup];
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if ((mask[i] == groupbit) && (mask[i] & firstgroupbit)) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall)
error->all(FLERR,"Cannot do GCMC on atoms in atom_modify first group");
}
// compute beta, lambda, sigma, and the zz factor
// For LJ units, lambda=1
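// lambda = thermal de Broglie wavelength; zz = exp(beta*mu)/lambda^3 is the
// gas activity (overridden by pressure*fugacity_coeff*beta when a pressure
// is specified); sigma = velocity scale for velocities of inserted atoms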
beta = 1.0/(force->boltz*reservoir_temperature);
if (strcmp(update->unit_style,"lj") == 0)
zz = exp(beta*chemical_potential);
else {
double lambda = sqrt(force->hplanck*force->hplanck/
(2.0*MY_PI*gas_mass*force->mvv2e*
force->boltz*reservoir_temperature));
zz = exp(beta*chemical_potential)/(pow(lambda,3.0));
}
sigma = sqrt(force->boltz*reservoir_temperature*tfac_insert/gas_mass/force->mvv2e);
if (pressure_flag) zz = pressure*fugacity_coeff*beta/force->nktv2p;
imagezero = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
// construct group bitmask for all new atoms
// aggregated over all group keywords
groupbitall = 1 | groupbit;
for (int igroup = 0; igroup < ngroups; igroup++) {
int jgroup = group->find(groupstrings[igroup]);
if (jgroup == -1)
error->all(FLERR,"Could not find specified fix gcmc group ID");
groupbitall |= group->bitmask[jgroup];
}
// construct group type bitmasks
// not aggregated over all group keywords
if (ngrouptypes > 0) {
memory->create(grouptypebits,ngrouptypes,"fix_gcmc:grouptypebits");
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
int jgroup = group->find(grouptypestrings[igroup]);
if (jgroup == -1)
error->all(FLERR,"Could not find specified fix gcmc group ID");
grouptypebits[igroup] = group->bitmask[jgroup];
}
}
}
/* ----------------------------------------------------------------------
attempt Monte Carlo translations, rotations, insertions, and deletions
done before exchange, borders, reneighbor
so that ghost atoms and neighbor lists will be correct
------------------------------------------------------------------------- */
void FixGCMC::pre_exchange()
{
// just return if this fix should not be invoked on this timestep
if (next_reneighbor != update->ntimestep) return;
xlo = domain->boxlo[0];
xhi = domain->boxhi[0];
ylo = domain->boxlo[1];
yhi = domain->boxhi[1];
zlo = domain->boxlo[2];
zhi = domain->boxhi[2];
if (triclinic) {
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
} else {
sublo = domain->sublo;
subhi = domain->subhi;
}
if (regionflag) volume = region_volume;
else volume = domain->xprd * domain->yprd * domain->zprd;
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
if (full_flag) {
energy_stored = energy_full();
if (overlap_flag && energy_stored > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
if (mode == MOLECULE) {
for (int i = 0; i < ncycles; i++) {
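// each cycle attempts an MC move (translation or rotation, 50/50) with
// probability nmcmoves/ncycles, otherwise an exchange (deletion or insertion, 50/50)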
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
if (random_equal->uniform() < 0.5) attempt_molecule_translation_full();
else attempt_molecule_rotation_full();
} else {
if (random_equal->uniform() < 0.5) attempt_molecule_deletion_full();
else attempt_molecule_insertion_full();
}
}
} else {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
attempt_atomic_translation_full();
} else {
if (random_equal->uniform() < 0.5) attempt_atomic_deletion_full();
else attempt_atomic_insertion_full();
}
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
} else {
if (mode == MOLECULE) {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
if (random_equal->uniform() < 0.5) attempt_molecule_translation();
else attempt_molecule_rotation();
} else {
if (random_equal->uniform() < 0.5) attempt_molecule_deletion();
else attempt_molecule_insertion();
}
}
} else {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
attempt_atomic_translation();
} else {
if (random_equal->uniform() < 0.5) attempt_atomic_deletion();
else attempt_atomic_insertion();
}
}
}
}
next_reneighbor = update->ntimestep + nevery;
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_translation()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
int i = pick_random_gas_atom();
int success = 0;
if (i >= 0) {
double **x = atom->x;
double energy_before = energy(i,ngcmc_type,-1,x[i]);
if (overlap_flag && energy_before > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
double rsq = 1.1;
double rx,ry,rz;
rx = ry = rz = 0.0;
double coord[3];
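// pick a random displacement uniformly inside the unit sphere (rejection
// sampling from the unit cube), then scale by the maximum displacement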
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
if (regionflag) {
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
}
}
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
double energy_after = energy(i,ngcmc_type,-1,coord);
if (energy_after < MAXENERGYTEST &&
random_unequal->uniform() <
exp(beta*(energy_before - energy_after))) {
x[i][0] = coord[0];
x[i][1] = coord[1];
x[i][2] = coord[2];
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ntranslation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_deletion()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
int i = pick_random_gas_atom();
int success = 0;
if (i >= 0) {
double deletion_energy = energy(i,ngcmc_type,-1,atom->x[i]);
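// grand-canonical acceptance probability for an atomic deletion:
// ngas * exp(+beta*U_i) / (zz*V)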
if (random_unequal->uniform() <
ngas*exp(beta*deletion_energy)/(zz*volume)) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
atom->natoms--;
if (atom->tag_enable) {
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ndeletion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_insertion()
{
double lamda[3];
ninsertion_attempts += 1.0;
// pick coordinates for insertion point
double coord[3];
if (regionflag) {
int region_attempt = 0;
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(coord,lamda);
} else {
if (triclinic == 0) {
coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,coord);
}
}
int proc_flag = 0;
if (triclinic == 0) {
domain->remap(coord);
if (!domain->inside(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
if (coord[0] >= sublo[0] && coord[0] < subhi[0] &&
coord[1] >= sublo[1] && coord[1] < subhi[1] &&
coord[2] >= sublo[2] && coord[2] < subhi[2]) proc_flag = 1;
} else {
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
int success = 0;
if (proc_flag) {
int ii = -1;
if (charge_flag) {
ii = atom->nlocal + atom->nghost;
if (ii >= atom->nmax) atom->avec->grow(0);
atom->q[ii] = charge;
}
double insertion_energy = energy(ii,ngcmc_type,-1,coord);
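// grand-canonical acceptance probability for an atomic insertion:
// zz * V * exp(-beta*U_new) / (ngas+1)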
if (insertion_energy < MAXENERGYTEST &&
random_unequal->uniform() <
zz*volume*exp(-beta*insertion_energy)/(ngas+1)) {
atom->avec->create_atom(ngcmc_type,coord);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->v[m][0] = random_unequal->gaussian()*sigma;
atom->v[m][1] = random_unequal->gaussian()*sigma;
atom->v[m][2] = random_unequal->gaussian()*sigma;
modify->create_attribute(m);
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
atom->natoms++;
if (atom->tag_enable) {
atom->tag_extend();
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ninsertion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_translation()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
tagint translation_molecule = pick_random_gas_molecule();
if (translation_molecule == -1) return;
double energy_before_sum = molecule_energy(translation_molecule);
if (overlap_flag && energy_before_sum > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
double **x = atom->x;
double rx,ry,rz;
double com_displace[3],coord[3];
double rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
int nlocal = atom->nlocal;
if (regionflag) {
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
}
double energy_after = 0.0;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
coord[0] = x[i][0] + com_displace[0];
coord[1] = x[i][1] + com_displace[1];
coord[2] = x[i][2] + com_displace[2];
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
energy_after += energy(i,atom->type[i],translation_molecule,coord);
}
}
double energy_after_sum = 0.0;
MPI_Allreduce(&energy_after,&energy_after_sum,1,MPI_DOUBLE,MPI_SUM,world);
if (energy_after_sum < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before_sum - energy_after_sum))) {
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] += com_displace[0];
x[i][1] += com_displace[1];
x[i][2] += com_displace[2];
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ntranslation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_rotation()
{
nrotation_attempts += 1.0;
if (ngas == 0) return;
tagint rotation_molecule = pick_random_gas_molecule();
if (rotation_molecule == -1) return;
double energy_before_sum = molecule_energy(rotation_molecule);
if (overlap_flag && energy_before_sum > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
int nlocal = atom->nlocal;
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == rotation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// generate point in unit cube
// then restrict to unit sphere
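// the accepted point is the rotation axis; the rotation angle is
// uniform in [0, max_rotation_angle)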
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * max_rotation_angle;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double **x = atom->x;
imageint *image = atom->image;
double energy_after = 0.0;
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
double xtmp[3];
domain->unmap(x[i],image[i],xtmp);
xtmp[0] -= com[0];
xtmp[1] -= com[1];
xtmp[2] -= com[2];
MathExtra::matvec(rotmat,xtmp,atom_coord[n]);
atom_coord[n][0] += com[0];
atom_coord[n][1] += com[1];
atom_coord[n][2] += com[2];
xtmp[0] = atom_coord[n][0];
xtmp[1] = atom_coord[n][1];
xtmp[2] = atom_coord[n][2];
domain->remap(xtmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
energy_after += energy(i,atom->type[i],rotation_molecule,xtmp);
n++;
}
}
double energy_after_sum = 0.0;
MPI_Allreduce(&energy_after,&energy_after_sum,1,MPI_DOUBLE,MPI_SUM,world);
if (energy_after_sum < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before_sum - energy_after_sum))) {
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
image[i] = imagezero;
x[i][0] = atom_coord[n][0];
x[i][1] = atom_coord[n][1];
x[i][2] = atom_coord[n][2];
domain->remap(x[i],image[i]);
n++;
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
nrotation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_deletion()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
tagint deletion_molecule = pick_random_gas_molecule();
if (deletion_molecule == -1) return;
double deletion_energy_sum = molecule_energy(deletion_molecule);
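// acceptance probability for a molecule deletion:
// ngas * exp(+beta*U_mol) / (zz * V * natoms_per_molecule)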
if (random_equal->uniform() <
ngas*exp(beta*deletion_energy_sum)/(zz*volume*natoms_per_molecule)) {
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == deletion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
atom->natoms -= natoms_per_molecule;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ndeletion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_insertion()
{
double lamda[3];
ninsertion_attempts += 1.0;
double com_coord[3];
if (regionflag) {
int region_attempt = 0;
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
while (domain->regions[iregion]->match(com_coord[0],com_coord[1],
com_coord[2]) == 0) {
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(com_coord,lamda);
} else {
if (triclinic == 0) {
com_coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
com_coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
com_coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,com_coord);
}
}
// generate point in unit cube
// then restrict to unit sphere
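// the accepted point is the rotation axis; the rotation angle is
// uniform in [0, 2*pi)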
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * MY_2PI;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double insertion_energy = 0.0;
bool procflag[natoms_per_molecule];
for (int i = 0; i < natoms_per_molecule; i++) {
MathExtra::matvec(rotmat,onemols[imol]->x[i],atom_coord[i]);
atom_coord[i][0] += com_coord[0];
atom_coord[i][1] += com_coord[1];
atom_coord[i][2] += com_coord[2];
// use temporary variable for remapped position
// so unmapped position is preserved in atom_coord
double xtmp[3];
xtmp[0] = atom_coord[i][0];
xtmp[1] = atom_coord[i][1];
xtmp[2] = atom_coord[i][2];
domain->remap(xtmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
procflag[i] = false;
if (triclinic == 0) {
if (xtmp[0] >= sublo[0] && xtmp[0] < subhi[0] &&
xtmp[1] >= sublo[1] && xtmp[1] < subhi[1] &&
xtmp[2] >= sublo[2] && xtmp[2] < subhi[2]) procflag[i] = true;
} else {
domain->x2lamda(xtmp,lamda);
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) procflag[i] = true;
}
if (procflag[i]) {
int ii = -1;
if (onemols[imol]->qflag == 1) {
ii = atom->nlocal + atom->nghost;
if (ii >= atom->nmax) atom->avec->grow(0);
atom->q[ii] = onemols[imol]->q[i];
}
insertion_energy += energy(ii,onemols[imol]->type[i],-1,xtmp);
}
}
double insertion_energy_sum = 0.0;
MPI_Allreduce(&insertion_energy,&insertion_energy_sum,1,
MPI_DOUBLE,MPI_SUM,world);
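// acceptance probability for a molecule insertion:
// zz * V * natoms_per_molecule * exp(-beta*U_new) / (ngas + natoms_per_molecule)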
if (insertion_energy_sum < MAXENERGYTEST &&
random_equal->uniform() < zz*volume*natoms_per_molecule*
exp(-beta*insertion_energy_sum)/(ngas + natoms_per_molecule)) {
tagint maxmol = 0;
for (int i = 0; i < atom->nlocal; i++) maxmol = MAX(maxmol,atom->molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
maxmol_all++;
if (maxmol_all >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available molecule IDs");
tagint maxtag = 0;
for (int i = 0; i < atom->nlocal; i++) maxtag = MAX(maxtag,atom->tag[i]);
tagint maxtag_all;
MPI_Allreduce(&maxtag,&maxtag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
int nlocalprev = atom->nlocal;
double vnew[3];
vnew[0] = random_equal->gaussian()*sigma;
vnew[1] = random_equal->gaussian()*sigma;
vnew[2] = random_equal->gaussian()*sigma;
for (int i = 0; i < natoms_per_molecule; i++) {
if (procflag[i]) {
atom->avec->create_atom(onemols[imol]->type[i],atom_coord[i]);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->image[m] = imagezero;
domain->remap(atom->x[m],atom->image[m]);
atom->molecule[m] = maxmol_all;
if (maxtag_all+i+1 >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available atom IDs");
atom->tag[m] = maxtag_all + i + 1;
atom->v[m][0] = vnew[0];
atom->v[m][1] = vnew[1];
atom->v[m][2] = vnew[2];
atom->add_molecule_atom(onemols[imol],i,m,maxtag_all);
modify->create_attribute(m);
}
}
// FixRigidSmall::set_molecule stores rigid body attributes
// FixShake::set_molecule stores shake info for molecule
if (rigidflag)
fixrigid->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
else if (shakeflag)
fixshake->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
atom->natoms += natoms_per_molecule;
if (atom->natoms < 0)
error->all(FLERR,"Too many total atoms");
atom->nbonds += onemols[imol]->nbonds;
atom->nangles += onemols[imol]->nangles;
atom->ndihedrals += onemols[imol]->ndihedrals;
atom->nimpropers += onemols[imol]->nimpropers;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ninsertion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_translation_full()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
double energy_before = energy_stored;
int i = pick_random_gas_atom();
double **x = atom->x;
double xtmp[3];
xtmp[0] = xtmp[1] = xtmp[2] = 0.0;
tagint tmptag = -1;
if (i >= 0) {
double rsq = 1.1;
double rx,ry,rz;
rx = ry = rz = 0.0;
double coord[3];
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
if (regionflag) {
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
}
}
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
xtmp[0] = x[i][0];
xtmp[1] = x[i][1];
xtmp[2] = x[i][2];
x[i][0] = coord[0];
x[i][1] = coord[1];
x[i][2] = coord[2];
tmptag = atom->tag[i];
}
double energy_after = energy_full();
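// Metropolis acceptance for the trial displacement:
// accept with probability min[1, exp(-beta*(energy_after - energy_before))],
// rejecting outright if the new energy exceeds MAXENERGYTEST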
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
energy_stored = energy_after;
ntranslation_successes += 1.0;
} else {
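// rejected move: restore the old coordinates; the displaced atom may have
// migrated to another proc inside energy_full(), so broadcast its tag and
// old position and let whichever proc now owns it do the restore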
tagint tmptag_all;
MPI_Allreduce(&tmptag,&tmptag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
double xtmp_all[3];
MPI_Allreduce(&xtmp,&xtmp_all,3,MPI_DOUBLE,MPI_SUM,world);
for (int i = 0; i < atom->nlocal; i++) {
if (tmptag_all == atom->tag[i]) {
x[i][0] = xtmp_all[0];
x[i][1] = xtmp_all[1];
x[i][2] = xtmp_all[2];
}
}
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo deletion of a random gas atom, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_deletion_full()
{
double q_tmp;
const int q_flag = atom->q_flag;
ndeletion_attempts += 1.0;
if (ngas == 0) return;
double energy_before = energy_stored;
const int i = pick_random_gas_atom();
int tmpmask;
if (i >= 0) {
tmpmask = atom->mask[i];
atom->mask[i] = exclusion_group_bit;
if (q_flag) {
q_tmp = atom->q[i];
atom->q[i] = 0.0;
}
}
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
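// grand-canonical deletion acceptance:
// accept with probability min[1, ngas*exp(-beta*(energy_after - energy_before))/(zz*volume)]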
if (random_equal->uniform() <
ngas*exp(beta*(energy_before - energy_after))/(zz*volume)) {
if (i >= 0) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
}
atom->natoms--;
if (atom->map_style) atom->map_init();
ndeletion_successes += 1.0;
energy_stored = energy_after;
} else {
if (i >= 0) {
atom->mask[i] = tmpmask;
if (q_flag) atom->q[i] = q_tmp;
}
if (force->kspace) force->kspace->qsum_qsq();
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo insertion of a single gas atom, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_insertion_full()
{
double lamda[3];
ninsertion_attempts += 1.0;
double energy_before = energy_stored;
double coord[3];
if (regionflag) {
int region_attempt = 0;
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(coord,lamda);
} else {
if (triclinic == 0) {
coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,coord);
}
}
int proc_flag = 0;
if (triclinic == 0) {
domain->remap(coord);
if (!domain->inside(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
if (coord[0] >= sublo[0] && coord[0] < subhi[0] &&
coord[1] >= sublo[1] && coord[1] < subhi[1] &&
coord[2] >= sublo[2] && coord[2] < subhi[2]) proc_flag = 1;
} else {
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
if (proc_flag) {
atom->avec->create_atom(ngcmc_type,coord);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->v[m][0] = random_unequal->gaussian()*sigma;
atom->v[m][1] = random_unequal->gaussian()*sigma;
atom->v[m][2] = random_unequal->gaussian()*sigma;
if (charge_flag) atom->q[m] = charge;
modify->create_attribute(m);
}
atom->natoms++;
if (atom->tag_enable) {
atom->tag_extend();
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
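// grand-canonical insertion acceptance:
// accept with probability min[1, zz*volume*exp(-beta*(energy_after - energy_before))/(ngas+1)],
// rejecting outright if the new energy exceeds MAXENERGYTEST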
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
zz*volume*exp(beta*(energy_before - energy_after))/(ngas+1)) {
ninsertion_successes += 1.0;
energy_stored = energy_after;
} else {
atom->natoms--;
if (proc_flag) atom->nlocal--;
if (force->kspace) force->kspace->qsum_qsq();
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo translation of a random gas molecule, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_translation_full()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
tagint translation_molecule = pick_random_gas_molecule();
if (translation_molecule == -1) return;
double energy_before = energy_stored;
double **x = atom->x;
double rx,ry,rz;
double com_displace[3],coord[3];
double rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
int nlocal = atom->nlocal;
if (regionflag) {
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
}
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] += com_displace[0];
x[i][1] += com_displace[1];
x[i][2] += com_displace[2];
if (!domain->inside_nonperiodic(x[i]))
error->one(FLERR,"Fix gcmc put atom outside box");
}
}
double energy_after = energy_full();
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
ntranslation_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] -= com_displace[0];
x[i][1] -= com_displace[1];
x[i][2] -= com_displace[2];
}
}
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo rotation of a random gas molecule about its center of mass,
using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_rotation_full()
{
nrotation_attempts += 1.0;
if (ngas == 0) return;
tagint rotation_molecule = pick_random_gas_molecule();
if (rotation_molecule == -1) return;
double energy_before = energy_stored;
int nlocal = atom->nlocal;
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == rotation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// generate point in unit cube
// then restrict to unit sphere
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * max_rotation_angle;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double **x = atom->x;
imageint *image = atom->image;
imageint image_orig[natoms_per_molecule];
int n = 0;
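// rotate each atom of the molecule about the center of mass: unwrap its
// position, shift into the COM frame, apply the rotation matrix, shift back,
// and remap into the periodic box; original coords and image flags are
// saved in case the move is rejected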
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
atom_coord[n][0] = x[i][0];
atom_coord[n][1] = x[i][1];
atom_coord[n][2] = x[i][2];
image_orig[n] = image[i];
double xtmp[3];
domain->unmap(x[i],image[i],xtmp);
xtmp[0] -= com[0];
xtmp[1] -= com[1];
xtmp[2] -= com[2];
MathExtra::matvec(rotmat,xtmp,x[i]);
x[i][0] += com[0];
x[i][1] += com[1];
x[i][2] += com[2];
image[i] = imagezero;
domain->remap(x[i],image[i]);
if (!domain->inside(x[i]))
error->one(FLERR,"Fix gcmc put atom outside box");
n++;
}
}
double energy_after = energy_full();
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
nrotation_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
x[i][0] = atom_coord[n][0];
x[i][1] = atom_coord[n][1];
x[i][2] = atom_coord[n][2];
image[i] = image_orig[n];
n++;
}
}
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo deletion of a random gas molecule, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_deletion_full()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
tagint deletion_molecule = pick_random_gas_molecule();
if (deletion_molecule == -1) return;
double energy_before = energy_stored;
int m = 0;
double q_tmp[natoms_per_molecule];
int tmpmask[atom->nlocal];
for (int i = 0; i < atom->nlocal; i++) {
if (atom->molecule[i] == deletion_molecule) {
tmpmask[i] = atom->mask[i];
atom->mask[i] = exclusion_group_bit;
toggle_intramolecular(i);
if (atom->q_flag) {
q_tmp[m] = atom->q[i];
m++;
atom->q[i] = 0.0;
}
}
}
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
// energy_before is corrected by energy_intra so that only the molecule's
// interaction with the rest of the system enters the acceptance test
double deltaphi = ngas*exp(beta*((energy_before - energy_intra) - energy_after))/(zz*volume*natoms_per_molecule);
if (random_equal->uniform() < deltaphi) {
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == deletion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
atom->natoms -= natoms_per_molecule;
if (atom->map_style) atom->map_init();
ndeletion_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
int m = 0;
for (int i = 0; i < atom->nlocal; i++) {
if (atom->molecule[i] == deletion_molecule) {
atom->mask[i] = tmpmask[i];
toggle_intramolecular(i);
if (atom->q_flag) {
atom->q[i] = q_tmp[m];
m++;
}
}
}
if (force->kspace) force->kspace->qsum_qsq();
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo insertion of a gas molecule at a random position and
orientation, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_insertion_full()
{
double lamda[3];
ninsertion_attempts += 1.0;
double energy_before = energy_stored;
tagint maxmol = 0;
for (int i = 0; i < atom->nlocal; i++) maxmol = MAX(maxmol,atom->molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
maxmol_all++;
if (maxmol_all >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available molecule IDs");
int insertion_molecule = maxmol_all;
tagint maxtag = 0;
for (int i = 0; i < atom->nlocal; i++) maxtag = MAX(maxtag,atom->tag[i]);
tagint maxtag_all;
MPI_Allreduce(&maxtag,&maxtag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
int nlocalprev = atom->nlocal;
double com_coord[3];
if (regionflag) {
int region_attempt = 0;
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
while (domain->regions[iregion]->match(com_coord[0],com_coord[1],
com_coord[2]) == 0) {
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(com_coord,lamda);
} else {
if (triclinic == 0) {
com_coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
com_coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
com_coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,com_coord);
}
}
// generate point in unit cube
// then restrict to unit sphere
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * MY_2PI;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double vnew[3];
vnew[0] = random_equal->gaussian()*sigma;
vnew[1] = random_equal->gaussian()*sigma;
vnew[2] = random_equal->gaussian()*sigma;
for (int i = 0; i < natoms_per_molecule; i++) {
double xtmp[3];
MathExtra::matvec(rotmat,onemols[imol]->x[i],xtmp);
xtmp[0] += com_coord[0];
xtmp[1] += com_coord[1];
xtmp[2] += com_coord[2];
// need to adjust image flags in remap()
imageint imagetmp = imagezero;
domain->remap(xtmp,imagetmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
int proc_flag = 0;
if (triclinic == 0) {
if (xtmp[0] >= sublo[0] && xtmp[0] < subhi[0] &&
xtmp[1] >= sublo[1] && xtmp[1] < subhi[1] &&
xtmp[2] >= sublo[2] && xtmp[2] < subhi[2]) proc_flag = 1;
} else {
domain->x2lamda(xtmp,lamda);
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
if (proc_flag) {
atom->avec->create_atom(onemols[imol]->type[i],xtmp);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->image[m] = imagetmp;
atom->molecule[m] = insertion_molecule;
if (maxtag_all+i+1 >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available atom IDs");
atom->tag[m] = maxtag_all + i + 1;
atom->v[m][0] = vnew[0];
atom->v[m][1] = vnew[1];
atom->v[m][2] = vnew[2];
atom->add_molecule_atom(onemols[imol],i,m,maxtag_all);
modify->create_attribute(m);
}
}
// FixRigidSmall::set_molecule stores rigid body attributes
// FixShake::set_molecule stores shake info for molecule
if (rigidflag)
fixrigid->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
else if (shakeflag)
fixshake->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
atom->natoms += natoms_per_molecule;
if (atom->natoms < 0)
error->all(FLERR,"Too many total atoms");
atom->nbonds += onemols[imol]->nbonds;
atom->nangles += onemols[imol]->nangles;
atom->ndihedrals += onemols[imol]->ndihedrals;
atom->nimpropers += onemols[imol]->nimpropers;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
// energy_after is corrected by energy_intra so that only the molecule's
// interaction with the rest of the system enters the acceptance test
double deltaphi = zz*volume*natoms_per_molecule*
exp(beta*(energy_before - (energy_after - energy_intra)))/(ngas + natoms_per_molecule);
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() < deltaphi) {
ninsertion_successes += 1.0;
energy_stored = energy_after;
} else {
atom->nbonds -= onemols[imol]->nbonds;
atom->nangles -= onemols[imol]->nangles;
atom->ndihedrals -= onemols[imol]->ndihedrals;
atom->nimpropers -= onemols[imol]->nimpropers;
atom->natoms -= natoms_per_molecule;
energy_stored = energy_before;
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == insertion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
if (force->kspace) force->kspace->qsum_qsq();
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
compute particle's interaction energy with the rest of the system
------------------------------------------------------------------------- */
double FixGCMC::energy(int i, int itype, tagint imolecule, double *coord)
{
double delx,dely,delz,rsq;
double **x = atom->x;
int *type = atom->type;
tagint *molecule = atom->molecule;
int nall = atom->nlocal + atom->nghost;
pair = force->pair;
cutsq = force->pair->cutsq;
double fpair = 0.0;
double factor_coul = 1.0;
double factor_lj = 1.0;
double total_energy = 0.0;
for (int j = 0; j < nall; j++) {
if (i == j) continue;
if (mode == MOLECULE)
if (imolecule == molecule[j]) continue;
delx = coord[0] - x[j][0];
dely = coord[1] - x[j][1];
delz = coord[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
// if an overlap check was requested and an overlap is found,
// return the signal value for the energy
- if (overlap_flag && rsq < overlap_cutoff)
+ if (overlap_flag && rsq < overlap_cutoffsq)
return MAXENERGYSIGNAL;
if (rsq < cutsq[itype][jtype])
total_energy +=
pair->single(i,j,itype,jtype,rsq,factor_coul,factor_lj,fpair);
}
return total_energy;
}
/* ----------------------------------------------------------------------
compute the energy of the given gas molecule in its current position
sum across all procs that own atoms of the given molecule
------------------------------------------------------------------------- */
double FixGCMC::molecule_energy(tagint gas_molecule_id)
{
double mol_energy = 0.0;
for (int i = 0; i < atom->nlocal; i++)
if (atom->molecule[i] == gas_molecule_id) {
mol_energy += energy(i,atom->type[i],gas_molecule_id,atom->x[i]);
}
double mol_energy_sum = 0.0;
MPI_Allreduce(&mol_energy,&mol_energy_sum,1,MPI_DOUBLE,MPI_SUM,world);
return mol_energy_sum;
}
/* ----------------------------------------------------------------------
compute system potential energy
------------------------------------------------------------------------- */
double FixGCMC::energy_full()
{
int imolecule;
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (modify->n_pre_neighbor) modify->pre_neighbor();
neighbor->build();
int eflag = 1;
int vflag = 0;
// if an overlap check was requested and an overlap is found,
// return the signal value for the energy
if (overlap_flag) {
int overlaptestall;
int overlaptest = 0;
double delx,dely,delz,rsq;
double **x = atom->x;
tagint *molecule = atom->molecule;
int nall = atom->nlocal + atom->nghost;
for (int i = 0; i < atom->nlocal; i++) {
if (mode == MOLECULE) imolecule = molecule[i];
for (int j = i+1; j < nall; j++) {
if (mode == MOLECULE)
if (imolecule == molecule[j]) continue;
delx = x[i][0] - x[j][0];
dely = x[i][1] - x[j][1];
delz = x[i][2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
- if (rsq < overlap_cutoff) {
+ if (rsq < overlap_cutoffsq) {
overlaptest = 1;
break;
}
}
if (overlaptest) break;
}
MPI_Allreduce(&overlaptest, &overlaptestall, 1,
MPI_INT, MPI_MAX, world);
if (overlaptestall) return MAXENERGYSIGNAL;
}
// clear forces so they don't accumulate over multiple
// calls within fix gcmc timestep, e.g. for fix shake
size_t nbytes = sizeof(double) * (atom->nlocal + atom->nghost);
if (nbytes) memset(&atom->f[0][0],0,3*nbytes);
if (modify->n_pre_force) modify->pre_force(vflag);
if (force->pair) force->pair->compute(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) force->kspace->compute(eflag,vflag);
// unlike Verlet, not performing a reverse_comm() or forces here
// b/c GCMC does not care about forces
// don't think it will mess up energy due to any post_force() fixes
if (modify->n_post_force) modify->post_force(vflag);
if (modify->n_end_of_step) modify->end_of_step();
// NOTE: all fixes with THERMO_ENERGY mask set and which
// operate at pre_force() or post_force() or end_of_step()
// and which the user has enabled via fix_modify thermo yes,
// will contribute to total MC energy via pe->compute_scalar()
update->eflag_global = update->ntimestep;
double total_energy = c_pe->compute_scalar();
return total_energy;
}
/* ----------------------------------------------------------------------
pick a random gas atom; return its local index, or -1 if not owned by this proc
------------------------------------------------------------------------- */
int FixGCMC::pick_random_gas_atom()
{
int i = -1;
int iwhichglobal = static_cast<int> (ngas*random_equal->uniform());
if ((iwhichglobal >= ngas_before) &&
(iwhichglobal < ngas_before + ngas_local)) {
int iwhichlocal = iwhichglobal - ngas_before;
i = local_gas_list[iwhichlocal];
}
return i;
}
/* ----------------------------------------------------------------------
pick a random gas molecule; return its molecule ID, identical on all procs
------------------------------------------------------------------------- */
tagint FixGCMC::pick_random_gas_molecule()
{
int iwhichglobal = static_cast<int> (ngas*random_equal->uniform());
tagint gas_molecule_id = 0;
if ((iwhichglobal >= ngas_before) &&
(iwhichglobal < ngas_before + ngas_local)) {
int iwhichlocal = iwhichglobal - ngas_before;
int i = local_gas_list[iwhichlocal];
gas_molecule_id = atom->molecule[i];
}
tagint gas_molecule_id_all = 0;
MPI_Allreduce(&gas_molecule_id,&gas_molecule_id_all,1,
MPI_LMP_TAGINT,MPI_MAX,world);
return gas_molecule_id_all;
}
/* ----------------------------------------------------------------------
toggle intramolecular interactions of atom i on/off by negating its
bond/angle/dihedral/improper types
------------------------------------------------------------------------- */
void FixGCMC::toggle_intramolecular(int i)
{
if (atom->avec->bonds_allow)
for (int m = 0; m < atom->num_bond[i]; m++)
atom->bond_type[i][m] = -atom->bond_type[i][m];
if (atom->avec->angles_allow)
for (int m = 0; m < atom->num_angle[i]; m++)
atom->angle_type[i][m] = -atom->angle_type[i][m];
if (atom->avec->dihedrals_allow)
for (int m = 0; m < atom->num_dihedral[i]; m++)
atom->dihedral_type[i][m] = -atom->dihedral_type[i][m];
if (atom->avec->impropers_allow)
for (int m = 0; m < atom->num_improper[i]; m++)
atom->improper_type[i][m] = -atom->improper_type[i][m];
}
/* ----------------------------------------------------------------------
update the list of gas atoms
------------------------------------------------------------------------- */
void FixGCMC::update_gas_atoms_list()
{
int nlocal = atom->nlocal;
int *mask = atom->mask;
tagint *molecule = atom->molecule;
double **x = atom->x;
if (atom->nmax > gcmc_nmax) {
memory->sfree(local_gas_list);
gcmc_nmax = atom->nmax;
local_gas_list = (int *) memory->smalloc(gcmc_nmax*sizeof(int),
"GCMC:local_gas_list");
}
ngas_local = 0;
if (regionflag) {
if (mode == MOLECULE) {
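// for molecular gas, test the region against each molecule's
// center of mass rather than against individual atom positions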
tagint maxmol = 0;
for (int i = 0; i < nlocal; i++) maxmol = MAX(maxmol,molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
double comx[maxmol_all];
double comy[maxmol_all];
double comz[maxmol_all];
for (int imolecule = 0; imolecule < maxmol_all; imolecule++) {
for (int i = 0; i < nlocal; i++) {
if (molecule[i] == imolecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// remap unwrapped com into periodic box
domain->remap(com);
comx[imolecule] = com[0];
comy[imolecule] = com[1];
comz[imolecule] = com[2];
}
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
if (domain->regions[iregion]->match(comx[molecule[i]],
comy[molecule[i]],comz[molecule[i]]) == 1) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
} else {
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
if (domain->regions[iregion]->match(x[i][0],x[i][1],x[i][2]) == 1) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
}
} else {
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
MPI_Allreduce(&ngas_local,&ngas,1,MPI_INT,MPI_SUM,world);
MPI_Scan(&ngas_local,&ngas_before,1,MPI_INT,MPI_SUM,world);
ngas_before -= ngas_local;
}
/* ----------------------------------------------------------------------
return attempt and success counts used to compute acceptance ratios
------------------------------------------------------------------------- */
double FixGCMC::compute_vector(int n)
{
if (n == 0) return ntranslation_attempts;
if (n == 1) return ntranslation_successes;
if (n == 2) return ninsertion_attempts;
if (n == 3) return ninsertion_successes;
if (n == 4) return ndeletion_attempts;
if (n == 5) return ndeletion_successes;
if (n == 6) return nrotation_attempts;
if (n == 7) return nrotation_successes;
return 0.0;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixGCMC::memory_usage()
{
double bytes = gcmc_nmax * sizeof(int);
return bytes;
}
/* ----------------------------------------------------------------------
pack entire state of Fix into one write
------------------------------------------------------------------------- */
void FixGCMC::write_restart(FILE *fp)
{
int n = 0;
double list[4];
list[n++] = random_equal->state();
list[n++] = random_unequal->state();
list[n++] = next_reneighbor;
if (comm->me == 0) {
int size = n * sizeof(double);
fwrite(&size,sizeof(int),1,fp);
fwrite(list,sizeof(double),n,fp);
}
}
/* ----------------------------------------------------------------------
use state info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixGCMC::restart(char *buf)
{
int n = 0;
double *list = (double *) buf;
seed = static_cast<int> (list[n++]);
random_equal->reset(seed);
seed = static_cast<int> (list[n++]);
random_unequal->reset(seed);
next_reneighbor = static_cast<int> (list[n++]);
}
diff --git a/src/MC/fix_gcmc.h b/src/MC/fix_gcmc.h
index 2519c0096..8a5375eed 100644
--- a/src/MC/fix_gcmc.h
+++ b/src/MC/fix_gcmc.h
@@ -1,304 +1,304 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(gcmc,FixGCMC)
#else
#ifndef LMP_FIX_GCMC_H
#define LMP_FIX_GCMC_H
#include <stdio.h>
#include "fix.h"
namespace LAMMPS_NS {
class FixGCMC : public Fix {
public:
FixGCMC(class LAMMPS *, int, char **);
~FixGCMC();
int setmask();
void init();
void pre_exchange();
void attempt_atomic_translation();
void attempt_atomic_deletion();
void attempt_atomic_insertion();
void attempt_molecule_translation();
void attempt_molecule_rotation();
void attempt_molecule_deletion();
void attempt_molecule_insertion();
void attempt_atomic_translation_full();
void attempt_atomic_deletion_full();
void attempt_atomic_insertion_full();
void attempt_molecule_translation_full();
void attempt_molecule_rotation_full();
void attempt_molecule_deletion_full();
void attempt_molecule_insertion_full();
double energy(int, int, tagint, double *);
double molecule_energy(tagint);
double energy_full();
int pick_random_gas_atom();
tagint pick_random_gas_molecule();
void toggle_intramolecular(int);
void update_gas_atoms_list();
double compute_vector(int);
double memory_usage();
void write_restart(FILE *);
void restart(char *);
private:
int molecule_group,molecule_group_bit;
int molecule_group_inversebit;
int exclusion_group,exclusion_group_bit;
int ngcmc_type,nevery,seed;
int ncycles,nexchanges,nmcmoves;
int ngas; // # of gas atoms on all procs
int ngas_local; // # of gas atoms on this proc
int ngas_before; // # of gas atoms on procs < this proc
int mode; // ATOM or MOLECULE
int regionflag; // 0 = anywhere in box, 1 = specific region
int iregion; // gcmc region
char *idregion; // gcmc region id
bool pressure_flag; // true if user specified reservoir pressure
bool charge_flag; // true if user specified atomic charge
bool full_flag; // true if doing full system energy calculations
int natoms_per_molecule; // number of atoms in each gas molecule
int groupbitall; // group bitmask for inserted atoms
int ngroups; // number of group-ids for inserted atoms
char** groupstrings; // list of group-ids for inserted atoms
int ngrouptypes; // number of type-based group-ids for inserted atoms
char** grouptypestrings; // list of type-based group-ids for inserted atoms
int* grouptypebits; // list of type-based group bitmasks
int* grouptypes; // list of type-based group types
double ntranslation_attempts;
double ntranslation_successes;
double nrotation_attempts;
double nrotation_successes;
double ndeletion_attempts;
double ndeletion_successes;
double ninsertion_attempts;
double ninsertion_successes;
int gcmc_nmax;
int max_region_attempts;
double gas_mass;
double reservoir_temperature;
double tfac_insert;
double chemical_potential;
double displace;
double max_rotation_angle;
double beta,zz,sigma,volume;
double pressure,fugacity_coeff,charge;
double xlo,xhi,ylo,yhi,zlo,zhi;
double region_xlo,region_xhi,region_ylo,region_yhi,region_zlo,region_zhi;
double region_volume;
double energy_stored; // full energy of old/current configuration
double *sublo,*subhi;
int *local_gas_list;
double **cutsq;
double **atom_coord;
imageint imagezero;
- double overlap_cutoff;
+ double overlap_cutoffsq; // square distance cutoff for overlap
int overlap_flag;
double energy_intra;
class Pair *pair;
class RanPark *random_equal;
class RanPark *random_unequal;
class Atom *model_atom;
class Molecule **onemols;
int imol,nmol;
double **coords;
imageint *imageflags;
class Fix *fixrigid, *fixshake;
int rigidflag, shakeflag;
char *idrigid, *idshake;
int triclinic; // 0 = orthog box, 1 = triclinic
class Compute *c_pe;
void options(int, char **);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Fix gcmc does not (yet) work with atom_style template
Self-explanatory.
E: Fix gcmc region does not support a bounding box
Not all regions represent bounded volumes. You cannot use
such a region with the fix gcmc command.
E: Fix gcmc region cannot be dynamic
Only static regions can be used with fix gcmc.
E: Fix gcmc region extends outside simulation box
Self-explanatory.
E: Fix gcmc molecule must have coordinates
The defined molecule does not specify coordinates.
E: Fix gcmc molecule must have atom types
The defined molecule does not specify atom types.
E: Atom type must be zero in fix gcmc mol command
Self-explanatory.
E: Fix gcmc molecule has charges, but atom style does not
Self-explanatory.
E: Fix gcmc molecule template ID must be same as atom_style template ID
When using atom_style template, you cannot insert molecules that are
not in that template.
E: Fix gcmc atom has charge, but atom style does not
Self-explanatory.
E: Cannot use fix gcmc shake and not molecule
Self-explanatory.
E: Molecule template ID for fix gcmc does not exist
Self-explanatory.
W: Molecule template for fix gcmc has multiple molecules
The fix gcmc command will only create molecules of a single type,
i.e. the first molecule in the template.
E: Region ID for fix gcmc does not exist
Self-explanatory.
W: Fix gcmc using full_energy option
Fix gcmc has automatically turned on the full_energy option since it
is required for systems like the one specified by the user. User input
included one or more of the following: kspace, a hybrid
pair style, an eam pair style, tail correction,
or no "single" function for the pair style.
W: Energy of old configuration in fix gcmc is > MAXENERGYTEST.
This probably means that a pair of atoms are closer than the
overlap cutoff distance for keyword overlap_cutoff.
E: Invalid atom type in fix gcmc command
The atom type specified in the gcmc command does not exist.
E: Fix gcmc cannot exchange individual atoms belonging to a molecule
This is an error since you should not delete only one atom of a
molecule. The user has specified atomic (non-molecular) gas
exchanges, but an atom belonging to a molecule could be deleted.
E: All mol IDs should be set for fix gcmc group atoms
The molecule flag is on, yet not all molecule ids in the fix group
have been set to non-zero positive values by the user. This is an
error since all atoms in the fix gcmc group are eligible for deletion,
rotation, and translation and therefore must have valid molecule ids.
E: Fix gcmc molecule command requires that atoms have molecule attributes
Should not choose the gcmc molecule feature if no molecules are being
simulated. The general molecule flag is off, but gcmc's molecule flag
is on.
E: Fix gcmc shake fix does not exist
Self-explanatory.
E: Fix gcmc and fix shake not using same molecule template ID
Self-explanatory.
E: Fix gcmc can not currently be used with fix rigid or fix rigid/small
Self-explanatory.
E: Cannot use fix gcmc in a 2d simulation
Fix gcmc is set up to run in 3d only. No 2d simulations with fix gcmc
are allowed.
E: Could not find fix gcmc exclusion group ID
Self-explanatory.
E: Could not find fix gcmc rotation group ID
Self-explanatory.
E: Illegal fix gcmc gas mass <= 0
The computed mass of the designated gas molecule or atom type was less
than or equal to zero.
E: Cannot do GCMC on atoms in atom_modify first group
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command.
E: Could not find specified fix gcmc group ID
Self-explanatory.
E: Fix gcmc put atom outside box
This should not normally happen. Contact the developers.
E: Fix gcmc ran out of available molecule IDs
See the setting for tagint in the src/lmptype.h file.
E: Fix gcmc ran out of available atom IDs
See the setting for tagint in the src/lmptype.h file.
E: Too many total atoms
See the setting for bigint in the src/lmptype.h file.
*/
diff --git a/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp b/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
index c75da63ca..af19f3eb3 100644
--- a/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
+++ b/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
@@ -1,546 +1,546 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier (SNL)
The lj-fsw/coul-fsh (force-switched and force-shifted) sections
were provided by Robert Meissner
and Lucio Colombi Ciacchi of Bremen University, Bremen, Germany,
with additional assistance from Robert A. Latour, Clemson University
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pair_lj_charmmfsw_coul_charmmfsh.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulCharmmfsh::PairLJCharmmfswCoulCharmmfsh(LAMMPS *lmp) :
Pair(lmp)
{
implicit = 0;
mix_flag = ARITHMETIC;
writedata = 1;
+
+ // short-range/long-range flag accessed by DihedralCharmmfsw
+
+ dihedflag = 0;
}
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulCharmmfsh::~PairLJCharmmfswCoulCharmmfsh()
{
if (!copymode) {
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(epsilon);
memory->destroy(sigma);
memory->destroy(eps14);
memory->destroy(sigma14);
memory->destroy(lj1);
memory->destroy(lj2);
memory->destroy(lj3);
memory->destroy(lj4);
memory->destroy(lj14_1);
memory->destroy(lj14_2);
memory->destroy(lj14_3);
memory->destroy(lj14_4);
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::compute(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl12,evdwl6,ecoul,fpair;
double r,rinv,r3inv,rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double switch1;
int *ilist,*jlist,*numneigh,**firstneigh;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
r = sqrt(rsq);
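// force-shifted Coulomb (CHARMM fsh): pair force ~ qi*qj*(1/r^2 - 1/rc^2),
// which goes smoothly to zero at the Coulomb cutoff rc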
if (rsq < cut_coulsq) {
forcecoul = qqrd2e * qtmp*q[j]*
(sqrt(r2inv) - r*cut_coulinv*cut_coulinv);
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fpair = (factor_coul*forcecoul + factor_lj*forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
ecoul = qqrd2e * qtmp*q[j]*
(sqrt(r2inv) + cut_coulinv*cut_coulinv*r - 2.0*cut_coulinv);
ecoul *= factor_coul;
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(epsilon,n+1,n+1,"pair:epsilon");
memory->create(sigma,n+1,n+1,"pair:sigma");
memory->create(eps14,n+1,n+1,"pair:eps14");
memory->create(sigma14,n+1,n+1,"pair:sigma14");
memory->create(lj1,n+1,n+1,"pair:lj1");
memory->create(lj2,n+1,n+1,"pair:lj2");
memory->create(lj3,n+1,n+1,"pair:lj3");
memory->create(lj4,n+1,n+1,"pair:lj4");
memory->create(lj14_1,n+1,n+1,"pair:lj14_1");
memory->create(lj14_2,n+1,n+1,"pair:lj14_2");
memory->create(lj14_3,n+1,n+1,"pair:lj14_3");
memory->create(lj14_4,n+1,n+1,"pair:lj14_4");
}
/* ----------------------------------------------------------------------
global settings
unlike other pair styles,
there are no individual pair settings that these override
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::settings(int narg, char **arg)
{
if (narg != 2 && narg != 3)
error->all(FLERR,"Illegal pair_style command");
cut_lj_inner = force->numeric(FLERR,arg[0]);
cut_lj = force->numeric(FLERR,arg[1]);
if (narg == 2) {
cut_coul = cut_lj;
} else {
cut_coul = force->numeric(FLERR,arg[2]);
}
-
- // indicates pair_style being used for dihedral_charmm
-
- dihedflag = 0;
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::coeff(int narg, char **arg)
{
if (narg != 4 && narg != 6)
error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
double epsilon_one = force->numeric(FLERR,arg[2]);
double sigma_one = force->numeric(FLERR,arg[3]);
double eps14_one = epsilon_one;
double sigma14_one = sigma_one;
if (narg == 6) {
eps14_one = force->numeric(FLERR,arg[4]);
sigma14_one = force->numeric(FLERR,arg[5]);
}
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
epsilon[i][j] = epsilon_one;
sigma[i][j] = sigma_one;
eps14[i][j] = eps14_one;
sigma14[i][j] = sigma14_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::init_style()
{
if (!atom->q_flag)
error->all(FLERR,"Pair style lj/charmmfsw/coul/charmmfsh "
"requires atom attribute q");
neighbor->request(this,instance_me);
// require cut_lj_inner < cut_lj
if (cut_lj_inner >= cut_lj)
error->all(FLERR,"Pair inner lj cutoff >= Pair outer lj cutoff");
cut_lj_innersq = cut_lj_inner * cut_lj_inner;
cut_ljsq = cut_lj * cut_lj;
cut_ljinv = 1.0/cut_lj;
cut_lj_innerinv = 1.0/cut_lj_inner;
cut_lj3 = cut_lj * cut_lj * cut_lj;
cut_lj3inv = cut_ljinv * cut_ljinv * cut_ljinv;
cut_lj_inner3inv = cut_lj_innerinv * cut_lj_innerinv * cut_lj_innerinv;
cut_lj_inner3 = cut_lj_inner * cut_lj_inner * cut_lj_inner;
cut_lj6 = cut_ljsq * cut_ljsq * cut_ljsq;
cut_lj6inv = cut_lj3inv * cut_lj3inv;
cut_lj_inner6inv = cut_lj_inner3inv * cut_lj_inner3inv;
cut_lj_inner6 = cut_lj_innersq * cut_lj_innersq * cut_lj_innersq;
cut_coulsq = cut_coul * cut_coul;
cut_coulinv = 1.0/cut_coul;
cut_bothsq = MAX(cut_ljsq,cut_coulsq);
denom_lj = (cut_ljsq-cut_lj_innersq) * (cut_ljsq-cut_lj_innersq) *
(cut_ljsq-cut_lj_innersq);
denom_lj12 = 1.0/(cut_lj6 - cut_lj_inner6);
denom_lj6 = 1.0/(cut_lj3 - cut_lj_inner3);
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairLJCharmmfswCoulCharmmfsh::init_one(int i, int j)
{
if (setflag[i][j] == 0) {
epsilon[i][j] = mix_energy(epsilon[i][i],epsilon[j][j],
sigma[i][i],sigma[j][j]);
sigma[i][j] = mix_distance(sigma[i][i],sigma[j][j]);
eps14[i][j] = mix_energy(eps14[i][i],eps14[j][j],
sigma14[i][i],sigma14[j][j]);
sigma14[i][j] = mix_distance(sigma14[i][i],sigma14[j][j]);
}
double cut = MAX(cut_lj,cut_coul);
lj1[i][j] = 48.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj2[i][j] = 24.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj3[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj4[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj14_1[i][j] = 48.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_2[i][j] = 24.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj14_3[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_4[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj1[j][i] = lj1[i][j];
lj2[j][i] = lj2[i][j];
lj3[j][i] = lj3[i][j];
lj4[j][i] = lj4[i][j];
lj14_1[j][i] = lj14_1[i][j];
lj14_2[j][i] = lj14_2[i][j];
lj14_3[j][i] = lj14_3[i][j];
lj14_4[j][i] = lj14_4[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g %g %g\n",
i,epsilon[i][i],sigma[i][i],eps14[i][i],sigma14[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g %g\n",i,j,
epsilon[i][j],sigma[i][j],eps14[i][j],sigma14[i][j]);
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&epsilon[i][j],sizeof(double),1,fp);
fwrite(&sigma[i][j],sizeof(double),1,fp);
fwrite(&eps14[i][j],sizeof(double),1,fp);
fwrite(&sigma14[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&epsilon[i][j],sizeof(double),1,fp);
fread(&sigma[i][j],sizeof(double),1,fp);
fread(&eps14[i][j],sizeof(double),1,fp);
fread(&sigma14[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&epsilon[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&eps14[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma14[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_restart_settings(FILE *fp)
{
fwrite(&cut_lj_inner,sizeof(double),1,fp);
fwrite(&cut_lj,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_lj_inner,sizeof(double),1,fp);
fread(&cut_lj,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
}
MPI_Bcast(&cut_lj_inner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_lj,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
}
/* ---------------------------------------------------------------------- */
double PairLJCharmmfswCoulCharmmfsh::
single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_lj, double &fforce)
{
double r,rinv,r2inv,r3inv,r6inv,forcecoul,forcelj;
double phicoul,philj,philj12,philj6;
double switch1;
r2inv = 1.0/rsq;
r = sqrt(rsq);
rinv = 1.0/r;
if (rsq < cut_coulsq) {
forcecoul = force->qqrd2e * atom->q[i]*atom->q[j] *
(sqrt(r2inv) - r*cut_coulinv*cut_coulinv);
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
r3inv = rinv*rinv*rinv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fforce = (factor_coul*forcecoul + factor_lj*forcelj) * r2inv;
double eng = 0.0;
if (rsq < cut_coulsq) {
phicoul = force->qqrd2e * atom->q[i]*atom->q[j] *
(sqrt(r2inv) + cut_coulinv*cut_coulinv*r - 2.0*cut_coulinv);
eng += factor_coul*phicoul;
}
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
philj12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
philj6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
philj = philj12 + philj6;
} else {
philj12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
philj6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
philj = philj12 + philj6;
}
eng += factor_lj*philj;
}
return eng;
}
/* ---------------------------------------------------------------------- */
void *PairLJCharmmfswCoulCharmmfsh::extract(const char *str, int &dim)
{
dim = 2;
if (strcmp(str,"lj14_1") == 0) return (void *) lj14_1;
if (strcmp(str,"lj14_2") == 0) return (void *) lj14_2;
if (strcmp(str,"lj14_3") == 0) return (void *) lj14_3;
if (strcmp(str,"lj14_4") == 0) return (void *) lj14_4;
dim = 0;
if (strcmp(str,"implicit") == 0) return (void *) &implicit;
- // info extracted by dihedral_charmmf
+ // info extracted by dihedral_charmmfsw
if (strcmp(str,"cut_coul") == 0) return (void *) &cut_coul;
if (strcmp(str,"cut_lj_inner") == 0) return (void *) &cut_lj_inner;
if (strcmp(str,"cut_lj") == 0) return (void *) &cut_lj;
if (strcmp(str,"dihedflag") == 0) return (void *) &dihedflag;
return NULL;
}
diff --git a/src/Makefile b/src/Makefile
index 59f954014..32f9c3787 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -1,379 +1,381 @@
# LAMMPS multiple-machine -*- Makefile -*-
SHELL = /bin/bash
PYTHON = python
#.IGNORE:
# Definitions
ROOT = lmp
EXE = lmp_$@
ARLIB = liblammps_$@.a
SHLIB = liblammps_$@.so
ARLINK = liblammps.a
SHLINK = liblammps.so
OBJDIR = Obj_$@
OBJSHDIR = Obj_shared_$@
SRC = $(wildcard *.cpp)
INC = $(wildcard *.h)
OBJ = $(SRC:.cpp=.o)
SRCLIB = $(filter-out main.cpp,$(SRC))
OBJLIB = $(filter-out main.o,$(OBJ))
# Command-line options for mode: exe (default), shexe, lib, shlib
mode = exe
objdir = $(OBJDIR)
ifeq ($(mode),shexe)
objdir = $(OBJSHDIR)
endif
ifeq ($(mode),lib)
objdir = $(OBJDIR)
endif
ifeq ($(mode),shlib)
objdir = $(OBJSHDIR)
endif
# Package variables
# PACKAGE = standard packages
# PACKUSER = user packages
# PACKLIB = all packages that require an additional lib
+# should be PACKSYS + PACKINT + PACKEXT
# PACKSYS = subset that require a common system library
+# include MPIIO and LB b/c require full MPI, not just STUBS
# PACKINT = subset that require an internal (provided) library
# PACKEXT = subset that require an external (downloaded) library
-# PACKLIB = PACKSYS + PACKING + PACKEXT
-# PACKSCRIPT = libs under lammps/lib that have an Install.py script
PACKAGE = asphere body class2 colloid compress coreshell dipole gpu \
granular kim kokkos kspace manybody mc meam misc molecule \
mpiio mscg opt peri poems \
python qeq reax replica rigid shock snap srd voronoi
-PACKUSER = user-atc user-awpmd user-cg-cmm user-cgdna user-colvars \
+PACKUSER = user-atc user-awpmd user-cgdna user-cgsdk user-colvars \
user-diffraction user-dpd user-drude user-eff user-fep user-h5md \
user-intel user-lb user-manifold user-mgpt user-misc user-molfile \
- user-nc-dump user-omp user-phonon user-qmmm user-qtb \
+ user-netcdf user-omp user-phonon user-qmmm user-qtb \
user-quip user-reaxc user-smd user-smtbq user-sph user-tally \
user-vtk
PACKLIB = compress gpu kim kokkos meam mpiio mscg poems \
python reax voronoi \
- user-atc user-awpmd user-colvars user-h5md user-molfile \
- user-nc-dump user-qmmm user-quip user-smd user-vtk
+ user-atc user-awpmd user-colvars user-h5md user-lb user-molfile \
+ user-netcdf user-qmmm user-quip user-smd user-vtk
-PACKSYS = compress mpiio python
+PACKSYS = compress mpiio python user-lb
PACKINT = gpu kokkos meam poems reax user-atc user-awpmd user-colvars
PACKEXT = kim mscg voronoi \
- user-h5md user-molfile user-nc-dump user-qmmm user-quip \
+ user-h5md user-molfile user-netcdf user-qmmm user-quip \
user-smd user-vtk
-PACKSCRIPT = voronoi
-
PACKALL = $(PACKAGE) $(PACKUSER)
PACKAGEUC = $(shell echo $(PACKAGE) | tr a-z A-Z)
PACKUSERUC = $(shell echo $(PACKUSER) | tr a-z A-Z)
YESDIR = $(shell echo $(@:yes-%=%) | tr a-z A-Z)
NODIR = $(shell echo $(@:no-%=%) | tr a-z A-Z)
LIBDIR = $(shell echo $(@:lib-%=%))
+LIBUSERDIR = $(shell echo $(@:lib-user-%=%))
# List of all targets
help:
@echo ''
@echo 'make clean-all delete all object files'
@echo 'make clean-machine delete object files for one machine'
@echo 'make mpi-stubs build dummy MPI library in STUBS'
@echo 'make install-python install LAMMPS wrapper in Python'
@echo 'make tar create lmp_src.tar.gz for src dir and packages'
@echo ''
@echo 'make package list available packages and their dependencies'
@echo 'make package-status (ps) status of all packages'
@echo 'make yes-package install a single pkg in src dir'
@echo 'make no-package remove a single pkg from src dir'
@echo 'make yes-all install all pkgs in src dir'
@echo 'make no-all remove all pkgs from src dir'
@echo 'make yes-standard (yes-std) install all standard pkgs'
@echo 'make no-standard (no-std) remove all standard pkgs'
@echo 'make yes-user install all user pkgs'
@echo 'make no-user remove all user pkgs'
- @echo 'make yes-lib install all pkgs with libs (incldued or ext)'
+ @echo 'make yes-lib install all pkgs with libs (included or ext)'
@echo 'make no-lib remove all pkgs with libs (included or ext)'
@echo 'make yes-ext install all pkgs with external libs'
@echo 'make no-ext remove all pkgs with external libs'
@echo ''
@echo 'make package-update (pu) replace src files with updated package files'
@echo 'make package-overwrite replace package files with src files'
@echo 'make package-diff (pd) diff src files against package files'
@echo ''
@echo 'make lib-package download/build/install a package library'
@echo 'make purge purge obsolete copies of source files'
@echo ''
@echo 'make machine build LAMMPS for machine'
@echo 'make mode=lib machine build LAMMPS as static lib for machine'
@echo 'make mode=shlib machine build LAMMPS as shared lib for machine'
@echo 'make mode=shexe machine build LAMMPS as shared exe for machine'
@echo 'make makelist create Makefile.list used by old makes'
@echo 'make -f Makefile.list machine build LAMMPS for machine (old)'
@echo ''
@echo 'machine is one of these from src/MAKE:'
@echo ''
@files="`ls MAKE/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/OPTIONS:'
@echo ''
@files="`ls MAKE/OPTIONS/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/MACHINES:'
@echo ''
@files="`ls MAKE/MACHINES/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/MINE:'
@echo ''
@files="`ls MAKE/MINE/Makefile.* 2>/dev/null`"; \
for file in $$files; do head -1 $$file; done
@echo ''
# Build LAMMPS in one of 4 modes
# exe = exe with static compile in Obj_machine (default)
# shexe = exe with shared compile in Obj_shared_machine
# lib = static lib in Obj_machine
# shlib = shared lib in Obj_shared_machine
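# illustrative invocations (assuming the stock Makefile.serial and Makefile.mpi
# under src/MAKE; adjust the machine name to your own makefile):
#   make serial              static executable built in Obj_serial
#   make mode=shlib mpi      shared library built in Obj_shared_mpi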
.DEFAULT:
@if [ $@ = "serial" -a ! -f STUBS/libmpi_stubs.a ]; \
then $(MAKE) mpi-stubs; fi
@test -f MAKE/Makefile.$@ -o -f MAKE/OPTIONS/Makefile.$@ -o \
-f MAKE/MACHINES/Makefile.$@ -o -f MAKE/MINE/Makefile.$@
@if [ ! -d $(objdir) ]; then mkdir $(objdir); fi
@$(SHELL) Make.sh style
@if [ -f MAKE/MACHINES/Makefile.$@ ]; \
then cp MAKE/MACHINES/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/OPTIONS/Makefile.$@ ]; \
then cp MAKE/OPTIONS/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/Makefile.$@ ]; \
then cp MAKE/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/MINE/Makefile.$@ ]; \
then cp MAKE/MINE/Makefile.$@ $(objdir)/Makefile; fi
@if [ ! -e Makefile.package ]; \
then cp Makefile.package.empty Makefile.package; fi
@if [ ! -e Makefile.package.settings ]; \
then cp Makefile.package.settings.empty Makefile.package.settings; fi
@cp Makefile.package Makefile.package.settings $(objdir)
@cd $(objdir); rm -f .depend; \
$(MAKE) $(MFLAGS) "SRC = $(SRC)" "INC = $(INC)" depend || :
ifeq ($(mode),exe)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJ)" "INC = $(INC)" "SHFLAGS =" \
"EXE = ../$(EXE)" ../$(EXE)
endif
ifeq ($(mode),shexe)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJ)" "INC = $(INC)" \
"EXE = ../$(EXE)" ../$(EXE)
endif
ifeq ($(mode),lib)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJLIB)" "INC = $(INC)" "SHFLAGS =" \
"EXE = ../$(ARLIB)" lib
@rm -f $(ARLINK)
@ln -s $(ARLIB) $(ARLINK)
endif
ifeq ($(mode),shlib)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJLIB)" "INC = $(INC)" \
"EXE = ../$(SHLIB)" shlib
@rm -f $(SHLINK)
@ln -s $(SHLIB) $(SHLINK)
endif
# Remove machine-specific object files
clean:
@echo 'make clean-all delete all object files'
@echo 'make clean-machine delete object files for one machine'
clean-all:
rm -rf Obj_*
clean-%:
rm -rf Obj_$(@:clean-%=%) Obj_shared_$(@:clean-%=%)
# Create Makefile.list
makelist:
@$(SHELL) Make.sh style
@$(SHELL) Make.sh Makefile.list
# Make MPI STUBS library
mpi-stubs:
@cd STUBS; $(MAKE) clean; $(MAKE)
# install LAMMPS shared lib and Python wrapper for Python usage
# include python package settings to automatically adapt name of python interpreter
sinclude ../lib/python/Makefile.lammps
install-python:
@$(PYTHON) ../python/install.py
# Create a tarball of src dir and packages
tar:
@cd STUBS; $(MAKE) clean
@cd ..; tar cvzf src/$(ROOT)_src.tar.gz \
src/Make* src/Package.sh src/Depend.sh src/Install.sh \
src/MAKE src/DEPEND src/*.cpp src/*.h src/STUBS \
$(patsubst %,src/%,$(PACKAGEUC)) $(patsubst %,src/%,$(PACKUSERUC)) \
--exclude=*/.svn
@cd STUBS; $(MAKE)
@echo "Created $(ROOT)_src.tar.gz"
# Package management
package:
@echo 'Standard packages:' $(PACKAGE)
@echo ''
@echo 'User-contributed packages:' $(PACKUSER)
@echo ''
@echo 'Packages that need system libraries:' $(PACKSYS)
@echo ''
@echo 'Packages that need provided libraries:' $(PACKINT)
@echo ''
@echo 'Packages that need external libraries:' $(PACKEXT)
@echo ''
@echo 'make package list available packages'
@echo 'make package-status (ps) status of all packages'
@echo 'make yes-package install a single pkg in src dir'
@echo 'make no-package remove a single pkg from src dir'
@echo 'make yes-all install all pkgs in src dir'
@echo 'make no-all remove all pkgs from src dir'
@echo 'make yes-standard (yes-std) install all standard pkgs'
@echo 'make no-standard (no-std) remove all standard pkgs'
@echo 'make yes-user install all user pkgs'
@echo 'make no-user remove all user pkgs'
@echo 'make yes-lib install all pkgs with libs (included or ext)'
@echo 'make no-lib remove all pkgs with libs (included or ext)'
@echo 'make yes-ext install all pkgs with external libs'
@echo 'make no-ext remove all pkgs with external libs'
@echo ''
@echo 'make package-update (pu) replace src files with package files'
@echo 'make package-overwrite replace package files with src files'
@echo 'make package-diff (pd) diff src files against package files'
@echo ''
- @echo 'make lib-package download/build/install a package library'
+ @echo 'make lib-package build and/or download a package library'
yes-all:
@for p in $(PACKALL); do $(MAKE) yes-$$p; done
no-all:
@for p in $(PACKALL); do $(MAKE) no-$$p; done
yes-standard yes-std:
@for p in $(PACKAGE); do $(MAKE) yes-$$p; done
no-standard no-std:
@for p in $(PACKAGE); do $(MAKE) no-$$p; done
yes-user:
@for p in $(PACKUSER); do $(MAKE) yes-$$p; done
no-user:
@for p in $(PACKUSER); do $(MAKE) no-$$p; done
yes-lib:
@for p in $(PACKLIB); do $(MAKE) yes-$$p; done
no-lib:
@for p in $(PACKLIB); do $(MAKE) no-$$p; done
yes-ext:
@for p in $(PACKEXT); do $(MAKE) yes-$$p; done
no-ext:
@for p in $(PACKEXT); do $(MAKE) no-$$p; done
yes-%:
@if [ ! -e Makefile.package ]; \
then cp Makefile.package.empty Makefile.package; fi
@if [ ! -e Makefile.package.settings ]; \
then cp Makefile.package.settings.empty Makefile.package.settings; fi
@if [ ! -e $(YESDIR) ]; then \
echo "Package $(@:yes-%=%) does not exist"; \
elif [ -e $(YESDIR)/Install.sh ]; then \
echo "Installing package $(@:yes-%=%)"; \
cd $(YESDIR); $(SHELL) Install.sh 1; cd ..; \
$(SHELL) Depend.sh $(YESDIR) 1; \
else \
echo "Installing package $(@:yes-%=%)"; \
cd $(YESDIR); $(SHELL) ../Install.sh 1; cd ..; \
$(SHELL) Depend.sh $(YESDIR) 1; \
fi;
no-%:
@if [ ! -e $(NODIR) ]; then \
echo "Package $(@:no-%=%) does not exist"; \
elif [ -e $(NODIR)/Install.sh ]; then \
echo "Uninstalling package $(@:no-%=%)"; \
cd $(NODIR); $(SHELL) Install.sh 0; cd ..; \
$(SHELL) Depend.sh $(NODIR) 0; \
else \
echo "Uninstalling package $(@:no-%=%)"; \
cd $(NODIR); $(SHELL) ../Install.sh 0; cd ..; \
$(SHELL) Depend.sh $(NODIR) 0; \
fi;
# download/build/install a package library
lib-%:
- @if [ ! -e ../lib/$(LIBDIR)/Install.py ]; then \
- echo "Install script for lib $(@:lib-%=%) does not exist"; \
- else \
- echo "Installing lib for package $(@:lib-%=%)"; \
+ @if [ -e ../lib/$(LIBDIR)/Install.py ]; then \
+ echo "Installing lib $(@:lib-%=%)"; \
cd ../lib/$(LIBDIR); python Install.py $(args); \
+ elif [ -e ../lib/$(LIBUSERDIR)/Install.py ]; then \
+ echo "Installing lib $(@:lib-user-%=%)"; \
+ cd ../lib/$(LIBUSERDIR); python Install.py $(args); \
+ else \
+ echo "Install script for lib $(@:lib-%=%) does not exist"; \
fi;
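# illustrative usage (the args string is passed verbatim to the library's
# Install.py; see lib/<name>/README or the script itself for supported options):
#   make lib-voronoi args="..."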
# status = list src files that differ from package files
# update = replace src files with newer package files
# overwrite = overwrite package files with newer src files
# diff = show differences between src and package files
# purge = delete obsolete and auto-generated package files
package-status ps:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p status; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p status; done
package-update pu:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p update; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p update; done
package-overwrite:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p overwrite; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p overwrite; done
package-diff pd:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p diff; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p diff; done
purge: Purge.list
@echo 'Purging obsolete and auto-generated source files'
@for f in `grep -v '#' Purge.list` ; \
do test -f $$f && rm $$f && echo $$f || : ; \
done
diff --git a/src/Purge.list b/src/Purge.list
index 554c5df82..6326dbadf 100644
--- a/src/Purge.list
+++ b/src/Purge.list
@@ -1,451 +1,470 @@
# auto-generated style files
style_angle.h
style_atom.h
style_bond.h
style_command.h
style_compute.h
style_dihedral.h
style_dump.h
style_fix.h
style_improper.h
style_integrate.h
style_kspace.h
style_minimize.h
style_pair.h
style_region.h
style_neigh_bin.h
style_neigh_pair.h
style_neigh_stencil.h
+# deleted on 4 May 2017
+pair_reax_c.cpp
+pair_reax_c.h
+fix_reax_c_bonds.cpp
+fix_reax_c_bonds.h
+fix_reax_c_species.cpp
+fix_reax_c_species.h
+pair_reax_c_kokkos.cpp
+pair_reax_c_kokkos.h
+fix_reax_c_bonds_kokkos.cpp
+fix_reax_c_bonds_kokkos.h
+fix_reax_c_species_kokkos.cpp
+fix_reax_c_species_kokkos.h
+# deleted on 19 April 2017
+vmdplugin.h
+molfile_plugin.h
+# deleted on 13 April 2017
+dihedral_charmmfsh.cpp
+dihedral_charmmfsh.h
# deleted on ## XXX 2016
accelerator_intel.h
neigh_bond.cpp
neigh_bond.h
neigh_derive.cpp
neigh_derive.h
neigh_full.cpp
neigh_full.h
neigh_gran.cpp
neigh_gran.h
neigh_half_bin.cpp
neigh_half_bin.h
neigh_half_multi.cpp
neigh_half_multi.h
neigh_half_nsq.cpp
neigh_half_nsq.h
neigh_respa.cpp
neigh_respa.h
neigh_shardlow.cpp
neigh_shardlow.h
neigh_stencil.cpp
neigh_half_bin_intel.cpp
neigh_full_kokkos.h
neighbor_omp.h
neigh_derive_omp.cpp
neigh_full_omp.cpp
neigh_gran_omp.cpp
neigh_half_bin_omp.cpp
neigh_half_multi_omp.cpp
neigh_half_nsq_omp.cpp
neigh_respa_omp.cpp
# deleted on 20 Sep 2016
fix_ti_rs.cpp
fix_ti_rs.h
# deleted on 31 May 2016
fix_ave_spatial_sphere.cpp
fix_ave_spatial_sphere.h
atom_vec_angle_cuda.cpp
atom_vec_angle_cuda.h
atom_vec_atomic_cuda.cpp
atom_vec_atomic_cuda.h
atom_vec_charge_cuda.cpp
atom_vec_charge_cuda.h
atom_vec_full_cuda.cpp
atom_vec_full_cuda.h
comm_cuda.cpp
comm_cuda.h
compute_pe_cuda.cpp
compute_pe_cuda.h
compute_pressure_cuda.cpp
compute_pressure_cuda.h
compute_temp_cuda.cpp
compute_temp_cuda.h
compute_temp_partial_cuda.cpp
compute_temp_partial_cuda.h
cuda.cpp
cuda_data.h
cuda_modify_flags.h
cuda_neigh_list.cpp
cuda_neigh_list.h
domain_cuda.cpp
domain_cuda.h
fft3d_cuda.cpp
fft3d_cuda.h
fft3d_wrap_cuda.cpp
fft3d_wrap_cuda.h
fix_addforce_cuda.cpp
fix_addforce_cuda.h
fix_aveforce_cuda.cpp
fix_aveforce_cuda.h
fix_enforce2d_cuda.cpp
fix_enforce2d_cuda.h
fix_freeze_cuda.cpp
fix_freeze_cuda.h
fix_gravity_cuda.cpp
fix_gravity_cuda.h
fix_nh_cuda.cpp
fix_nh_cuda.h
fix_npt_cuda.cpp
fix_npt_cuda.h
fix_nve_cuda.cpp
fix_nve_cuda.h
fix_nvt_cuda.cpp
fix_nvt_cuda.h
fix_set_force_cuda.cpp
fix_set_force_cuda.h
fix_shake_cuda.cpp
fix_shake_cuda.h
fix_temp_berendsen_cuda.cpp
fix_temp_berendsen_cuda.h
fix_temp_rescale_cuda.cpp
fix_temp_rescale_cuda.h
fix_temp_rescale_limit_cuda.cpp
fix_temp_rescale_limit_cuda.h
fix_viscous_cuda.cpp
fix_viscous_cuda.h
modify_cuda.cpp
modify_cuda.h
neighbor_cuda.cpp
neighbor_cuda.h
neigh_full_cuda.cpp
pair_born_coul_long_cuda.cpp
pair_born_coul_long_cuda.h
pair_buck_coul_cut_cuda.cpp
pair_buck_coul_cut_cuda.h
pair_buck_coul_long_cuda.cpp
pair_buck_coul_long_cuda.h
pair_buck_cuda.cpp
pair_buck_cuda.h
pair_eam_alloy_cuda.cpp
pair_eam_alloy_cuda.h
pair_eam_cuda.cpp
pair_eam_cuda.h
pair_eam_fs_cuda.cpp
pair_eam_fs_cuda.h
pair_gran_hooke_cuda.cpp
pair_gran_hooke_cuda.h
pair_lj96_cut_cuda.cpp
pair_lj96_cut_cuda.h
pair_lj_charmm_coul_charmm_cuda.cpp
pair_lj_charmm_coul_charmm_cuda.h
pair_lj_charmm_coul_charmm_implicit_cuda.cpp
pair_lj_charmm_coul_charmm_implicit_cuda.h
pair_lj_charmm_coul_long_cuda.cpp
pair_lj_charmm_coul_long_cuda.h
pair_lj_class2_coul_cut_cuda.cpp
pair_lj_class2_coul_cut_cuda.h
pair_lj_class2_coul_long_cuda.cpp
pair_lj_class2_coul_long_cuda.h
pair_lj_class2_cuda.cpp
pair_lj_class2_cuda.h
pair_lj_cut_coul_cut_cuda.cpp
pair_lj_cut_coul_cut_cuda.h
pair_lj_cut_coul_debye_cuda.cpp
pair_lj_cut_coul_debye_cuda.h
pair_lj_cut_coul_long_cuda.cpp
pair_lj_cut_coul_long_cuda.h
pair_lj_cut_cuda.cpp
pair_lj_cut_cuda.h
pair_lj_cut_experimental_cuda.cpp
pair_lj_cut_experimental_cuda.h
pair_lj_expand_cuda.cpp
pair_lj_expand_cuda.h
pair_lj_gromacs_coul_gromacs_cuda.cpp
pair_lj_gromacs_coul_gromacs_cuda.h
pair_lj_gromacs_cuda.cpp
pair_lj_gromacs_cuda.h
pair_lj_sdk_coul_long_cuda.cpp
pair_lj_sdk_coul_long_cuda.h
pair_lj_sdk_cuda.cpp
pair_lj_sdk_cuda.h
pair_lj_smooth_cuda.cpp
pair_lj_smooth_cuda.h
pair_morse_cuda.cpp
pair_morse_cuda.h
pair_sw_cuda.cpp
pair_sw_cuda.h
pair_tersoff_cuda.cpp
pair_tersoff_cuda.h
pair_tersoff_zbl_cuda.cpp
pair_tersoff_zbl_cuda.h
pppm_cuda.cpp
pppm_cuda.h
pppm_old.cpp
pppm_old.h
user_cuda.h
verlet_cuda.cpp
verlet_cuda.h
# deleted on 11 May 2016
pair_dpd_conservative.cpp
pair_dpd_conservative.h
# deleted on 21 Mar 2016
verlet_intel.cpp
verlet_intel.h
verlet_split_intel.cpp
verlet_split_intel.h
# deleted on 15 Jan 2016
pair_line_lj_omp.cpp
pair_line_lj_omp.h
pair_tri_lj_omp.cpp
pair_tri_lj_omp.h
# deleted on 13 May 14
commgrid.cpp
commgrid.h
# deleted on 5 May 14
reaxc_basic_comm.cpp
reaxc_basic_comm.h
# deleted on 15 Apr 14
pppm_old.cpp
pppm_old.h
# deleted on Thu Jun 6 15:19:12 2013 +0000
pair_dipole_cut.h
pair_dipole_cut.cpp
pair_dipole_cut_gpu.h
pair_dipole_cut_gpu.cpp
pair_dipole_cut_omp.h
pair_dipole_cut_omp.cpp
pair_dipole_sf.h
pair_dipole_sf.cpp
pair_dipole_sf_omp.h
pair_dipole_sf_omp.cpp
pair_dipole_sf_gpu.h
pair_dipole_sf_gpu.cpp
# deleted on Wed May 8 15:24:36 2013 +0000
compute_spec_atom.cpp
compute_spec_atom.h
fix_species.cpp
fix_species.h
# deleted on Fri Oct 19 15:27:15 2012 +0000
pair_lj_charmm_coul_long_proxy_omp.cpp
pair_lj_charmm_coul_long_proxy_omp.h
pair_lj_class2_coul_long_proxy_omp.cpp
pair_lj_class2_coul_long_proxy_omp.h
pair_lj_cut_coul_long_proxy_omp.cpp
pair_lj_cut_coul_long_proxy_omp.h
pair_lj_cut_tip4p_long_proxy_omp.cpp
pair_lj_cut_tip4p_long_proxy_omp.h
pppm_proxy.cpp
pppm_proxy.h
pppm_tip4p_proxy.cpp
pppm_tip4p_proxy.h
# deleted on Wed Oct 3 15:17:27 2012 +0000
pair_lj_cut_coul_long_proxy_tip4p_omp.cpp
pair_lj_cut_coul_long_proxy_tip4p_omp.h
# deleted on Wed Oct 3 15:06:24 2012 +0000
pair_lj_cut_coul_long_tip4p_opt.cpp
pair_lj_cut_coul_long_tip4p_opt.h
# deleted on Wed Oct 3 14:53:43 2012 +0000
pair_lj_charmm_coul_long_proxy_omp.cpp
pair_lj_charmm_coul_long_proxy_omp.h
pair_lj_class2_coul_long_proxy_omp.cpp
pair_lj_class2_coul_long_proxy_omp.h
pair_lj_cut_coul_long_proxy_omp.cpp
pair_lj_cut_coul_long_proxy_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
# deleted on Wed Oct 3 14:50:44 2012 +0000
pair_buck_disp_coul_long_omp.cpp
pair_buck_disp_coul_long_omp.h
pair_lj_disp_coul_long_omp.cpp
pair_lj_disp_coul_long_omp.h
# deleted on Wed Oct 3 14:46:42 2012 +0000
pair_lj_cut_coul_long_tip4p.cpp
pair_lj_cut_coul_long_tip4p.h
# deleted on Wed Oct 3 14:46:23 2012 +0000
pair_buck_disp_coul_long.cpp
pair_buck_disp_coul_long.h
pair_lj_disp_coul_long.cpp
pair_lj_disp_coul_long.h
pair_lj_disp_coul_long_tip4p.cpp
pair_lj_disp_coul_long_tip4p.h
# deleted on Tue Oct 2 22:50:58 2012 +0000
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
# deleted on Tue Oct 2 20:12:27 2012 +0000
pair_lj_charmm_coul_pppm_omp.cpp
pair_lj_charmm_coul_pppm_omp.h
pair_lj_class2_coul_pppm_omp.cpp
pair_lj_class2_coul_pppm_omp.h
pair_lj_cut_coul_pppm_omp.cpp
pair_lj_cut_coul_pppm_omp.h
pair_lj_cut_coul_pppm_tip4p_omp.cpp
pair_lj_cut_coul_pppm_tip4p_omp.h
# deleted on Tue Oct 2 19:59:40 2012 +0000
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
pppm_proxy.cpp
pppm_proxy.h
pppm_tip4p_proxy.cpp
pppm_tip4p_proxy.h
# deleted on Tue Oct 2 19:58:21 2012 +0000
pair_lj_cut_coul_pppm_omp.cpp
pair_lj_cut_coul_pppm_omp.h
pair_lj_cut_coul_pppm_tip4p_omp.cpp
pair_lj_cut_coul_pppm_tip4p_omp.h
# deleted on Tue Oct 2 19:58:03 2012 +0000
pair_lj_charmm_coul_pppm_omp.cpp
pair_lj_charmm_coul_pppm_omp.h
pair_lj_class2_coul_pppm_omp.cpp
pair_lj_class2_coul_pppm_omp.h
# deleted on Tue Oct 2 16:36:24 2012 +0000
ewald_n.cpp
ewald_n.h
pair_buck_coul.cpp
pair_buck_coul.h
pair_lj_coul.cpp
pair_lj_coul.h
# deleted on Wed Jul 25 15:17:24 2012 +0000
pair_lj_sdk_coul_cut_cuda.cpp
pair_lj_sdk_coul_cut_cuda.h
pair_lj_sdk_coul_debye_cuda.cpp
pair_lj_sdk_coul_debye_cuda.h
# deleted on Tue Jul 24 14:55:49 2012 +0000
pair_cg_cmm_coul_cut_cuda.cpp
pair_cg_cmm_coul_cut_cuda.h
pair_cg_cmm_coul_debye_cuda.cpp
pair_cg_cmm_coul_debye_cuda.h
pair_cg_cmm_coul_long_cuda.cpp
pair_cg_cmm_coul_long_cuda.h
pair_cg_cmm_cuda.cpp
pair_cg_cmm_cuda.h
# deleted on Sat Dec 31 20:27:05 2011 -0500
ewald_cg.cpp
ewald_cg.h
# deleted on Sat Dec 31 20:01:21 2011 -0500
dihedral_omp.cpp
dihedral_omp.h
pair_cg_cmm_omp.cpp
pair_cg_cmm_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
pair_omp.cpp
pair_omp.h
# deleted on Thu Dec 8 23:13:51 2011 +0000
pair_cg_cmm_coul_long_gpu.cpp
pair_cg_cmm_coul_long_gpu.h
pair_cg_cmm_gpu.cpp
pair_cg_cmm_gpu.h
# deleted on Mon Nov 7 19:32:59 2011 -0500
pair_cg_cmm_coul_long_gpu.cpp
pair_cg_cmm_coul_long_gpu.h
pair_cg_cmm_gpu.cpp
pair_cg_cmm_gpu.h
# deleted on Tue Oct 25 23:04:03 2011 -0400
lj_sdk_common.cpp
# deleted on Fri Oct 7 08:55:40 2011 -0400
pair_hybrid_overlay_omp.cpp
pair_hybrid_overlay_omp.h
# deleted on Fri Oct 7 08:54:38 2011 -0400
angle_hybrid_omp.cpp
angle_hybrid_omp.h
bond_hybrid_omp.cpp
bond_hybrid_omp.h
dihedral_hybrid_omp.cpp
dihedral_hybrid_omp.h
improper_hybrid_omp.cpp
improper_hybrid_omp.h
pair_hybrid_omp.cpp
pair_hybrid_omp.h
# deleted on Mon Aug 22 13:48:15 2011 -0400
omp_thr.cpp
omp_thr.h
# deleted on Mon Aug 8 22:56:28 2011 +0000
dihedral_cosineshiftexp.cpp
dihedral_cosineshiftexp.h
# deleted on Mon Aug 8 22:55:20 2011 +0000
angle_cosineshift.cpp
angle_cosineshift.h
angle_cosineshiftexp.cpp
angle_cosineshiftexp.h
# deleted on Mon Aug 8 19:25:08 2011 +0000
pppm_gpu_double.cpp
pppm_gpu_double.h
pppm_gpu_single.cpp
pppm_gpu_single.h
# deleted on Fri Apr 15 20:57:03 2011 -0400
pair_lj_charmm_coul_long_gpu2.cpp
pair_lj_charmm_coul_long_gpu2.h
# deleted on Wed Apr 13 21:40:14 2011 +0000
atom_vec_colloid.cpp
atom_vec_colloid.h
atom_vec_granular.cpp
atom_vec_granular.h
# deleted on Fri Nov 19 12:53:07 2010 -0500
fix_pour_omp.cpp
fix_pour_omp.h
# deleted on Thu Aug 19 23:20:14 2010 +0000
fix_qeq.cpp
fix_qeq.h
# deleted on Thu Jun 17 01:34:38 2010 +0000
compute_vsum.cpp
compute_vsum.h
# deleted on Mon Jun 14 11:06:46 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
# deleted on Thu Jun 10 15:39:08 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
# deleted on Tue Jun 8 15:42:51 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
# deleted on Thu Dec 17 23:52:31 2009 +0000
dump_bond.cpp
dump_bond.h
# deleted on Mon Nov 9 18:20:20 2009 +0000
atom_vec_dpd.cpp
atom_vec_dpd.h
style_dpd.h
# deleted on Mon Jun 22 21:11:31 2009 +0000
fix_write_reax_bonds.cpp
fix_write_reax_bonds.h
# deleted on Thu Jan 8 16:53:09 2009 +0000
pair_gran_hertzian.cpp
pair_gran_hertzian.h
pair_gran_history.cpp
pair_gran_history.h
pair_gran_no_history.cpp
pair_gran_no_history.h
# deleted on Mon Mar 17 23:24:44 2008 +0000
compute_temp_dipole.cpp
compute_temp_dipole.h
fix_nve_dipole.cpp
fix_nve_dipole.h
# deleted on Mon Mar 17 23:23:24 2008 +0000
fix_nve_gran.cpp
fix_nve_gran.h
# deleted on Fri Nov 30 21:49:20 2007 +0000
fix_gran_diag.cpp
fix_gran_diag.h
atom_angle.cpp
atom_angle.h
atom_bond.cpp
atom_bond.h
atom_full.cpp
atom_full.h
atom_molecular.cpp
atom_molecular.h
# deleted on Tue Jan 30 00:22:05 2007 +0000
atom_dpd.cpp
atom_dpd.h
atom_granular.cpp
atom_granular.h
# deleted on Wed Dec 13 00:34:21 2006 +0000
fix_insert.cpp
fix_insert.h
diff --git a/src/QEQ/fix_qeq_point.cpp b/src/QEQ/fix_qeq_point.cpp
index 9af70a445..63d20ad91 100644
--- a/src/QEQ/fix_qeq_point.cpp
+++ b/src/QEQ/fix_qeq_point.cpp
@@ -1,173 +1,173 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (Sandia)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_point.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "update.h"
#include "force.h"
#include "group.h"
#include "kspace.h"
#include "respa.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
FixQEqPoint::FixQEqPoint(LAMMPS *lmp, int narg, char **arg) :
FixQEq(lmp, narg, arg) {}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::init()
{
if (!atom->q_flag)
error->all(FLERR,"Fix qeq/point requires atom attribute q");
ngroup = group->count(igroup);
if (ngroup == 0) error->all(FLERR,"Fix qeq/point group has no atoms");
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
int ntypes = atom->ntypes;
- memory->create(shld,ntypes+1,ntypes+1,"qeq:shileding");
+ memory->create(shld,ntypes+1,ntypes+1,"qeq:shielding");
if (strstr(update->integrate_style,"respa"))
nlevels_respa = ((Respa *) update->integrate)->nlevels;
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::pre_force(int vflag)
{
if (update->ntimestep % nevery) return;
nlocal = atom->nlocal;
if( atom->nmax > nmax ) reallocate_storage();
if( nlocal > n_cap*DANGER_ZONE || m_fill > m_cap*DANGER_ZONE )
reallocate_matrix();
init_matvec();
matvecs = CG(b_s, s); // CG on s - parallel
matvecs += CG(b_t, t); // CG on t - parallel
calculate_Q();
if (force->kspace) force->kspace->qsum_qsq();
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::init_matvec()
{
compute_H();
int inum, ii, i;
int *ilist;
inum = list->inum;
ilist = list->ilist;
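// per-atom setup: diagonal preconditioner from eta, right-hand sides from
// the electronegativity chi and the precomputed chizj term, and initial
// guesses for s and t extrapolated from their stored solution histories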
for( ii = 0; ii < inum; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
Hdia_inv[i] = 1. / eta[ atom->type[i] ];
b_s[i] = -( chi[atom->type[i]] + chizj[i] );
b_t[i] = -1.0;
t[i] = t_hist[i][2] + 3 * ( t_hist[i][0] - t_hist[i][1] );
s[i] = 4*(s_hist[i][0]+s_hist[i][2])-(6*s_hist[i][1]+s_hist[i][3]);
}
}
pack_flag = 2;
comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 3;
comm->forward_comm_fix(this); //Dist_vector( t );
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::compute_H()
{
int inum, jnum, *ilist, *jlist, *numneigh, **firstneigh;
int i, j, ii, jj;
double **x;
double dx, dy, dz, r_sqr, r;
x = atom->x;
int *mask = atom->mask;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// fill in the H matrix
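// H is stored in a sparse row format: for atom i, H.firstnbr[i] and
// H.numnbrs[i] delimit its entries in the shared H.jlist / H.val arrays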
m_fill = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
if (mask[i] & groupbit) {
jlist = firstneigh[i];
jnum = numneigh[i];
H.firstnbr[i] = m_fill;
for( jj = 0; jj < jnum; jj++ ) {
j = jlist[jj];
j &= NEIGHMASK;
dx = x[j][0] - x[i][0];
dy = x[j][1] - x[i][1];
dz = x[j][2] - x[i][2];
r_sqr = dx*dx + dy*dy + dz*dz;
if (r_sqr <= cutoff_sq) {
H.jlist[m_fill] = j;
r = sqrt(r_sqr);
H.val[m_fill] = 0.5/r;
m_fill++;
}
}
H.numnbrs[i] = m_fill - H.firstnbr[i];
}
}
if (m_fill >= H.m) {
char str[128];
sprintf(str,"H matrix size has been exceeded: m_fill=%d H.m=%d\n",
m_fill, H.m );
error->warning(FLERR,str);
error->all(FLERR,"Fix qeq/point has insufficient QEq matrix size");
}
}
/* ---------------------------------------------------------------------- */
diff --git a/src/RIGID/fix_shake.cpp b/src/RIGID/fix_shake.cpp
index 1fe704efb..5c993ee85 100644
--- a/src/RIGID/fix_shake.cpp
+++ b/src/RIGID/fix_shake.cpp
@@ -1,2811 +1,2820 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "fix_shake.h"
#include "fix_rattle.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "update.h"
#include "respa.h"
#include "modify.h"
#include "domain.h"
#include "force.h"
#include "bond.h"
#include "angle.h"
#include "comm.h"
#include "group.h"
#include "fix_respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// allocate space for static class variable
FixShake *FixShake::fsptr;
#define BIG 1.0e20
#define MASSDELTA 0.1
/* ---------------------------------------------------------------------- */
FixShake::FixShake(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg), bond_flag(NULL), angle_flag(NULL),
type_flag(NULL), mass_list(NULL), bond_distance(NULL), angle_distance(NULL),
loop_respa(NULL), step_respa(NULL), x(NULL), v(NULL), f(NULL), ftmp(NULL),
vtmp(NULL), mass(NULL), rmass(NULL), type(NULL), shake_flag(NULL),
shake_atom(NULL), shake_type(NULL), xshake(NULL), nshake(NULL),
list(NULL), b_count(NULL), b_count_all(NULL), b_ave(NULL), b_max(NULL),
b_min(NULL), b_ave_all(NULL), b_max_all(NULL), b_min_all(NULL),
a_count(NULL), a_count_all(NULL), a_ave(NULL), a_max(NULL), a_min(NULL),
a_ave_all(NULL), a_max_all(NULL), a_min_all(NULL), atommols(NULL),
onemols(NULL)
{
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
virial_flag = 1;
create_attribute = 1;
dof_flag = 1;
// error check
molecular = atom->molecular;
if (molecular == 0)
error->all(FLERR,"Cannot use fix shake with non-molecular system");
// perform initial allocation of atom-based arrays
// register with Atom class
shake_flag = NULL;
shake_atom = NULL;
shake_type = NULL;
xshake = NULL;
ftmp = NULL;
vtmp = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
// set comm size needed by this fix
comm_forward = 3;
// parse SHAKE args
if (narg < 8) error->all(FLERR,"Illegal fix shake command");
tolerance = force->numeric(FLERR,arg[3]);
max_iter = force->inumeric(FLERR,arg[4]);
output_every = force->inumeric(FLERR,arg[5]);
// parse SHAKE args for bond and angle types
// will be used by find_clusters
// store args for "b" "a" "t" as flags in (1:n) list for fast access
// store args for "m" in list of length nmass for looping over
// for "m" verify that atom masses have been set
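// illustrative input line (not taken from this file):
//   fix 1 all shake 0.0001 20 10 b 4 a 31
// -> tolerance 1.0e-4, at most 20 iterations, statistics every 10 steps,
//    constrain bonds of type 4 and angles of type 31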
bond_flag = new int[atom->nbondtypes+1];
for (int i = 1; i <= atom->nbondtypes; i++) bond_flag[i] = 0;
angle_flag = new int[atom->nangletypes+1];
for (int i = 1; i <= atom->nangletypes; i++) angle_flag[i] = 0;
type_flag = new int[atom->ntypes+1];
for (int i = 1; i <= atom->ntypes; i++) type_flag[i] = 0;
mass_list = new double[atom->ntypes];
nmass = 0;
char mode = '\0';
int next = 6;
while (next < narg) {
if (strcmp(arg[next],"b") == 0) mode = 'b';
else if (strcmp(arg[next],"a") == 0) mode = 'a';
else if (strcmp(arg[next],"t") == 0) mode = 't';
else if (strcmp(arg[next],"m") == 0) {
mode = 'm';
atom->check_mass(FLERR);
// break if keyword that is not b,a,t,m
} else if (isalpha(arg[next][0])) break;
// read numeric args of b,a,t,m
else if (mode == 'b') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->nbondtypes)
error->all(FLERR,"Invalid bond type index for fix shake");
bond_flag[i] = 1;
} else if (mode == 'a') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->nangletypes)
error->all(FLERR,"Invalid angle type index for fix shake");
angle_flag[i] = 1;
} else if (mode == 't') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->ntypes)
error->all(FLERR,"Invalid atom type index for fix shake");
type_flag[i] = 1;
} else if (mode == 'm') {
double massone = force->numeric(FLERR,arg[next]);
if (massone == 0.0) error->all(FLERR,"Invalid atom mass for fix shake");
if (nmass == atom->ntypes)
error->all(FLERR,"Too many masses for fix shake");
mass_list[nmass++] = massone;
} else error->all(FLERR,"Illegal fix shake command");
next++;
}
// parse optional args
onemols = NULL;
int iarg = next;
while (iarg < narg) {
if (strcmp(arg[iarg],"mol") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix shake command");
int imol = atom->find_molecule(arg[iarg+1]);
if (imol == -1)
error->all(FLERR,"Molecule template ID for fix shake does not exist");
if (atom->molecules[imol]->nset > 1 && comm->me == 0)
error->warning(FLERR,"Molecule template for "
"fix shake has multiple molecules");
onemols = &atom->molecules[imol];
nmol = onemols[0]->nset;
iarg += 2;
} else error->all(FLERR,"Illegal fix shake command");
}
// error check for Molecule template
if (onemols) {
for (int i = 0; i < nmol; i++)
if (onemols[i]->shakeflag == 0)
error->all(FLERR,"Fix shake molecule template must have shake info");
}
// allocate bond and angle distance arrays, indexed from 1 to n
bond_distance = new double[atom->nbondtypes+1];
angle_distance = new double[atom->nangletypes+1];
// allocate statistics arrays
if (output_every) {
int nb = atom->nbondtypes + 1;
b_count = new int[nb];
b_count_all = new int[nb];
b_ave = new double[nb];
b_ave_all = new double[nb];
b_max = new double[nb];
b_max_all = new double[nb];
b_min = new double[nb];
b_min_all = new double[nb];
int na = atom->nangletypes + 1;
a_count = new int[na];
a_count_all = new int[na];
a_ave = new double[na];
a_ave_all = new double[na];
a_max = new double[na];
a_max_all = new double[na];
a_min = new double[na];
a_min_all = new double[na];
}
// SHAKE vs RATTLE
rattle = 0;
// identify all SHAKE clusters
find_clusters();
// initialize list of SHAKE clusters to constrain
maxlist = 0;
list = NULL;
}
/* ---------------------------------------------------------------------- */
FixShake::~FixShake()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
// set bond_type and angle_type back to positive for SHAKE clusters
// must set for all SHAKE bonds and angles stored by each atom
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
else if (shake_flag[i] == 1) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
angletype_findset(i,shake_atom[i][1],shake_atom[i][2],1);
} else if (shake_flag[i] == 2) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
} else if (shake_flag[i] == 3) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
} else if (shake_flag[i] == 4) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][3],1);
}
}
// delete locally stored arrays
memory->destroy(shake_flag);
memory->destroy(shake_atom);
memory->destroy(shake_type);
memory->destroy(xshake);
memory->destroy(ftmp);
memory->destroy(vtmp);
delete [] bond_flag;
delete [] angle_flag;
delete [] type_flag;
delete [] mass_list;
delete [] bond_distance;
delete [] angle_distance;
if (output_every) {
delete [] b_count;
delete [] b_count_all;
delete [] b_ave;
delete [] b_ave_all;
delete [] b_max;
delete [] b_max_all;
delete [] b_min;
delete [] b_min_all;
delete [] a_count;
delete [] a_count_all;
delete [] a_ave;
delete [] a_ave_all;
delete [] a_max;
delete [] a_max_all;
delete [] a_min;
delete [] a_min_all;
}
memory->destroy(list);
}
/* ---------------------------------------------------------------------- */
int FixShake::setmask()
{
int mask = 0;
mask |= PRE_NEIGHBOR;
mask |= POST_FORCE;
mask |= POST_FORCE_RESPA;
return mask;
}
/* ----------------------------------------------------------------------
set bond and angle distances
this init must happen after force->bond and force->angle inits
------------------------------------------------------------------------- */
void FixShake::init()
{
int i,m,flag,flag_all,type1,type2,bond1_type,bond2_type;
double rsq,angle;
// error if more than one shake fix
int count = 0;
for (i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"shake") == 0) count++;
if (count > 1) error->all(FLERR,"More than one fix shake");
// cannot use with minimization since SHAKE turns off bonds
// that should contribute to potential energy
if (update->whichflag == 2)
error->all(FLERR,"Fix shake cannot be used with minimization");
// error if npt,nph fix comes before shake fix
for (i = 0; i < modify->nfix; i++) {
if (strcmp(modify->fix[i]->style,"npt") == 0) break;
if (strcmp(modify->fix[i]->style,"nph") == 0) break;
}
if (i < modify->nfix) {
for (int j = i; j < modify->nfix; j++)
if (strcmp(modify->fix[j]->style,"shake") == 0)
error->all(FLERR,"Shake fix must come before NPT/NPH fix");
}
// if rRESPA, find associated fix that must exist
// could have changed locations in fix list since created
// set ptrs to rRESPA variables
if (strstr(update->integrate_style,"respa")) {
for (i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"RESPA") == 0) ifix_respa = i;
nlevels_respa = ((Respa *) update->integrate)->nlevels;
loop_respa = ((Respa *) update->integrate)->loop;
step_respa = ((Respa *) update->integrate)->step;
}
// set equilibrium bond distances
if (force->bond == NULL)
error->all(FLERR,"Bond potential must be defined for SHAKE");
for (i = 1; i <= atom->nbondtypes; i++)
bond_distance[i] = force->bond->equilibrium_distance(i);
// set equilibrium angle distances
int nlocal = atom->nlocal;
for (i = 1; i <= atom->nangletypes; i++) {
if (angle_flag[i] == 0) continue;
if (force->angle == NULL)
error->all(FLERR,"Angle potential must be defined for SHAKE");
// scan all atoms for a SHAKE angle cluster
// extract bond types for the 2 bonds in the cluster
// bond types must be same in all clusters of this angle type,
// else set error flag
flag = 0;
bond1_type = bond2_type = 0;
for (m = 0; m < nlocal; m++) {
if (shake_flag[m] != 1) continue;
if (shake_type[m][2] != i) continue;
type1 = MIN(shake_type[m][0],shake_type[m][1]);
type2 = MAX(shake_type[m][0],shake_type[m][1]);
if (bond1_type > 0) {
if (type1 != bond1_type || type2 != bond2_type) {
flag = 1;
break;
}
}
bond1_type = type1;
bond2_type = type2;
}
// error check for any bond types that are not the same
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_MAX,world);
if (flag_all) error->all(FLERR,"Shake angles have different bond types");
// ensure all procs have bond types
MPI_Allreduce(&bond1_type,&flag_all,1,MPI_INT,MPI_MAX,world);
bond1_type = flag_all;
MPI_Allreduce(&bond2_type,&flag_all,1,MPI_INT,MPI_MAX,world);
bond2_type = flag_all;
// if bond types are 0, no SHAKE angles of this type exist
// just skip this angle
if (bond1_type == 0) {
angle_distance[i] = 0.0;
continue;
}
// compute the angle distance as a function of 2 bond distances
// formula is now correct for bonds of same or different lengths (Oct15)
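// law of cosines: d^2 = b1^2 + b2^2 - 2*b1*b2*cos(theta)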
angle = force->angle->equilibrium_angle(i);
const double b1 = bond_distance[bond1_type];
const double b2 = bond_distance[bond2_type];
rsq = b1*b1 + b2*b2 - 2.0*b1*b2*cos(angle);
angle_distance[i] = sqrt(rsq);
}
}
/* ----------------------------------------------------------------------
SHAKE as pre-integrator constraint
------------------------------------------------------------------------- */
void FixShake::setup(int vflag)
{
pre_neighbor();
if (output_every) stats();
// setup SHAKE output
bigint ntimestep = update->ntimestep;
if (output_every) {
next_output = ntimestep + output_every;
if (ntimestep % output_every != 0)
next_output = (ntimestep/output_every)*output_every + output_every;
} else next_output = -1;
// set respa to 0 if verlet is used and to 1 otherwise
if (strstr(update->integrate_style,"verlet"))
respa = 0;
else
respa = 1;
if (!respa) {
dtv = update->dt;
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
dtf_inner = dtf_innerhalf;
}
// correct geometry of cluster if necessary
correct_coordinates(vflag);
// remove velocities along any bonds
correct_velocities();
// precalculate constraining forces for first integration step
shake_end_of_step(vflag);
}
/* ----------------------------------------------------------------------
build list of SHAKE clusters to constrain
if one or more atoms in cluster are on this proc,
this proc lists the cluster exactly once
------------------------------------------------------------------------- */
void FixShake::pre_neighbor()
{
int atom1,atom2,atom3,atom4;
// local copies of atom quantities
// used by SHAKE until next re-neighboring
x = atom->x;
v = atom->v;
f = atom->f;
mass = atom->mass;
rmass = atom->rmass;
type = atom->type;
nlocal = atom->nlocal;
// extend size of SHAKE list if necessary
if (nlocal > maxlist) {
maxlist = nlocal;
memory->destroy(list);
memory->create(list,maxlist,"shake:list");
}
// build list of SHAKE clusters I compute
nlist = 0;
for (int i = 0; i < nlocal; i++)
if (shake_flag[i]) {
if (shake_flag[i] == 2) {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
if (atom1 == -1 || atom2 == -1) {
char str[128];
sprintf(str,"Shake atoms " TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2) list[nlist++] = i;
} else if (shake_flag[i] % 2 == 1) {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
atom3 = atom->map(shake_atom[i][2]);
if (atom1 == -1 || atom2 == -1 || atom3 == -1) {
char str[128];
sprintf(str,"Shake atoms "
TAGINT_FORMAT " " TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],shake_atom[i][2],
me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2 && i <= atom3) list[nlist++] = i;
} else {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
atom3 = atom->map(shake_atom[i][2]);
atom4 = atom->map(shake_atom[i][3]);
if (atom1 == -1 || atom2 == -1 || atom3 == -1 || atom4 == -1) {
char str[128];
sprintf(str,"Shake atoms "
TAGINT_FORMAT " " TAGINT_FORMAT " "
TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],
shake_atom[i][2],shake_atom[i][3],
me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2 && i <= atom3 && i <= atom4)
list[nlist++] = i;
}
}
}
/* ----------------------------------------------------------------------
compute the force adjustment for SHAKE constraint
------------------------------------------------------------------------- */
void FixShake::post_force(int vflag)
{
if (update->ntimestep == next_output) stats();
// xshake = unconstrained move with current v,f
// communicate results if necessary
unconstrained_update();
if (nprocs > 1) comm->forward_comm_fix(this);
// virial setup
if (vflag) v_setup(vflag);
else evflag = 0;
// loop over clusters to add constraint forces
int m;
for (int i = 0; i < nlist; i++) {
m = list[i];
if (shake_flag[m] == 2) shake(m);
else if (shake_flag[m] == 3) shake3(m);
else if (shake_flag[m] == 4) shake4(m);
else shake3angle(m);
}
// store vflag for coordinate_constraints_end_of_step()
vflag_post_force = vflag;
}
/* ----------------------------------------------------------------------
enforce SHAKE constraints from rRESPA
xshake prediction portion is different from the Verlet case
------------------------------------------------------------------------- */
void FixShake::post_force_respa(int vflag, int ilevel, int iloop)
{
// call stats only on outermost level
if (ilevel == nlevels_respa-1 && update->ntimestep == next_output) stats();
// might be OK to skip enforcing SHAKE constraints
// on last iteration of inner levels if pressure not requested
// however, leads to slightly different trajectories
//if (ilevel < nlevels_respa-1 && iloop == loop_respa[ilevel]-1 && !vflag)
// return;
// xshake = unconstrained move with current v,f as function of level
// communicate results if necessary
unconstrained_update_respa(ilevel);
if (nprocs > 1) comm->forward_comm_fix(this);
// virial setup only needed on last iteration of innermost level
// and if pressure is requested
// virial accumulation happens via evflag at last iteration of each level
if (ilevel == 0 && iloop == loop_respa[ilevel]-1 && vflag) v_setup(vflag);
if (iloop == loop_respa[ilevel]-1) evflag = 1;
else evflag = 0;
// loop over clusters to add constraint forces
int m;
for (int i = 0; i < nlist; i++) {
m = list[i];
if (shake_flag[m] == 2) shake(m);
else if (shake_flag[m] == 3) shake3(m);
else if (shake_flag[m] == 4) shake4(m);
else shake3angle(m);
}
// store vflag for coordinate_constraints_end_of_step()
vflag_post_force = vflag;
}
/* ----------------------------------------------------------------------
count # of degrees-of-freedom removed by SHAKE for atoms in igroup
------------------------------------------------------------------------- */
int FixShake::dof(int igroup)
{
int groupbit = group->bitmask[igroup];
int *mask = atom->mask;
tagint *tag = atom->tag;
int nlocal = atom->nlocal;
// count dof in a cluster if and only if
// the central atom is in group and atom i is the central atom
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (!(mask[i] & groupbit)) continue;
if (shake_flag[i] == 0) continue;
if (shake_atom[i][0] != tag[i]) continue;
if (shake_flag[i] == 1) n += 3;
else if (shake_flag[i] == 2) n += 1;
else if (shake_flag[i] == 3) n += 2;
else if (shake_flag[i] == 4) n += 3;
}
int nall;
MPI_Allreduce(&n,&nall,1,MPI_INT,MPI_SUM,world);
return nall;
}
/* ----------------------------------------------------------------------
identify whether each atom is in a SHAKE cluster
only include atoms in fix group and those bonds/angles specified in input
test whether all clusters are valid
set shake_flag, shake_atom, shake_type values
set bond,angle types negative so will be ignored in neighbor lists
------------------------------------------------------------------------- */
void FixShake::find_clusters()
{
int i,j,m,n,imol,iatom;
int flag,flag_all,nbuf,size;
tagint tagprev;
double massone;
tagint *buf;
if (me == 0 && screen) {
if (!rattle) fprintf(screen,"Finding SHAKE clusters ...\n");
else fprintf(screen,"Finding RATTLE clusters ...\n");
}
atommols = atom->avec->onemols;
tagint *tag = atom->tag;
int *type = atom->type;
int *mask = atom->mask;
double *mass = atom->mass;
double *rmass = atom->rmass;
int **nspecial = atom->nspecial;
tagint **special = atom->special;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
int nlocal = atom->nlocal;
int angles_allow = atom->avec->angles_allow;
// setup ring of procs
int next = me + 1;
int prev = me - 1;
if (next == nprocs) next = 0;
if (prev < 0) prev = nprocs - 1;
// -----------------------------------------------------
// allocate arrays for self (1d) and bond partners (2d)
// max = max # of bond partners for owned atoms = 2nd dim of partner arrays
// npartner[i] = # of bonds attached to atom i
// nshake[i] = # of SHAKE bonds attached to atom i
// partner_tag[i][] = global IDs of each partner
// partner_mask[i][] = mask of each partner
// partner_type[i][] = type of each partner
// partner_massflag[i][] = 1 if partner meets mass criterion, 0 if not
// partner_bondtype[i][] = type of bond attached to each partner
// partner_shake[i][] = 1 if SHAKE bonded to partner, 0 if not
// partner_nshake[i][] = nshake value for each partner
// -----------------------------------------------------
int max = 0;
if (molecular == 1) {
for (i = 0; i < nlocal; i++) max = MAX(max,nspecial[i][0]);
} else {
for (i = 0; i < nlocal; i++) {
imol = molindex[i];
if (imol < 0) continue;
iatom = molatom[i];
max = MAX(max,atommols[imol]->nspecial[iatom][0]);
}
}
int *npartner;
memory->create(npartner,nlocal,"shake:npartner");
memory->create(nshake,nlocal,"shake:nshake");
tagint **partner_tag;
int **partner_mask,**partner_type,**partner_massflag;
int **partner_bondtype,**partner_shake,**partner_nshake;
memory->create(partner_tag,nlocal,max,"shake:partner_tag");
memory->create(partner_mask,nlocal,max,"shake:partner_mask");
memory->create(partner_type,nlocal,max,"shake:partner_type");
memory->create(partner_massflag,nlocal,max,"shake:partner_massflag");
memory->create(partner_bondtype,nlocal,max,"shake:partner_bondtype");
memory->create(partner_shake,nlocal,max,"shake:partner_shake");
memory->create(partner_nshake,nlocal,max,"shake:partner_nshake");
// -----------------------------------------------------
// set npartner and partner_tag from special arrays
// -----------------------------------------------------
if (molecular == 1) {
for (i = 0; i < nlocal; i++) {
npartner[i] = nspecial[i][0];
for (j = 0; j < npartner[i]; j++)
partner_tag[i][j] = special[i][j];
}
} else {
for (i = 0; i < nlocal; i++) {
imol = molindex[i];
if (imol < 0) continue;
iatom = molatom[i];
tagprev = tag[i] - iatom - 1;
npartner[i] = atommols[imol]->nspecial[iatom][0];
for (j = 0; j < npartner[i]; j++)
partner_tag[i][j] = atommols[imol]->special[iatom][j] + tagprev;
}
}
// -----------------------------------------------------
// set partner_mask, partner_type, partner_massflag, partner_bondtype
// for bonded partners
// requires communication for off-proc partners
// -----------------------------------------------------
// fill in mask, type, massflag, bondtype if own bond partner
// info to store in buf for each off-proc bond = nper = 6
// 2 atoms IDs in bond, space for mask, type, massflag, bondtype
// nbufmax = largest buffer needed to hold info from any proc
int nper = 6;
nbuf = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
partner_mask[i][j] = 0;
partner_type[i][j] = 0;
partner_massflag[i][j] = 0;
partner_bondtype[i][j] = 0;
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) {
partner_mask[i][j] = mask[m];
partner_type[i][j] = type[m];
if (nmass) {
if (rmass) massone = rmass[m];
else massone = mass[type[m]];
partner_massflag[i][j] = masscheck(massone);
}
n = bondtype_findset(i,tag[i],partner_tag[i][j],0);
if (n) partner_bondtype[i][j] = n;
else {
n = bondtype_findset(m,tag[i],partner_tag[i][j],0);
if (n) partner_bondtype[i][j] = n;
}
} else nbuf += nper;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = tag[i];
buf[size+1] = partner_tag[i][j];
buf[size+2] = 0;
buf[size+3] = 0;
buf[size+4] = 0;
n = bondtype_findset(i,tag[i],partner_tag[i][j],0);
if (n) buf[size+5] = n;
else buf[size+5] = 0;
size += nper;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,1,ring_bonds,buf);
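// comm->ring() passes buf around all procs; each proc's ring_bonds() callback
// fills in entries for partner atoms it owns, and the completed buffer
// arrives back at this proc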
// store partner info returned to me
m = 0;
while (m < size) {
i = atom->map(buf[m]);
for (j = 0; j < npartner[i]; j++)
if (buf[m+1] == partner_tag[i][j]) break;
partner_mask[i][j] = buf[m+2];
partner_type[i][j] = buf[m+3];
partner_massflag[i][j] = buf[m+4];
partner_bondtype[i][j] = buf[m+5];
m += nper;
}
memory->destroy(buf);
// error check for unfilled partner info
// if partner_type not set, is an error
// partner_bondtype may not be set if special list is not consistent
// with bondatom (e.g. due to delete_bonds command)
// this is OK if one or both atoms are not in fix group, since
// bond won't be SHAKEn anyway
// else it's an error
flag = 0;
for (i = 0; i < nlocal; i++)
for (j = 0; j < npartner[i]; j++) {
if (partner_type[i][j] == 0) flag = 1;
if (!(mask[i] & groupbit)) continue;
if (!(partner_mask[i][j] & groupbit)) continue;
if (partner_bondtype[i][j] == 0) flag = 1;
}
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Did not find fix shake partner info");
// -----------------------------------------------------
// identify SHAKEable bonds
// set nshake[i] = # of SHAKE bonds attached to atom i
// set partner_shake[i][] = 1 if SHAKE bonded to partner, 0 if not
// both atoms must be in group, bondtype must be > 0
// check if bondtype is in input bond_flag
// check if type of either atom is in input type_flag
// check if mass of either atom is in input mass_list
// -----------------------------------------------------
int np;
for (i = 0; i < nlocal; i++) {
nshake[i] = 0;
np = npartner[i];
for (j = 0; j < np; j++) {
partner_shake[i][j] = 0;
if (!(mask[i] & groupbit)) continue;
if (!(partner_mask[i][j] & groupbit)) continue;
if (partner_bondtype[i][j] <= 0) continue;
if (bond_flag[partner_bondtype[i][j]]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
if (type_flag[type[i]] || type_flag[partner_type[i][j]]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
if (nmass) {
if (partner_massflag[i][j]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
} else {
if (rmass) massone = rmass[i];
else massone = mass[type[i]];
if (masscheck(massone)) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
}
}
}
}
// -----------------------------------------------------
// set partner_nshake for bonded partners
// requires communication for off-proc partners
// -----------------------------------------------------
// fill in partner_nshake if own bond partner
// info to store in buf for each off-proc bond =
// 2 atoms IDs in bond, space for nshake value
// nbufmax = largest buffer needed to hold info from any proc
nbuf = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) partner_nshake[i][j] = nshake[m];
else nbuf += 3;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = tag[i];
buf[size+1] = partner_tag[i][j];
size += 3;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,2,ring_nshake,buf);
// store partner info returned to me
m = 0;
while (m < size) {
i = atom->map(buf[m]);
for (j = 0; j < npartner[i]; j++)
if (buf[m+1] == partner_tag[i][j]) break;
partner_nshake[i][j] = buf[m+2];
m += 3;
}
memory->destroy(buf);
// -----------------------------------------------------
// error checks
// no atom with nshake > 3
// no connected atoms which both have nshake > 1
// -----------------------------------------------------
flag = 0;
for (i = 0; i < nlocal; i++) if (nshake[i] > 3) flag = 1;
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Shake cluster of more than 4 atoms");
flag = 0;
for (i = 0; i < nlocal; i++) {
if (nshake[i] <= 1) continue;
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j] && partner_nshake[i][j] > 1) flag = 1;
}
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Shake clusters are connected");
// -----------------------------------------------------
// set SHAKE arrays that are stored with atoms & add angle constraints
// zero shake arrays for all owned atoms
// if I am central atom set shake_flag & shake_atom & shake_type
// for 2-atom clusters, I am central atom if my atom ID < partner ID
// for 3-atom clusters, test for angle constraint
// angle will be stored by this atom if it exists
// if angle type matches angle_flag, then it is angle-constrained
// shake_flag[] = 0 if atom not in SHAKE cluster
// 2,3,4 = size of bond-only cluster
// 1 = 3-atom angle cluster
// shake_atom[][] = global IDs of 2,3,4 atoms in cluster
// central atom is 1st
// for 2-atom cluster, lowest ID is 1st
// shake_type[][] = bondtype of each bond in cluster
// for 3-atom angle cluster, 3rd value is angletype
// -----------------------------------------------------
for (i = 0; i < nlocal; i++) {
shake_flag[i] = 0;
shake_atom[i][0] = 0;
shake_atom[i][1] = 0;
shake_atom[i][2] = 0;
shake_atom[i][3] = 0;
shake_type[i][0] = 0;
shake_type[i][1] = 0;
shake_type[i][2] = 0;
if (nshake[i] == 1) {
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j]) break;
if (partner_nshake[i][j] == 1 && tag[i] < partner_tag[i][j]) {
shake_flag[i] = 2;
shake_atom[i][0] = tag[i];
shake_atom[i][1] = partner_tag[i][j];
shake_type[i][0] = partner_bondtype[i][j];
}
}
if (nshake[i] > 1) {
shake_flag[i] = 1;
shake_atom[i][0] = tag[i];
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j]) {
m = shake_flag[i];
shake_atom[i][m] = partner_tag[i][j];
shake_type[i][m-1] = partner_bondtype[i][j];
shake_flag[i]++;
}
}
if (nshake[i] == 2 && angles_allow) {
n = angletype_findset(i,shake_atom[i][1],shake_atom[i][2],0);
if (n <= 0) continue;
if (angle_flag[n]) {
shake_flag[i] = 1;
shake_type[i][2] = n;
}
}
}
// -----------------------------------------------------
// set shake_flag,shake_atom,shake_type for non-central atoms
// requires communication for off-proc atoms
// -----------------------------------------------------
// fill in shake arrays for each bond partner I own
// info to store in buf for each off-proc bond =
// all values from shake_flag, shake_atom, shake_type
// nbufmax = largest buffer needed to hold info from any proc
nbuf = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
for (j = 0; j < npartner[i]; j++) {
if (partner_shake[i][j] == 0) continue;
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) {
shake_flag[m] = shake_flag[i];
shake_atom[m][0] = shake_atom[i][0];
shake_atom[m][1] = shake_atom[i][1];
shake_atom[m][2] = shake_atom[i][2];
shake_atom[m][3] = shake_atom[i][3];
shake_type[m][0] = shake_type[i][0];
shake_type[m][1] = shake_type[i][1];
shake_type[m][2] = shake_type[i][2];
} else nbuf += 9;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
for (j = 0; j < npartner[i]; j++) {
if (partner_shake[i][j] == 0) continue;
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = partner_tag[i][j];
buf[size+1] = shake_flag[i];
buf[size+2] = shake_atom[i][0];
buf[size+3] = shake_atom[i][1];
buf[size+4] = shake_atom[i][2];
buf[size+5] = shake_atom[i][3];
buf[size+6] = shake_type[i][0];
buf[size+7] = shake_type[i][1];
buf[size+8] = shake_type[i][2];
size += 9;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,3,ring_shake,NULL);
memory->destroy(buf);
// -----------------------------------------------------
// free local memory
// -----------------------------------------------------
memory->destroy(npartner);
memory->destroy(nshake);
memory->destroy(partner_tag);
memory->destroy(partner_mask);
memory->destroy(partner_type);
memory->destroy(partner_massflag);
memory->destroy(partner_bondtype);
memory->destroy(partner_shake);
memory->destroy(partner_nshake);
// -----------------------------------------------------
// set bond_type and angle_type negative for SHAKE clusters
// must set for all SHAKE bonds and angles stored by each atom
// -----------------------------------------------------
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
else if (shake_flag[i] == 1) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
angletype_findset(i,shake_atom[i][1],shake_atom[i][2],-1);
} else if (shake_flag[i] == 2) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
} else if (shake_flag[i] == 3) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
} else if (shake_flag[i] == 4) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][3],-1);
}
}
// -----------------------------------------------------
// print info on SHAKE clusters
// -----------------------------------------------------
int count1,count2,count3,count4;
count1 = count2 = count3 = count4 = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 1) count1++;
else if (shake_flag[i] == 2) count2++;
else if (shake_flag[i] == 3) count3++;
else if (shake_flag[i] == 4) count4++;
}
int tmp;
tmp = count1;
MPI_Allreduce(&tmp,&count1,1,MPI_INT,MPI_SUM,world);
tmp = count2;
MPI_Allreduce(&tmp,&count2,1,MPI_INT,MPI_SUM,world);
tmp = count3;
MPI_Allreduce(&tmp,&count3,1,MPI_INT,MPI_SUM,world);
tmp = count4;
MPI_Allreduce(&tmp,&count4,1,MPI_INT,MPI_SUM,world);
if (me == 0) {
if (screen) {
fprintf(screen," %d = # of size 2 clusters\n",count2/2);
fprintf(screen," %d = # of size 3 clusters\n",count3/3);
fprintf(screen," %d = # of size 4 clusters\n",count4/4);
fprintf(screen," %d = # of frozen angles\n",count1/3);
}
if (logfile) {
fprintf(logfile," %d = # of size 2 clusters\n",count2/2);
fprintf(logfile," %d = # of size 3 clusters\n",count3/3);
fprintf(logfile," %d = # of size 4 clusters\n",count4/4);
fprintf(logfile," %d = # of frozen angles\n",count1/3);
}
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner:
fill in mask and type and massflag
search for bond with 1st atom and fill in bondtype
------------------------------------------------------------------------- */
void FixShake::ring_bonds(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
double *rmass = atom->rmass;
double *mass = atom->mass;
int *mask = atom->mask;
int *type = atom->type;
int nlocal = atom->nlocal;
int nmass = fsptr->nmass;
tagint *buf = (tagint *) cbuf;
int m,n;
double massone;
for (int i = 0; i < ndatum; i += 6) {
m = atom->map(buf[i+1]);
if (m >= 0 && m < nlocal) {
buf[i+2] = mask[m];
buf[i+3] = type[m];
if (nmass) {
if (rmass) massone = rmass[m];
else massone = mass[type[m]];
buf[i+4] = fsptr->masscheck(massone);
}
if (buf[i+5] == 0) {
n = fsptr->bondtype_findset(m,buf[i],buf[i+1],0);
if (n) buf[i+5] = n;
}
}
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner, fill in nshake value
------------------------------------------------------------------------- */
void FixShake::ring_nshake(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
int nlocal = atom->nlocal;
int *nshake = fsptr->nshake;
tagint *buf = (tagint *) cbuf;
int m;
for (int i = 0; i < ndatum; i += 3) {
m = atom->map(buf[i+1]);
if (m >= 0 && m < nlocal) buf[i+2] = nshake[m];
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner, fill in shake_flag, shake_atom, and shake_type values
------------------------------------------------------------------------- */
void FixShake::ring_shake(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
int nlocal = atom->nlocal;
int *shake_flag = fsptr->shake_flag;
tagint **shake_atom = fsptr->shake_atom;
int **shake_type = fsptr->shake_type;
tagint *buf = (tagint *) cbuf;
int m;
for (int i = 0; i < ndatum; i += 9) {
m = atom->map(buf[i]);
if (m >= 0 && m < nlocal) {
shake_flag[m] = buf[i+1];
shake_atom[m][0] = buf[i+2];
shake_atom[m][1] = buf[i+3];
shake_atom[m][2] = buf[i+4];
shake_atom[m][3] = buf[i+5];
shake_type[m][0] = buf[i+6];
shake_type[m][1] = buf[i+7];
shake_type[m][2] = buf[i+8];
}
}
}
/* ----------------------------------------------------------------------
check if massone is within MASSDELTA of any mass in mass_list
return 1 if yes, 0 if not
------------------------------------------------------------------------- */
int FixShake::masscheck(double massone)
{
for (int i = 0; i < nmass; i++)
if (fabs(mass_list[i]-massone) <= MASSDELTA) return 1;
return 0;
}
/* ----------------------------------------------------------------------
update the unconstrained position of each atom
only for SHAKE clusters, else set to 0.0
assumes NVE update, seems to be accurate enough for NVT,NPT,NPH as well
------------------------------------------------------------------------- */
void FixShake::unconstrained_update()
{
double dtfmsq;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
dtfmsq = dtfsq / rmass[i];
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
} else {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
dtfmsq = dtfsq / mass[type[i]];
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
}
}
/* ----------------------------------------------------------------------
update the unconstrained position of each atom in a rRESPA step
only for SHAKE clusters, else set to 0.0
assumes NVE update, seems to be accurate enough for NVT,NPT,NPH as well
------------------------------------------------------------------------- */
void FixShake::unconstrained_update_respa(int ilevel)
{
// xshake = atom coords after next x update in innermost loop
// depends on rRESPA level
// for levels > 0 this includes more than one velocity update
// xshake = predicted position from call to this routine at level N =
// x + dt0 (v + dtN/m fN + 1/2 dt(N-1)/m f(N-1) + ... + 1/2 dt0/m f0)
// also set dtfsq = dt0*dtN so that shake,shake3,etc can use it
double ***f_level = ((FixRespa *) modify->fix[ifix_respa])->f_level;
dtfsq = dtf_inner * step_respa[ilevel];
double invmass,dtfmsq;
int jlevel;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
invmass = 1.0 / rmass[i];
dtfmsq = dtfsq * invmass;
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
for (jlevel = 0; jlevel < ilevel; jlevel++) {
dtfmsq = dtf_innerhalf * step_respa[jlevel] * invmass;
xshake[i][0] += dtfmsq*f_level[i][jlevel][0];
xshake[i][1] += dtfmsq*f_level[i][jlevel][1];
xshake[i][2] += dtfmsq*f_level[i][jlevel][2];
}
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
} else {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
invmass = 1.0 / mass[type[i]];
dtfmsq = dtfsq * invmass;
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
for (jlevel = 0; jlevel < ilevel; jlevel++) {
dtfmsq = dtf_innerhalf * step_respa[jlevel] * invmass;
xshake[i][0] += dtfmsq*f_level[i][jlevel][0];
xshake[i][1] += dtfmsq*f_level[i][jlevel][1];
xshake[i][2] += dtfmsq*f_level[i][jlevel][2];
}
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake(int m)
{
int nlist,list[2];
double v[6];
double invmass0,invmass1;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
double bond1 = bond_distance[shake_type[m][0]];
// r01 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
// s01 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
// a,b,c = coeffs in quadratic equation for lamda
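// constraint: |s01 + lamda*(invmass0+invmass1)*r01|^2 = bond1^2
// expanding in lamda gives a*lamda^2 + b*lamda + c = 0 with a,b,c below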
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
}
double a = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double b = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double c = s01sq - bond1*bond1;
// error check
double determ = b*b - 4.0*a*c;
if (determ < 0.0) {
error->warning(FLERR,"Shake determinant < 0.0",0);
determ = 0.0;
}
// exact quadratic solution for lamda
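// keep the root of smaller magnitude: the smaller correction to the
// unconstrained positions is the physically meaningful one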
double lamda,lamda1,lamda2;
lamda1 = (-b+sqrt(determ)) / (2.0*a);
lamda2 = (-b-sqrt(determ)) / (2.0*a);
if (fabs(lamda1) <= fabs(lamda2)) lamda = lamda1;
else lamda = lamda2;
// update forces if atom is owned by this processor
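// lamda was solved in position units; dividing by dtfsq converts it to a
// force coefficient, since the unconstrained update used dx = (dtfsq/m)*f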
lamda /= dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda*r01[0];
f[i0][1] += lamda*r01[1];
f[i0][2] += lamda*r01[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda*r01[0];
f[i1][1] -= lamda*r01[1];
f[i1][2] -= lamda*r01[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
v[0] = lamda*r01[0]*r01[0];
v[1] = lamda*r01[1]*r01[1];
v[2] = lamda*r01[2]*r01[2];
v[3] = lamda*r01[0]*r01[1];
v[4] = lamda*r01[0]*r01[2];
v[5] = lamda*r01[1]*r01[2];
v_tally(nlist,list,2.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake3(int m)
{
int nlist,list[3];
double v[6];
double invmass0,invmass1,invmass2;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
// r01,r02 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
// s01,s02 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
// inverse of matrix
double determ = a11*a22 - a12*a21;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = a22*determinv;
double a12inv = -a12*determinv;
double a21inv = -a21*determinv;
double a22inv = a11*determinv;
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
// iterate until converged
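// the two constraint equations are quadratic in (lamda01,lamda02); each pass
// solves the linearized 2x2 system with the quadratic terms evaluated at the
// previous iterate, until both lamdas change by less than tolerance or
// max_iter is reached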
double lamda01 = 0.0;
double lamda02 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,b1,b2,lamda01_new,lamda02_new;
while (!done && niter < max_iter) {
quad1 = quad1_0101 * lamda01*lamda01 + quad1_0202 * lamda02*lamda02 +
quad1_0102 * lamda01*lamda02;
quad2 = quad2_0101 * lamda01*lamda01 + quad2_0202 * lamda02*lamda02 +
quad2_0102 * lamda01*lamda02;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
lamda01_new = a11inv*b1 + a12inv*b2;
lamda02_new = a21inv*b1 + a22inv*b2;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0];
f[i1][1] -= lamda01*r01[1];
f[i1][2] -= lamda01*r01[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0];
f[i2][1] -= lamda02*r02[1];
f[i2][2] -= lamda02*r02[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
v[0] = lamda01*r01[0]*r01[0] + lamda02*r02[0]*r02[0];
v[1] = lamda01*r01[1]*r01[1] + lamda02*r02[1]*r02[1];
v[2] = lamda01*r01[2]*r01[2] + lamda02*r02[2]*r02[2];
v[3] = lamda01*r01[0]*r01[1] + lamda02*r02[0]*r02[1];
v[4] = lamda01*r01[0]*r01[2] + lamda02*r02[0]*r02[2];
v[5] = lamda01*r01[1]*r01[2] + lamda02*r02[1]*r02[2];
v_tally(nlist,list,3.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake4(int m)
{
int nlist,list[4];
double v[6];
double invmass0,invmass1,invmass2,invmass3;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
int i3 = atom->map(shake_atom[m][3]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
double bond3 = bond_distance[shake_type[m][2]];
// r01,r02,r03 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
double r03[3];
r03[0] = x[i0][0] - x[i3][0];
r03[1] = x[i0][1] - x[i3][1];
r03[2] = x[i0][2] - x[i3][2];
domain->minimum_image(r03);
// s01,s02,s03 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
double s03[3];
s03[0] = xshake[i0][0] - xshake[i3][0];
s03[1] = xshake[i0][1] - xshake[i3][1];
s03[2] = xshake[i0][2] - xshake[i3][2];
- domain->minimum_image(s03);
+ domain->minimum_image_once(s03);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double r03sq = r03[0]*r03[0] + r03[1]*r03[1] + r03[2]*r03[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
double s03sq = s03[0]*s03[0] + s03[1]*s03[1] + s03[2]*s03[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
invmass3 = 1.0/rmass[i3];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
invmass3 = 1.0/mass[type[i3]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a13 = 2.0 * invmass0 *
(s01[0]*r03[0] + s01[1]*r03[1] + s01[2]*r03[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
double a23 = 2.0 * invmass0 *
(s02[0]*r03[0] + s02[1]*r03[1] + s02[2]*r03[2]);
double a31 = 2.0 * invmass0 *
(s03[0]*r01[0] + s03[1]*r01[1] + s03[2]*r01[2]);
double a32 = 2.0 * invmass0 *
(s03[0]*r02[0] + s03[1]*r02[1] + s03[2]*r02[2]);
double a33 = 2.0 * (invmass0+invmass3) *
(s03[0]*r03[0] + s03[1]*r03[1] + s03[2]*r03[2]);
// inverse of matrix
double determ = a11*a22*a33 + a12*a23*a31 + a13*a21*a32 -
a11*a23*a32 - a12*a21*a33 - a13*a22*a31;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = determinv * (a22*a33 - a23*a32);
double a12inv = -determinv * (a12*a33 - a13*a32);
double a13inv = determinv * (a12*a23 - a13*a22);
double a21inv = -determinv * (a21*a33 - a23*a31);
double a22inv = determinv * (a11*a33 - a13*a31);
double a23inv = -determinv * (a11*a23 - a13*a21);
double a31inv = determinv * (a21*a32 - a22*a31);
double a32inv = -determinv * (a11*a32 - a12*a31);
double a33inv = determinv * (a11*a22 - a12*a21);
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double r0103 = (r01[0]*r03[0] + r01[1]*r03[1] + r01[2]*r03[2]);
double r0203 = (r02[0]*r03[0] + r02[1]*r03[1] + r02[2]*r03[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_0303 = invmass0*invmass0 * r03sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad1_0103 = 2.0 * (invmass0+invmass1)*invmass0 * r0103;
double quad1_0203 = 2.0 * invmass0*invmass0 * r0203;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_0303 = invmass0*invmass0 * r03sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
double quad2_0103 = 2.0 * invmass0*invmass0 * r0103;
double quad2_0203 = 2.0 * (invmass0+invmass2)*invmass0 * r0203;
double quad3_0101 = invmass0*invmass0 * r01sq;
double quad3_0202 = invmass0*invmass0 * r02sq;
double quad3_0303 = (invmass0+invmass3)*(invmass0+invmass3) * r03sq;
double quad3_0102 = 2.0 * invmass0*invmass0 * r0102;
double quad3_0103 = 2.0 * (invmass0+invmass3)*invmass0 * r0103;
double quad3_0203 = 2.0 * (invmass0+invmass3)*invmass0 * r0203;
// iterate until converged
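// same fixed-point iteration as in shake3(), now with three coupled bond
// constraints and a 3x3 linear solve per pass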
double lamda01 = 0.0;
double lamda02 = 0.0;
double lamda03 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,quad3,b1,b2,b3,lamda01_new,lamda02_new,lamda03_new;
while (!done && niter < max_iter) {
quad1 = quad1_0101 * lamda01*lamda01 +
quad1_0202 * lamda02*lamda02 +
quad1_0303 * lamda03*lamda03 +
quad1_0102 * lamda01*lamda02 +
quad1_0103 * lamda01*lamda03 +
quad1_0203 * lamda02*lamda03;
quad2 = quad2_0101 * lamda01*lamda01 +
quad2_0202 * lamda02*lamda02 +
quad2_0303 * lamda03*lamda03 +
quad2_0102 * lamda01*lamda02 +
quad2_0103 * lamda01*lamda03 +
quad2_0203 * lamda02*lamda03;
quad3 = quad3_0101 * lamda01*lamda01 +
quad3_0202 * lamda02*lamda02 +
quad3_0303 * lamda03*lamda03 +
quad3_0102 * lamda01*lamda02 +
quad3_0103 * lamda01*lamda03 +
quad3_0203 * lamda02*lamda03;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
b3 = bond3*bond3 - s03sq - quad3;
lamda01_new = a11inv*b1 + a12inv*b2 + a13inv*b3;
lamda02_new = a21inv*b1 + a22inv*b2 + a23inv*b3;
lamda03_new = a31inv*b1 + a32inv*b2 + a33inv*b3;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
if (fabs(lamda03_new-lamda03) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
lamda03 = lamda03_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
lamda03 = lamda03/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0] + lamda03*r03[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1] + lamda03*r03[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2] + lamda03*r03[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0];
f[i1][1] -= lamda01*r01[1];
f[i1][2] -= lamda01*r01[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0];
f[i2][1] -= lamda02*r02[1];
f[i2][2] -= lamda02*r02[2];
}
if (i3 < nlocal) {
f[i3][0] -= lamda03*r03[0];
f[i3][1] -= lamda03*r03[1];
f[i3][2] -= lamda03*r03[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
if (i3 < nlocal) list[nlist++] = i3;
v[0] = lamda01*r01[0]*r01[0]+lamda02*r02[0]*r02[0]+lamda03*r03[0]*r03[0];
v[1] = lamda01*r01[1]*r01[1]+lamda02*r02[1]*r02[1]+lamda03*r03[1]*r03[1];
v[2] = lamda01*r01[2]*r01[2]+lamda02*r02[2]*r02[2]+lamda03*r03[2]*r03[2];
v[3] = lamda01*r01[0]*r01[1]+lamda02*r02[0]*r02[1]+lamda03*r03[0]*r03[1];
v[4] = lamda01*r01[0]*r01[2]+lamda02*r02[0]*r02[2]+lamda03*r03[0]*r03[2];
v[5] = lamda01*r01[1]*r01[2]+lamda02*r02[1]*r02[2]+lamda03*r03[1]*r03[2];
v_tally(nlist,list,4.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake3angle(int m)
{
int nlist,list[3];
double v[6];
double invmass0,invmass1,invmass2;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
double bond12 = angle_distance[shake_type[m][2]];
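// bond12 = fixed 1-2 distance used to freeze the angle,
// i.e. the angle constraint is imposed as a third distance constraint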
// r01,r02,r12 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
double r12[3];
r12[0] = x[i1][0] - x[i2][0];
r12[1] = x[i1][1] - x[i2][1];
r12[2] = x[i1][2] - x[i2][2];
domain->minimum_image(r12);
// s01,s02,s12 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
double s12[3];
s12[0] = xshake[i1][0] - xshake[i2][0];
s12[1] = xshake[i1][1] - xshake[i2][1];
s12[2] = xshake[i1][2] - xshake[i2][2];
- domain->minimum_image(s12);
+ domain->minimum_image_once(s12);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double r12sq = r12[0]*r12[0] + r12[1]*r12[1] + r12[2]*r12[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
double s12sq = s12[0]*s12[0] + s12[1]*s12[1] + s12[2]*s12[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a13 = - 2.0 * invmass1 *
(s01[0]*r12[0] + s01[1]*r12[1] + s01[2]*r12[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
double a23 = 2.0 * invmass2 *
(s02[0]*r12[0] + s02[1]*r12[1] + s02[2]*r12[2]);
double a31 = - 2.0 * invmass1 *
(s12[0]*r01[0] + s12[1]*r01[1] + s12[2]*r01[2]);
double a32 = 2.0 * invmass2 *
(s12[0]*r02[0] + s12[1]*r02[1] + s12[2]*r02[2]);
double a33 = 2.0 * (invmass1+invmass2) *
(s12[0]*r12[0] + s12[1]*r12[1] + s12[2]*r12[2]);
// inverse of matrix
double determ = a11*a22*a33 + a12*a23*a31 + a13*a21*a32 -
a11*a23*a32 - a12*a21*a33 - a13*a22*a31;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = determinv * (a22*a33 - a23*a32);
double a12inv = -determinv * (a12*a33 - a13*a32);
double a13inv = determinv * (a12*a23 - a13*a22);
double a21inv = -determinv * (a21*a33 - a23*a31);
double a22inv = determinv * (a11*a33 - a13*a31);
double a23inv = -determinv * (a11*a23 - a13*a21);
double a31inv = determinv * (a21*a32 - a22*a31);
double a32inv = -determinv * (a11*a32 - a12*a31);
double a33inv = determinv * (a11*a22 - a12*a21);
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double r0112 = (r01[0]*r12[0] + r01[1]*r12[1] + r01[2]*r12[2]);
double r0212 = (r02[0]*r12[0] + r02[1]*r12[1] + r02[2]*r12[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_1212 = invmass1*invmass1 * r12sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad1_0112 = - 2.0 * (invmass0+invmass1)*invmass1 * r0112;
double quad1_0212 = - 2.0 * invmass0*invmass1 * r0212;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_1212 = invmass2*invmass2 * r12sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
double quad2_0112 = 2.0 * invmass0*invmass2 * r0112;
double quad2_0212 = 2.0 * (invmass0+invmass2)*invmass2 * r0212;
double quad3_0101 = invmass1*invmass1 * r01sq;
double quad3_0202 = invmass2*invmass2 * r02sq;
double quad3_1212 = (invmass1+invmass2)*(invmass1+invmass2) * r12sq;
double quad3_0102 = - 2.0 * invmass1*invmass2 * r0102;
double quad3_0112 = - 2.0 * (invmass1+invmass2)*invmass1 * r0112;
double quad3_0212 = 2.0 * (invmass1+invmass2)*invmass2 * r0212;
// iterate until converged
double lamda01 = 0.0;
double lamda02 = 0.0;
double lamda12 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,quad3,b1,b2,b3,lamda01_new,lamda02_new,lamda12_new;
while (!done && niter < max_iter) {
+
quad1 = quad1_0101 * lamda01*lamda01 +
quad1_0202 * lamda02*lamda02 +
quad1_1212 * lamda12*lamda12 +
quad1_0102 * lamda01*lamda02 +
quad1_0112 * lamda01*lamda12 +
quad1_0212 * lamda02*lamda12;
quad2 = quad2_0101 * lamda01*lamda01 +
quad2_0202 * lamda02*lamda02 +
quad2_1212 * lamda12*lamda12 +
quad2_0102 * lamda01*lamda02 +
quad2_0112 * lamda01*lamda12 +
quad2_0212 * lamda02*lamda12;
quad3 = quad3_0101 * lamda01*lamda01 +
quad3_0202 * lamda02*lamda02 +
quad3_1212 * lamda12*lamda12 +
quad3_0102 * lamda01*lamda02 +
quad3_0112 * lamda01*lamda12 +
quad3_0212 * lamda02*lamda12;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
b3 = bond12*bond12 - s12sq - quad3;
lamda01_new = a11inv*b1 + a12inv*b2 + a13inv*b3;
lamda02_new = a21inv*b1 + a22inv*b2 + a23inv*b3;
lamda12_new = a31inv*b1 + a32inv*b2 + a33inv*b3;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
if (fabs(lamda12_new-lamda12) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
lamda12 = lamda12_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
lamda12 = lamda12/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0] - lamda12*r12[0];
f[i1][1] -= lamda01*r01[1] - lamda12*r12[1];
f[i1][2] -= lamda01*r01[2] - lamda12*r12[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0] + lamda12*r12[0];
f[i2][1] -= lamda02*r02[1] + lamda12*r12[1];
f[i2][2] -= lamda02*r02[2] + lamda12*r12[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
v[0] = lamda01*r01[0]*r01[0]+lamda02*r02[0]*r02[0]+lamda12*r12[0]*r12[0];
v[1] = lamda01*r01[1]*r01[1]+lamda02*r02[1]*r02[1]+lamda12*r12[1]*r12[1];
v[2] = lamda01*r01[2]*r01[2]+lamda02*r02[2]*r02[2]+lamda12*r12[2]*r12[2];
v[3] = lamda01*r01[0]*r01[1]+lamda02*r02[0]*r02[1]+lamda12*r12[0]*r12[1];
v[4] = lamda01*r01[0]*r01[2]+lamda02*r02[0]*r02[2]+lamda12*r12[0]*r12[2];
v[5] = lamda01*r01[1]*r01[2]+lamda02*r02[1]*r02[2]+lamda12*r12[1]*r12[2];
v_tally(nlist,list,3.0,v);
}
}
/* ----------------------------------------------------------------------
print-out bond & angle statistics
------------------------------------------------------------------------- */
void FixShake::stats()
{
int i,j,m,n,iatom,jatom,katom;
double delx,dely,delz;
double r,r1,r2,r3,angle;
// zero out accumulators
int nb = atom->nbondtypes + 1;
int na = atom->nangletypes + 1;
for (i = 0; i < nb; i++) {
b_count[i] = 0;
b_ave[i] = b_max[i] = 0.0;
b_min[i] = BIG;
}
for (i = 0; i < na; i++) {
a_count[i] = 0;
a_ave[i] = a_max[i] = 0.0;
a_min[i] = BIG;
}
// log stats for each bond & angle
// OK to double count since we are just averaging
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
// bond stats
n = shake_flag[i];
if (n == 1) n = 3;
iatom = atom->map(shake_atom[i][0]);
for (j = 1; j < n; j++) {
jatom = atom->map(shake_atom[i][j]);
delx = x[iatom][0] - x[jatom][0];
dely = x[iatom][1] - x[jatom][1];
delz = x[iatom][2] - x[jatom][2];
domain->minimum_image(delx,dely,delz);
r = sqrt(delx*delx + dely*dely + delz*delz);
m = shake_type[i][j-1];
b_count[m]++;
b_ave[m] += r;
b_max[m] = MAX(b_max[m],r);
b_min[m] = MIN(b_min[m],r);
}
// angle stats
if (shake_flag[i] == 1) {
iatom = atom->map(shake_atom[i][0]);
jatom = atom->map(shake_atom[i][1]);
katom = atom->map(shake_atom[i][2]);
delx = x[iatom][0] - x[jatom][0];
dely = x[iatom][1] - x[jatom][1];
delz = x[iatom][2] - x[jatom][2];
domain->minimum_image(delx,dely,delz);
r1 = sqrt(delx*delx + dely*dely + delz*delz);
delx = x[iatom][0] - x[katom][0];
dely = x[iatom][1] - x[katom][1];
delz = x[iatom][2] - x[katom][2];
domain->minimum_image(delx,dely,delz);
r2 = sqrt(delx*delx + dely*dely + delz*delz);
delx = x[jatom][0] - x[katom][0];
dely = x[jatom][1] - x[katom][1];
delz = x[jatom][2] - x[katom][2];
domain->minimum_image(delx,dely,delz);
r3 = sqrt(delx*delx + dely*dely + delz*delz);
angle = acos((r1*r1 + r2*r2 - r3*r3) / (2.0*r1*r2));
angle *= 180.0/MY_PI;
m = shake_type[i][2];
a_count[m]++;
a_ave[m] += angle;
a_max[m] = MAX(a_max[m],angle);
a_min[m] = MIN(a_min[m],angle);
}
}
// sum across all procs
MPI_Allreduce(b_count,b_count_all,nb,MPI_INT,MPI_SUM,world);
MPI_Allreduce(b_ave,b_ave_all,nb,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(b_max,b_max_all,nb,MPI_DOUBLE,MPI_MAX,world);
MPI_Allreduce(b_min,b_min_all,nb,MPI_DOUBLE,MPI_MIN,world);
MPI_Allreduce(a_count,a_count_all,na,MPI_INT,MPI_SUM,world);
MPI_Allreduce(a_ave,a_ave_all,na,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(a_max,a_max_all,na,MPI_DOUBLE,MPI_MAX,world);
MPI_Allreduce(a_min,a_min_all,na,MPI_DOUBLE,MPI_MIN,world);
// print stats only for non-zero counts
if (me == 0) {
if (screen) {
fprintf(screen,
"SHAKE stats (type/ave/delta) on step " BIGINT_FORMAT "\n",
update->ntimestep);
for (i = 1; i < nb; i++)
if (b_count_all[i])
fprintf(screen," %d %g %g %d\n",i,
b_ave_all[i]/b_count_all[i],b_max_all[i]-b_min_all[i],
b_count_all[i]);
for (i = 1; i < na; i++)
if (a_count_all[i])
fprintf(screen," %d %g %g\n",i,
a_ave_all[i]/a_count_all[i],a_max_all[i]-a_min_all[i]);
}
if (logfile) {
fprintf(logfile,
"SHAKE stats (type/ave/delta) on step " BIGINT_FORMAT "\n",
update->ntimestep);
for (i = 0; i < nb; i++)
if (b_count_all[i])
fprintf(logfile," %d %g %g\n",i,
b_ave_all[i]/b_count_all[i],b_max_all[i]-b_min_all[i]);
for (i = 0; i < na; i++)
if (a_count_all[i])
fprintf(logfile," %d %g %g\n",i,
a_ave_all[i]/a_count_all[i],a_max_all[i]-a_min_all[i]);
}
}
// next timestep for stats
next_output += output_every;
}
/* ----------------------------------------------------------------------
find a bond between global atom IDs n1 and n2 stored with local atom i
if find it:
if setflag = 0, return bond type
if setflag = -1/1, set bond type to negative/positive and return 0
if do not find it, return 0
------------------------------------------------------------------------- */
int FixShake::bondtype_findset(int i, tagint n1, tagint n2, int setflag)
{
int m,nbonds;
int *btype;
if (molecular == 1) {
tagint *tag = atom->tag;
tagint **bond_atom = atom->bond_atom;
nbonds = atom->num_bond[i];
for (m = 0; m < nbonds; m++) {
if (n1 == tag[i] && n2 == bond_atom[i][m]) break;
if (n1 == bond_atom[i][m] && n2 == tag[i]) break;
}
} else {
int imol = atom->molindex[i];
int iatom = atom->molatom[i];
tagint *tag = atom->tag;
tagint tagprev = tag[i] - iatom - 1;
tagint *batom = atommols[imol]->bond_atom[iatom];
btype = atommols[imol]->bond_type[iatom];
nbonds = atommols[imol]->num_bond[iatom];
for (m = 0; m < nbonds; m++) {
if (n1 == tag[i] && n2 == batom[m]+tagprev) break;
if (n1 == batom[m]+tagprev && n2 == tag[i]) break;
}
}
if (m < nbonds) {
if (setflag == 0) {
if (molecular == 1) return atom->bond_type[i][m];
else return btype[m];
}
if (molecular == 1) {
if ((setflag < 0 && atom->bond_type[i][m] > 0) ||
(setflag > 0 && atom->bond_type[i][m] < 0))
atom->bond_type[i][m] = -atom->bond_type[i][m];
} else {
if ((setflag < 0 && btype[m] > 0) ||
(setflag > 0 && btype[m] < 0)) btype[m] = -btype[m];
}
}
return 0;
}
/* ----------------------------------------------------------------------
find an angle with global end atom IDs n1 and n2 stored with local atom i
if find it:
if setflag = 0, return angle type
if setflag = -1/1, set angle type to negative/positive and return 0
if do not find it, return 0
------------------------------------------------------------------------- */
int FixShake::angletype_findset(int i, tagint n1, tagint n2, int setflag)
{
int m,nangles;
int *atype;
if (molecular == 1) {
tagint **angle_atom1 = atom->angle_atom1;
tagint **angle_atom3 = atom->angle_atom3;
nangles = atom->num_angle[i];
for (m = 0; m < nangles; m++) {
if (n1 == angle_atom1[i][m] && n2 == angle_atom3[i][m]) break;
if (n1 == angle_atom3[i][m] && n2 == angle_atom1[i][m]) break;
}
} else {
int imol = atom->molindex[i];
int iatom = atom->molatom[i];
tagint *tag = atom->tag;
tagint tagprev = tag[i] - iatom - 1;
tagint *aatom1 = atommols[imol]->angle_atom1[iatom];
tagint *aatom3 = atommols[imol]->angle_atom3[iatom];
atype = atommols[imol]->angle_type[iatom];
nangles = atommols[imol]->num_angle[iatom];
for (m = 0; m < nangles; m++) {
if (n1 == aatom1[m]+tagprev && n2 == aatom3[m]+tagprev) break;
if (n1 == aatom3[m]+tagprev && n2 == aatom1[m]+tagprev) break;
}
}
if (m < nangles) {
if (setflag == 0) {
if (molecular == 1) return atom->angle_type[i][m];
else return atype[m];
}
if (molecular == 1) {
if ((setflag < 0 && atom->angle_type[i][m] > 0) ||
(setflag > 0 && atom->angle_type[i][m] < 0))
atom->angle_type[i][m] = -atom->angle_type[i][m];
} else {
if ((setflag < 0 && atype[m] > 0) ||
(setflag > 0 && atype[m] < 0)) atype[m] = -atype[m];
}
}
return 0;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixShake::memory_usage()
{
int nmax = atom->nmax;
double bytes = nmax * sizeof(int);
bytes += nmax*4 * sizeof(int);
bytes += nmax*3 * sizeof(int);
bytes += nmax*3 * sizeof(double);
bytes += maxvatom*6 * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixShake::grow_arrays(int nmax)
{
memory->grow(shake_flag,nmax,"shake:shake_flag");
memory->grow(shake_atom,nmax,4,"shake:shake_atom");
memory->grow(shake_type,nmax,3,"shake:shake_type");
memory->destroy(xshake);
memory->create(xshake,nmax,3,"shake:xshake");
memory->destroy(ftmp);
memory->create(ftmp,nmax,3,"shake:ftmp");
memory->destroy(vtmp);
memory->create(vtmp,nmax,3,"shake:vtmp");
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixShake::copy_arrays(int i, int j, int delflag)
{
int flag = shake_flag[j] = shake_flag[i];
if (flag == 1) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
shake_type[j][2] = shake_type[i][2];
} else if (flag == 2) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_type[j][0] = shake_type[i][0];
} else if (flag == 3) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
} else if (flag == 4) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_atom[j][3] = shake_atom[i][3];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
shake_type[j][2] = shake_type[i][2];
}
}
/* ----------------------------------------------------------------------
initialize one atom's array values, called when atom is created
------------------------------------------------------------------------- */
void FixShake::set_arrays(int i)
{
shake_flag[i] = 0;
}
/* ----------------------------------------------------------------------
update one atom's array values
called when molecule is created from fix gcmc
------------------------------------------------------------------------- */
void FixShake::update_arrays(int i, int atom_offset)
{
int flag = shake_flag[i];
if (flag == 1) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
} else if (flag == 2) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
} else if (flag == 3) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
} else if (flag == 4) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
shake_atom[i][3] += atom_offset;
}
}
/* ----------------------------------------------------------------------
initialize a molecule inserted by another fix, e.g. deposit or pour
called when molecule is created
nlocalprev = # of atoms on this proc before molecule inserted
tagprev = atom ID previous to new atoms in the molecule
xgeom,vcm,quat ignored
------------------------------------------------------------------------- */
void FixShake::set_molecule(int nlocalprev, tagint tagprev, int imol,
double *xgeom, double *vcm, double *quat)
{
int m,flag;
int nlocal = atom->nlocal;
if (nlocalprev == nlocal) return;
tagint *tag = atom->tag;
tagint **mol_shake_atom = onemols[imol]->shake_atom;
int **mol_shake_type = onemols[imol]->shake_type;
for (int i = nlocalprev; i < nlocal; i++) {
m = tag[i] - tagprev-1;
flag = shake_flag[i] = onemols[imol]->shake_flag[m];
if (flag == 1) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
shake_type[i][2] = mol_shake_type[m][2];
} else if (flag == 2) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
} else if (flag == 3) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
} else if (flag == 4) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_atom[i][3] = mol_shake_atom[m][3] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
shake_type[i][2] = mol_shake_type[m][2];
}
}
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixShake::pack_exchange(int i, double *buf)
{
int m = 0;
buf[m++] = shake_flag[i];
int flag = shake_flag[i];
if (flag == 1) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
buf[m++] = shake_type[i][2];
} else if (flag == 2) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_type[i][0];
} else if (flag == 3) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
} else if (flag == 4) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_atom[i][3];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
buf[m++] = shake_type[i][2];
}
return m;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based arrays from exchange with another proc
------------------------------------------------------------------------- */
int FixShake::unpack_exchange(int nlocal, double *buf)
{
int m = 0;
int flag = shake_flag[nlocal] = static_cast<int> (buf[m++]);
if (flag == 1) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
shake_type[nlocal][2] = static_cast<int> (buf[m++]);
} else if (flag == 2) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
} else if (flag == 3) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
} else if (flag == 4) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][3] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
shake_type[nlocal][2] = static_cast<int> (buf[m++]);
}
return m;
}
/* ---------------------------------------------------------------------- */
int FixShake::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = xshake[j][0];
buf[m++] = xshake[j][1];
buf[m++] = xshake[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = xshake[j][0] + dx;
buf[m++] = xshake[j][1] + dy;
buf[m++] = xshake[j][2] + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixShake::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
xshake[i][0] = buf[m++];
xshake[i][1] = buf[m++];
xshake[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void FixShake::reset_dt()
{
if (strstr(update->integrate_style,"verlet")) {
dtv = update->dt;
if (rattle) dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
else dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
if (rattle) dtf_inner = dtf_innerhalf;
else dtf_inner = step_respa[0] * force->ftm2v;
}
}
/* ----------------------------------------------------------------------
extract Molecule ptr
------------------------------------------------------------------------- */
void *FixShake::extract(const char *str, int &dim)
{
dim = 0;
if (strcmp(str,"onemol") == 0) return onemols;
return NULL;
}
/* ----------------------------------------------------------------------
add coordinate constraining forces
this method is called at the end of a timestep
------------------------------------------------------------------------- */
void FixShake::shake_end_of_step(int vflag) {
if (!respa) {
dtv = update->dt;
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
FixShake::post_force(vflag);
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
dtf_inner = dtf_innerhalf;
// apply correction to all rRESPA levels
for (int ilevel = 0; ilevel < nlevels_respa; ilevel++) {
((Respa *) update->integrate)->copy_flevel_f(ilevel);
FixShake::post_force_respa(vflag,ilevel,loop_respa[ilevel]-1);
((Respa *) update->integrate)->copy_f_flevel(ilevel);
}
if (!rattle) dtf_inner = step_respa[0] * force->ftm2v;
}
}
/* ----------------------------------------------------------------------
wrapper method for end_of_step fixes which modify velocities
------------------------------------------------------------------------- */
void FixShake::correct_velocities() {}
/* ----------------------------------------------------------------------
calculate constraining forces based on the current configuration
change coordinates
------------------------------------------------------------------------- */
void FixShake::correct_coordinates(int vflag) {
// save current forces and velocities, then zero them so that
// FixShake::unconstrained_update() leaves the coordinates unchanged (xshake = x)
for (int j=0; j<nlocal; j++) {
for (int k=0; k<3; k++) {
// store current value of forces and velocities
ftmp[j][k] = f[j][k];
vtmp[j][k] = v[j][k];
// set f and v to zero for SHAKE
v[j][k] = 0;
f[j][k] = 0;
}
}
// call SHAKE to correct the coordinates which were updated without constraints
// IMPORTANT: use 1 as argument and thereby enforce velocity Verlet
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
FixShake::post_force(vflag);
// integrate coordinates: x' = xnp1 + dt^2/(2 m_i) * f, where f is the constraining force
// NOTE: after this update, the geometry of the molecules will be correct
double dtfmsq;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
dtfmsq = dtfsq/ rmass[i];
x[i][0] = x[i][0] + dtfmsq*f[i][0];
x[i][1] = x[i][1] + dtfmsq*f[i][1];
x[i][2] = x[i][2] + dtfmsq*f[i][2];
}
}
else {
for (int i = 0; i < nlocal; i++) {
dtfmsq = dtfsq / mass[type[i]];
x[i][0] = x[i][0] + dtfmsq*f[i][0];
x[i][1] = x[i][1] + dtfmsq*f[i][1];
x[i][2] = x[i][2] + dtfmsq*f[i][2];
}
}
// copy forces and velocities back
for (int j=0; j<nlocal; j++) {
for (int k=0; k<3; k++) {
f[j][k] = ftmp[j][k];
v[j][k] = vtmp[j][k];
}
}
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
// communicate changes
// NOTE: for compatibility xshake is temporarily set to x, such that pack/unpack_forward
// can be used for communicating the coordinates.
double **xtmp = xshake;
xshake = x;
if (nprocs > 1) {
comm->forward_comm_fix(this);
}
xshake = xtmp;
}
diff --git a/src/SNAP/compute_sna_atom.cpp b/src/SNAP/compute_sna_atom.cpp
index ad934535a..cba6fae9b 100644
--- a/src/SNAP/compute_sna_atom.cpp
+++ b/src/SNAP/compute_sna_atom.cpp
@@ -1,286 +1,301 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_sna_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNAAtom::ComputeSNAAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), sna(NULL),
radelem(NULL), wjelem(NULL)
{
double rmin0, rfac0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute sna/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// offset by 1 to match up with types
memory->create(radelem,ntypes+1,"sna/atom:radelem");
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
// construct cutsq
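// cutoff for a pair of types I,J is (radelem[I]+radelem[J])*rcutfac;
// cutmax is the largest such cutoff, set by the largest per-type radius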
double cut;
cutmax = 0.0;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
if (cut > cutmax) cutmax = cut;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
diagonalstyle = atoi(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute sna/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"bzeroflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
bzeroflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute sna/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute sna/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
size_peratom_cols = ncoeff;
+ if (quadraticflag) size_peratom_cols += ncoeff*ncoeff;
+ peratom_flag = 1;
nmax = 0;
njmax = 0;
sna = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNAAtom::~ComputeSNAAtom()
{
memory->destroy(sna);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute sna/atom requires a pair style be defined");
if (cutmax > force->pair->cutforce)
error->all(FLERR,"Compute sna/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"sna/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute sna/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::compute_peratom()
{
invoked_peratom = update->ntimestep;
// grow sna array if necessary
if (atom->nmax > nmax) {
memory->destroy(sna);
nmax = atom->nmax;
memory->create(sna,nmax,size_peratom_cols,"sna/atom:sna");
array_atom = sna;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
// ensure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = xtmp - x[j][0];
const double dely = ytmp - x[j][1];
const double delz = ztmp - x[j][2];
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype] && rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
snaptr[tid]->compute_bi();
snaptr[tid]->copy_bi2bvec();
for (int icoeff = 0; icoeff < ncoeff; icoeff++)
sna[i][icoeff] = snaptr[tid]->bvec[icoeff];
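+      // quadratic terms: append all ncoeff*ncoeff pairwise products
+      // bvec[i]*bvec[j] of the bispectrum components after the linear terms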
+ if (quadraticflag) {
+ int ncount = ncoeff;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++)
+ sna[i][ncount++] = bi*snaptr[tid]->bvec[jcoeff];
+ }
+ }
} else {
- for (int icoeff = 0; icoeff < ncoeff; icoeff++)
+ for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++)
sna[i][icoeff] = 0.0;
}
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNAAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_sna_atom.h b/src/SNAP/compute_sna_atom.h
index af62d7cf3..b22eea71b 100644
--- a/src/SNAP/compute_sna_atom.h
+++ b/src/SNAP/compute_sna_atom.h
@@ -1,75 +1,75 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(sna/atom,ComputeSNAAtom)
#else
#ifndef LMP_COMPUTE_SNA_ATOM_H
#define LMP_COMPUTE_SNA_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNAAtom : public Compute {
public:
ComputeSNAAtom(class LAMMPS *, int, char **);
~ComputeSNAAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
int ncoeff;
double **cutsq;
class NeighList *list;
double **sna;
double rcutfac;
double *radelem;
double *wjelem;
class SNA** snaptr;
double cutmax;
-
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute sna/atom requires a pair style be defined
Self-explanatory.
E: Compute sna/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute sna/atom
Self-explanatory.
*/
diff --git a/src/SNAP/compute_snad_atom.cpp b/src/SNAP/compute_snad_atom.cpp
index 73452427b..39f34dd8c 100644
--- a/src/SNAP/compute_snad_atom.cpp
+++ b/src/SNAP/compute_snad_atom.cpp
@@ -1,337 +1,391 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_snad_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNADAtom::ComputeSNADAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), snad(NULL),
radelem(NULL), wjelem(NULL)
{
double rfac0, rmin0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute snad/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// process required arguments
+
memory->create(radelem,ntypes+1,"sna/atom:radelem"); // offset by 1 to match up with types
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
+
// construct cutsq
+
double cut;
+ cutmax = 0.0;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
+ if (cut > cutmax) cutmax = cut;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
diagonalstyle = atof(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute snad/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute snad/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute snad/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
- size_peratom_cols = 3*ncoeff*atom->ntypes;
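+ // per-atom columns: 3*ncoeff linear derivative columns (x,y,z blocks) per atom type,
+ // plus 3*ncoeff^2 quadratic columns per type when quadraticflag is set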
+ twoncoeff = 2*ncoeff;
+ threencoeff = 3*ncoeff;
+ size_peratom_cols = threencoeff*atom->ntypes;
+ if (quadraticflag) {
+ ncoeffsq = ncoeff*ncoeff;
+ twoncoeffsq = 2*ncoeffsq;
+ threencoeffsq = 3*ncoeffsq;
+ size_peratom_cols +=
+ threencoeffsq*atom->ntypes;
+ }
comm_reverse = size_peratom_cols;
+ peratom_flag = 1;
+
nmax = 0;
njmax = 0;
snad = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNADAtom::~ComputeSNADAtom()
{
memory->destroy(snad);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute snad/atom requires a pair style be defined");
- // TODO: Not sure what to do with this error check since cutoff radius is not
- // a single number
- //if (sqrt(cutsq) > force->pair->cutforce)
- //error->all(FLERR,"Compute snad/atom cutoff is longer than pairwise cutoff");
+
+ if (cutmax > force->pair->cutforce)
+ error->all(FLERR,"Compute sna/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"snad/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute snad/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::compute_peratom()
{
int ntotal = atom->nlocal + atom->nghost;
invoked_peratom = update->ntimestep;
// grow snad array if necessary
if (atom->nmax > nmax) {
memory->destroy(snad);
nmax = atom->nmax;
memory->create(snad,nmax,size_peratom_cols,
"snad/atom:snad");
array_atom = snad;
}
// clear local array
for (int i = 0; i < ntotal; i++)
for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++) {
snad[i][icoeff] = 0.0;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna derivatives for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
- const int typeoffset = 3*ncoeff*(atom->type[i]-1);
+ const int typeoffset = threencoeff*(atom->type[i]-1);
+ const int quadraticoffset = threencoeff*atom->ntypes +
+ threencoeffsq*(atom->type[i]-1);
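+ // typeoffset selects this type's 3*ncoeff linear columns;
+ // quadraticoffset selects its 3*ncoeff^2 quadratic columns, stored after all linear blocks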
// insure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = x[j][0] - xtmp;
const double dely = x[j][1] - ytmp;
const double delz = x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
-
+ if (quadraticflag) {
+ snaptr[tid]->compute_bi();
+ snaptr[tid]->copy_bi2bvec();
+ }
+
for (int jj = 0; jj < ninside; jj++) {
const int j = snaptr[tid]->inside[jj];
snaptr[tid]->compute_duidrj(snaptr[tid]->rij[jj],
snaptr[tid]->wj[jj],
snaptr[tid]->rcutij[jj]);
snaptr[tid]->compute_dbidrj();
snaptr[tid]->copy_dbi2dbvec();
// Accumulate -dBi/dRi, -dBi/dRj
double *snadi = snad[i]+typeoffset;
double *snadj = snad[j]+typeoffset;
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
snadi[icoeff] += snaptr[tid]->dbvec[icoeff][0];
snadi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1];
- snadi[icoeff+2*ncoeff] += snaptr[tid]->dbvec[icoeff][2];
+ snadi[icoeff+twoncoeff] += snaptr[tid]->dbvec[icoeff][2];
snadj[icoeff] -= snaptr[tid]->dbvec[icoeff][0];
snadj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1];
- snadj[icoeff+2*ncoeff] -= snaptr[tid]->dbvec[icoeff][2];
+ snadj[icoeff+twoncoeff] -= snaptr[tid]->dbvec[icoeff][2];
}
+
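+ // quadratic contributions: by the product rule, d(B_i*B_j)/dR = B_i*dB_j/dR + B_j*dB_i/dR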
+ if (quadraticflag) {
+ double *snadi = snad[i]+quadraticoffset;
+ double *snadj = snad[j]+quadraticoffset;
+ int ncount = 0;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ double bix = snaptr[tid]->dbvec[icoeff][0];
+ double biy = snaptr[tid]->dbvec[icoeff][1];
+ double biz = snaptr[tid]->dbvec[icoeff][2];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++) {
+ double dbxtmp = bi*snaptr[tid]->dbvec[jcoeff][0]
+ + bix*snaptr[tid]->bvec[jcoeff];
+ double dbytmp = bi*snaptr[tid]->dbvec[jcoeff][1]
+ + biy*snaptr[tid]->bvec[jcoeff];
+ double dbztmp = bi*snaptr[tid]->dbvec[jcoeff][2]
+ + biz*snaptr[tid]->bvec[jcoeff];
+ snadi[ncount] += dbxtmp;
+ snadi[ncount+ncoeffsq] += dbytmp;
+ snadi[ncount+twoncoeffsq] += dbztmp;
+ snadj[ncount] -= dbxtmp;
+ snadj[ncount+ncoeffsq] -= dbytmp;
+ snadj[ncount+twoncoeffsq] -= dbztmp;
+ ncount++;
+ }
+ }
+ }
}
}
}
// communicate snad contributions between neighbor procs
comm->reverse_comm_compute(this);
}
/* ---------------------------------------------------------------------- */
int ComputeSNADAtom::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last,icoeff;
m = 0;
last = first + n;
for (i = first; i < last; i++)
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
buf[m++] = snad[i][icoeff];
return comm_reverse;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m,icoeff;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
snad[j][icoeff] += buf[m++];
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNADAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
- bytes += ncoeff*3;
+ bytes += threencoeff*atom->ntypes;
+ if (quadraticflag) bytes += threencoeffsq*atom->ntypes;
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_snad_atom.h b/src/SNAP/compute_snad_atom.h
index 31f5bf252..0d5a369ab 100644
--- a/src/SNAP/compute_snad_atom.h
+++ b/src/SNAP/compute_snad_atom.h
@@ -1,76 +1,77 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(snad/atom,ComputeSNADAtom)
#else
#ifndef LMP_COMPUTE_SNAD_ATOM_H
#define LMP_COMPUTE_SNAD_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNADAtom : public Compute {
public:
ComputeSNADAtom(class LAMMPS *, int, char **);
~ComputeSNADAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
- int ncoeff;
+ int ncoeff, twoncoeff, threencoeff, ncoeffsq, twoncoeffsq, threencoeffsq;
double **cutsq;
class NeighList *list;
double **snad;
double rcutfac;
double *radelem;
double *wjelem;
class SNA** snaptr;
-
+ double cutmax;
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute snad/atom requires a pair style be defined
Self-explanatory.
E: Compute snad/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute snad/atom
Self-explanatory.
*/
diff --git a/src/SNAP/compute_snav_atom.cpp b/src/SNAP/compute_snav_atom.cpp
index f75b02fba..0d21d1656 100644
--- a/src/SNAP/compute_snav_atom.cpp
+++ b/src/SNAP/compute_snav_atom.cpp
@@ -1,347 +1,407 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_snav_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNAVAtom::ComputeSNAVAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), snav(NULL),
radelem(NULL), wjelem(NULL)
{
double rfac0, rmin0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
- nvirial = 6;
-
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute snav/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// process required arguments
+
memory->create(radelem,ntypes+1,"sna/atom:radelem"); // offset by 1 to match up with types
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
// construct cutsq
double cut;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
diagonalstyle = atof(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute snav/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute snav/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute snav/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
- size_peratom_cols = nvirial*ncoeff*atom->ntypes;
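+ // per-atom columns: 6*ncoeff linear virial columns (xx,yy,zz,yz,xz,xy blocks) per atom type,
+ // plus 6*ncoeff^2 quadratic columns per type when quadraticflag is set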
+ twoncoeff = 2*ncoeff;
+ threencoeff = 3*ncoeff;
+ fourncoeff = 4*ncoeff;
+ fivencoeff = 5*ncoeff;
+ sixncoeff = 6*ncoeff;
+ size_peratom_cols = sixncoeff*atom->ntypes;
+ if (quadraticflag) {
+ ncoeffsq = ncoeff*ncoeff;
+ twoncoeffsq = 2*ncoeffsq;
+ threencoeffsq = 3*ncoeffsq;
+ fourncoeffsq = 4*ncoeffsq;
+ fivencoeffsq = 5*ncoeffsq;
+ sixncoeffsq = 6*ncoeffsq;
+ size_peratom_cols +=
+ sixncoeffsq*atom->ntypes;
+ }
comm_reverse = size_peratom_cols;
+ peratom_flag = 1;
nmax = 0;
njmax = 0;
snav = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNAVAtom::~ComputeSNAVAtom()
{
memory->destroy(snav);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute snav/atom requires a pair style be defined");
// TODO: Not sure what to do with this error check since cutoff radius is not
// a single number
//if (sqrt(cutsq) > force->pair->cutforce)
// error->all(FLERR,"Compute snav/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"snav/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute snav/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::compute_peratom()
{
int ntotal = atom->nlocal + atom->nghost;
invoked_peratom = update->ntimestep;
// grow snav array if necessary
if (atom->nmax > nmax) {
memory->destroy(snav);
nmax = atom->nmax;
memory->create(snav,nmax,size_peratom_cols,
"snav/atom:snav");
array_atom = snav;
}
// clear local array
for (int i = 0; i < ntotal; i++)
for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++) {
snav[i][icoeff] = 0.0;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna derivatives for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
- const int typeoffset = nvirial*ncoeff*(atom->type[i]-1);
+ const int typeoffset = sixncoeff*(atom->type[i]-1);
+ const int quadraticoffset = sixncoeff*atom->ntypes +
+ sixncoeffsq*(atom->type[i]-1);
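+ // typeoffset selects this type's 6*ncoeff linear virial columns;
+ // quadraticoffset selects its 6*ncoeff^2 quadratic columns, stored after all linear blocks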
// insure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = x[j][0] - xtmp;
const double dely = x[j][1] - ytmp;
const double delz = x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
+ if (quadraticflag) {
+ snaptr[tid]->compute_bi();
+ snaptr[tid]->copy_bi2bvec();
+ }
for (int jj = 0; jj < ninside; jj++) {
const int j = snaptr[tid]->inside[jj];
snaptr[tid]->compute_duidrj(snaptr[tid]->rij[jj],
snaptr[tid]->wj[jj],
snaptr[tid]->rcutij[jj]);
snaptr[tid]->compute_dbidrj();
snaptr[tid]->copy_dbi2dbvec();
// Accumulate -dBi/dRi*Ri, -dBi/dRj*Rj
double *snavi = snav[i]+typeoffset;
double *snavj = snav[j]+typeoffset;
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
- snavi[icoeff] += snaptr[tid]->dbvec[icoeff][0]*xtmp;
- snavi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ytmp;
- snavi[icoeff+2*ncoeff] += snaptr[tid]->dbvec[icoeff][2]*ztmp;
- snavi[icoeff+3*ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ztmp;
- snavi[icoeff+4*ncoeff] += snaptr[tid]->dbvec[icoeff][0]*ztmp;
- snavi[icoeff+5*ncoeff] += snaptr[tid]->dbvec[icoeff][0]*ytmp;
- snavj[icoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][0];
- snavj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][1];
- snavj[icoeff+2*ncoeff] -= snaptr[tid]->dbvec[icoeff][2]*x[j][2];
- snavj[icoeff+3*ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][2];
- snavj[icoeff+4*ncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][2];
- snavj[icoeff+5*ncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][1];
+ snavi[icoeff] += snaptr[tid]->dbvec[icoeff][0]*xtmp;
+ snavi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ytmp;
+ snavi[icoeff+twoncoeff] += snaptr[tid]->dbvec[icoeff][2]*ztmp;
+ snavi[icoeff+threencoeff] += snaptr[tid]->dbvec[icoeff][1]*ztmp;
+ snavi[icoeff+fourncoeff] += snaptr[tid]->dbvec[icoeff][0]*ztmp;
+ snavi[icoeff+fivencoeff] += snaptr[tid]->dbvec[icoeff][0]*ytmp;
+ snavj[icoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][0];
+ snavj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][1];
+ snavj[icoeff+twoncoeff] -= snaptr[tid]->dbvec[icoeff][2]*x[j][2];
+ snavj[icoeff+threencoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][2];
+ snavj[icoeff+fourncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][2];
+ snavj[icoeff+fivencoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][1];
}
+
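+ // quadratic virial terms: apply the product rule to d(B_i*B_j)/dR,
+ // then contract with atom positions in the same six-component order as above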
+ if (quadraticflag) {
+ double *snavi = snav[i]+quadraticoffset;
+ double *snavj = snav[j]+quadraticoffset;
+ int ncount = 0;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ double bix = snaptr[tid]->dbvec[icoeff][0];
+ double biy = snaptr[tid]->dbvec[icoeff][1];
+ double biz = snaptr[tid]->dbvec[icoeff][2];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++) {
+ double dbxtmp = bi*snaptr[tid]->dbvec[jcoeff][0]
+ + bix*snaptr[tid]->bvec[jcoeff];
+ double dbytmp = bi*snaptr[tid]->dbvec[jcoeff][1]
+ + biy*snaptr[tid]->bvec[jcoeff];
+ double dbztmp = bi*snaptr[tid]->dbvec[jcoeff][2]
+ + biz*snaptr[tid]->bvec[jcoeff];
+ snavi[ncount] += dbxtmp*xtmp;
+ snavi[ncount+ncoeffsq] += dbytmp*ytmp;
+ snavi[ncount+twoncoeffsq] += dbztmp*ztmp;
+ snavi[ncount+threencoeffsq] += dbytmp*ztmp;
+ snavi[ncount+fourncoeffsq] += dbxtmp*ztmp;
+ snavi[ncount+fivencoeffsq] += dbxtmp*ytmp;
+ snavj[ncount] -= dbxtmp*x[j][0];
+ snavj[ncount+ncoeffsq] -= dbytmp*x[j][1];
+ snavj[ncount+twoncoeffsq] -= dbztmp*x[j][2];
+ snavj[ncount+threencoeffsq] -= dbytmp*x[j][2];
+ snavj[ncount+fourncoeffsq] -= dbxtmp*x[j][2];
+ snavj[ncount+fivencoeffsq] -= dbxtmp*x[j][1];
+ ncount++;
+ }
+ }
+ }
}
}
}
// communicate snav contributions between neighbor procs
comm->reverse_comm_compute(this);
}
/* ---------------------------------------------------------------------- */
int ComputeSNAVAtom::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last,icoeff;
m = 0;
last = first + n;
for (i = first; i < last; i++)
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
buf[m++] = snav[i][icoeff];
return comm_reverse;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m,icoeff;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
snav[j][icoeff] += buf[m++];
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNAVAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
- bytes += ncoeff*nvirial;
+ bytes += sixncoeff*atom->ntypes;
+ if (quadraticflag) bytes += sixncoeffsq*atom->ntypes;
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_snav_atom.h b/src/SNAP/compute_snav_atom.h
index 0252be705..33ae4f921 100644
--- a/src/SNAP/compute_snav_atom.h
+++ b/src/SNAP/compute_snav_atom.h
@@ -1,77 +1,78 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(snav/atom,ComputeSNAVAtom)
#else
#ifndef LMP_COMPUTE_SNAV_ATOM_H
#define LMP_COMPUTE_SNAV_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNAVAtom : public Compute {
public:
ComputeSNAVAtom(class LAMMPS *, int, char **);
~ComputeSNAVAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
- int ncoeff,nvirial;
+ int ncoeff, twoncoeff, threencoeff, fourncoeff, fivencoeff, sixncoeff;
+ int ncoeffsq, twoncoeffsq, threencoeffsq, fourncoeffsq, fivencoeffsq, sixncoeffsq;
double **cutsq;
class NeighList *list;
double **snav;
double rcutfac;
double *radelem;
double *wjelem;
-
class SNA** snaptr;
-
+ double cutmax;
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute snav/atom requires a pair style be defined
Self-explanatory.
E: Compute snav/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute snav/atom
Self-explanatory.
*/
diff --git a/src/SNAP/pair_snap.cpp b/src/SNAP/pair_snap.cpp
index 06c2e4848..e4ed57b93 100644
--- a/src/SNAP/pair_snap.cpp
+++ b/src/SNAP/pair_snap.cpp
@@ -1,1733 +1,1734 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "pair_snap.h"
#include "atom.h"
#include "atom_vec.h"
#include "force.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "sna.h"
#include "openmp_snap.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include <cmath>
using namespace LAMMPS_NS;
#define MAXLINE 1024
#define MAXWORD 3
/* ---------------------------------------------------------------------- */
PairSNAP::PairSNAP(LAMMPS *lmp) : Pair(lmp)
{
single_enable = 0;
restartinfo = 0;
one_coeff = 1;
manybody_flag = 1;
nelements = 0;
elements = NULL;
radelem = NULL;
wjelem = NULL;
coeffelem = NULL;
nmax = 0;
nthreads = 1;
schedule_user = 0;
schedule_time_guided = -1;
schedule_time_dynamic = -1;
ncalls_neigh =-1;
ilistmask_max = 0;
ilistmask = NULL;
ghostinum = 0;
ghostilist_max = 0;
ghostilist = NULL;
ghostnumneigh_max = 0;
ghostnumneigh = NULL;
ghostneighs = NULL;
ghostfirstneigh = NULL;
ghostneighs_total = 0;
ghostneighs_max = 0;
i_max = 0;
i_neighmax = 0;
i_numpairs = 0;
i_rij = NULL;
i_inside = NULL;
i_wj = NULL;
i_rcutij = NULL;
i_ninside = NULL;
i_pairs = NULL;
i_uarraytot_r = NULL;
i_uarraytot_i = NULL;
i_zarray_r = NULL;
i_zarray_i =NULL;
use_shared_arrays = 0;
#ifdef TIMING_INFO
timers[0] = 0;
timers[1] = 0;
timers[2] = 0;
timers[3] = 0;
#endif
// Need to set this because restart not handled by PairHybrid
sna = NULL;
}
/* ---------------------------------------------------------------------- */
PairSNAP::~PairSNAP()
{
if (nelements) {
for (int i = 0; i < nelements; i++)
delete[] elements[i];
delete[] elements;
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(coeffelem);
}
// Need to set this because restart not handled by PairHybrid
if (sna) {
#ifdef TIMING_INFO
double time[5];
double timeave[5];
double timeave_mpi[5];
double timemax_mpi[5];
for (int i = 0; i < 5; i++) {
time[i] = 0;
timeave[i] = 0;
for (int tid = 0; tid<nthreads; tid++) {
if (sna[tid]->timers[i]>time[i])
time[i] = sna[tid]->timers[i];
timeave[i] += sna[tid]->timers[i];
}
timeave[i] /= nthreads;
}
MPI_Reduce(timeave, timeave_mpi, 5, MPI_DOUBLE, MPI_SUM, 0, world);
MPI_Reduce(time, timemax_mpi, 5, MPI_DOUBLE, MPI_MAX, 0, world);
#endif
for (int tid = 0; tid<nthreads; tid++)
delete sna[tid];
delete [] sna;
}
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(map);
}
}
void PairSNAP::compute(int eflag, int vflag)
{
if (use_optimized)
compute_optimized(eflag, vflag);
else
compute_regular(eflag, vflag);
}
/* ----------------------------------------------------------------------
This version is a straightforward implementation
---------------------------------------------------------------------- */
void PairSNAP::compute_regular(int eflag, int vflag)
{
int i,j,jnum,ninside;
double delx,dely,delz,evdwl,rsq;
double fij[3];
int *jlist,*numneigh,**firstneigh;
evdwl = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
int *type = atom->type;
int nlocal = atom->nlocal;
int newton_pair = force->newton_pair;
class SNA* snaptr = sna[0];
numneigh = list->numneigh;
firstneigh = list->firstneigh;
for (int ii = 0; ii < list->inum; ii++) {
i = list->ilist[ii];
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
jlist = firstneigh[i];
jnum = numneigh[i];
// insure rij, inside, wj, and rcutij are of size jnum
snaptr->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// wj = weights for neighbors of I within cutoff
// rcutij = cutoffs for neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
delx = x[j][0] - xtmp;
dely = x[j][1] - ytmp;
delz = x[j][2] - ztmp;
rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
int jelem = map[jtype];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr->rij[ninside][0] = delx;
snaptr->rij[ninside][1] = dely;
snaptr->rij[ninside][2] = delz;
snaptr->inside[ninside] = j;
snaptr->wj[ninside] = wjelem[jelem];
snaptr->rcutij[ninside] = (radi + radelem[jelem])*rcutfac;
ninside++;
}
}
// compute Ui, Zi, and Bi for atom I
snaptr->compute_ui(ninside);
snaptr->compute_zi();
if (!gammaoneflag) {
snaptr->compute_bi();
snaptr->copy_bi2bvec();
}
// for neighbors of I within cutoff:
// compute dUi/drj and dBi/drj
// Fij = dEi/dRj = -dEi/dRi => add to Fi, subtract from Fj
double* coeffi = coeffelem[ielem];
for (int jj = 0; jj < ninside; jj++) {
int j = snaptr->inside[jj];
snaptr->compute_duidrj(snaptr->rij[jj],
snaptr->wj[jj],snaptr->rcutij[jj]);
snaptr->compute_dbidrj();
snaptr->copy_dbi2dbvec();
fij[0] = 0.0;
fij[1] = 0.0;
fij[2] = 0.0;
for (int k = 1; k <= ncoeff; k++) {
double bgb;
if (gammaoneflag)
bgb = coeffi[k];
else bgb = coeffi[k]*
gamma*pow(snaptr->bvec[k-1],gamma-1.0);
fij[0] += bgb*snaptr->dbvec[k-1][0];
fij[1] += bgb*snaptr->dbvec[k-1][1];
fij[2] += bgb*snaptr->dbvec[k-1][2];
}
f[i][0] += fij[0];
f[i][1] += fij[1];
f[i][2] += fij[2];
f[j][0] -= fij[0];
f[j][1] -= fij[1];
f[j][2] -= fij[2];
if (evflag)
ev_tally_xyz(i,j,nlocal,newton_pair,0.0,0.0,
fij[0],fij[1],fij[2],
snaptr->rij[jj][0],snaptr->rij[jj][1],
snaptr->rij[jj][2]);
}
if (eflag) {
// evdwl = energy of atom I, sum over coeffs_k * Bi_k
evdwl = coeffi[0];
if (gammaoneflag) {
snaptr->compute_bi();
snaptr->copy_bi2bvec();
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*snaptr->bvec[k-1];
} else
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*pow(snaptr->bvec[k-1],gamma);
ev_tally_full(i,2.0*evdwl,0.0,0.0,delx,dely,delz);
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ----------------------------------------------------------------------
This version is optimized for threading, micro-load balancing
---------------------------------------------------------------------- */
void PairSNAP::compute_optimized(int eflag, int vflag)
{
// if reneighboring took place do load_balance if requested
if (do_load_balance > 0 &&
(neighbor->ncalls != ncalls_neigh)) {
ghostinum = 0;
// reset local ghost neighbor lists
ncalls_neigh = neighbor->ncalls;
if (ilistmask_max < list->inum) {
memory->grow(ilistmask,list->inum,"PairSnap::ilistmask");
ilistmask_max = list->inum;
}
for (int i = 0; i < list->inum; i++)
ilistmask[i] = 1;
//multiple passes for loadbalancing
for (int i = 0; i < do_load_balance; i++)
load_balance();
}
int numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
numpairs += jnum;
}
}
if (do_load_balance)
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
numpairs += jnum;
}
// optimized schedule setting
int time_dynamic = 0;
int time_guided = 0;
if (schedule_user == 0) schedule_user = 4;
switch (schedule_user) {
case 1:
omp_set_schedule(omp_sched_static,1);
break;
case 2:
omp_set_schedule(omp_sched_dynamic,1);
break;
case 3:
omp_set_schedule(omp_sched_guided,2);
break;
case 4:
omp_set_schedule(omp_sched_auto,0);
break;
case 5:
if (numpairs < 8*nthreads) omp_set_schedule(omp_sched_dynamic,1);
else if (schedule_time_guided < 0.0) {
omp_set_schedule(omp_sched_guided,2);
if (!eflag && !vflag) time_guided = 1;
} else if (schedule_time_dynamic<0.0) {
omp_set_schedule(omp_sched_dynamic,1);
if (!eflag && !vflag) time_dynamic = 1;
} else if (schedule_time_guided<schedule_time_dynamic)
omp_set_schedule(omp_sched_guided,2);
else
omp_set_schedule(omp_sched_dynamic,1);
break;
}
if (use_shared_arrays)
build_per_atom_arrays();
#if defined(_OPENMP)
#pragma omp parallel shared(eflag,vflag,time_dynamic,time_guided) firstprivate(numpairs) default(none)
#endif
{
// begin of pragma omp parallel
int tid = omp_get_thread_num();
int** pairs_tid_unique = NULL;
int** pairs;
if (use_shared_arrays) pairs = i_pairs;
else {
memory->create(pairs_tid_unique,numpairs,4,"numpairs");
pairs = pairs_tid_unique;
}
if (!use_shared_arrays) {
numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
for (int jj = 0; jj<jnum; jj++) {
pairs[numpairs][0] = i;
pairs[numpairs][1] = jj;
pairs[numpairs][2] = -1;
numpairs++;
}
}
}
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
for (int jj = 0; jj<jnum; jj++) {
pairs[numpairs][0] = i;
pairs[numpairs][1] = jj;
pairs[numpairs][2] = -1;
numpairs++;
}
}
}
int ielem;
int jj,k,jnum,jtype,ninside;
double delx,dely,delz,evdwl,rsq;
double fij[3];
int *jlist,*numneigh,**firstneigh;
evdwl = 0.0;
#if defined(_OPENMP)
#pragma omp master
#endif
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
}
#if defined(_OPENMP)
#pragma omp barrier
{ ; }
#endif
double **x = atom->x;
double **f = atom->f;
int *type = atom->type;
int nlocal = atom->nlocal;
int newton_pair = force->newton_pair;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
#ifdef TIMING_INFO
// only update micro timers after setup
static int count=0;
if (count<2) {
sna[tid]->timers[0] = 0;
sna[tid]->timers[1] = 0;
sna[tid]->timers[2] = 0;
sna[tid]->timers[3] = 0;
sna[tid]->timers[4] = 0;
}
count++;
#endif
// did thread start working on interactions of new atom
int iold = -1;
double starttime, endtime;
if (time_dynamic || time_guided)
starttime = MPI_Wtime();
#if defined(_OPENMP)
#pragma omp for schedule(runtime)
#endif
for (int iijj = 0; iijj < numpairs; iijj++) {
int i = 0;
if (use_shared_arrays) {
i = i_pairs[iijj][0];
if (iold != i) {
set_sna_to_shared(tid,i_pairs[iijj][3]);
ielem = map[type[i]];
}
iold = i;
} else {
i = pairs[iijj][0];
if (iold != i) {
iold = i;
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
ielem = map[itype];
const double radi = radelem[ielem];
if (i < nlocal) {
jlist = firstneigh[i];
jnum = numneigh[i];
} else {
jlist = ghostneighs+ghostfirstneigh[i];
jnum = ghostnumneigh[i];
}
// insure rij, inside, wj, and rcutij are of size jnum
sna[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// wj = weights of neighbors of I within cutoff
// rcutij = cutoffs of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
ninside = 0;
for (jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
delx = x[j][0] - xtmp; //uninitialised
dely = x[j][1] - ytmp;
delz = x[j][2] - ztmp;
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
int jelem = map[jtype];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) { //uninitialised
sna[tid]->rij[ninside][0] = delx;
sna[tid]->rij[ninside][1] = dely;
sna[tid]->rij[ninside][2] = delz;
sna[tid]->inside[ninside] = j;
sna[tid]->wj[ninside] = wjelem[jelem];
sna[tid]->rcutij[ninside] = (radi + radelem[jelem])*rcutfac;
ninside++;
// update index list with inside index
pairs[iijj + (jj - pairs[iijj][1])][2] =
ninside-1; //uninitialised
}
}
// compute Ui and Zi for atom I
sna[tid]->compute_ui(ninside); //uninitialised
sna[tid]->compute_zi();
}
}
// for neighbors of I within cutoff:
// compute dUi/drj and dBi/drj
// Fij = dEi/dRj = -dEi/dRi => add to Fi, subtract from Fj
// entry into loop if inside index is set
double* coeffi = coeffelem[ielem];
if (pairs[iijj][2] >= 0) {
jj = pairs[iijj][2];
int j = sna[tid]->inside[jj];
sna[tid]->compute_duidrj(sna[tid]->rij[jj],
sna[tid]->wj[jj],sna[tid]->rcutij[jj]);
sna[tid]->compute_dbidrj();
sna[tid]->copy_dbi2dbvec();
if (!gammaoneflag) {
sna[tid]->compute_bi();
sna[tid]->copy_bi2bvec();
}
fij[0] = 0.0;
fij[1] = 0.0;
fij[2] = 0.0;
for (k = 1; k <= ncoeff; k++) {
double bgb;
if (gammaoneflag)
bgb = coeffi[k];
else bgb = coeffi[k]*
gamma*pow(sna[tid]->bvec[k-1],gamma-1.0);
fij[0] += bgb*sna[tid]->dbvec[k-1][0];
fij[1] += bgb*sna[tid]->dbvec[k-1][1];
fij[2] += bgb*sna[tid]->dbvec[k-1][2];
}
#if defined(_OPENMP)
#pragma omp critical
#endif
{
f[i][0] += fij[0];
f[i][1] += fij[1];
f[i][2] += fij[2];
f[j][0] -= fij[0];
f[j][1] -= fij[1];
f[j][2] -= fij[2];
if (evflag)
ev_tally_xyz(i,j,nlocal,newton_pair,0.0,0.0,
fij[0],fij[1],fij[2],
sna[tid]->rij[jj][0],sna[tid]->rij[jj][1],
sna[tid]->rij[jj][2]);
}
}
// evdwl = energy of atom I, sum over coeffs_k * Bi_k
// only call this for first pair of each atom i
// if atom has no pairs, eatom=0, which is wrong
if (eflag&&pairs[iijj][1] == 0) {
evdwl = coeffi[0];
if (gammaoneflag) {
sna[tid]->compute_bi();
sna[tid]->copy_bi2bvec();
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*sna[tid]->bvec[k-1];
} else
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*pow(sna[tid]->bvec[k-1],gamma);
#if defined(_OPENMP)
#pragma omp critical
#endif
ev_tally_full(i,2.0*evdwl,0.0,0.0,delx,dely,delz);
}
}
if (time_dynamic || time_guided)
endtime = MPI_Wtime();
if (time_dynamic) schedule_time_dynamic = endtime - starttime;
if (time_guided) schedule_time_guided = endtime - starttime;
if (!use_shared_arrays) memory->destroy(pairs);
}// end of pragma omp parallel
if (vflag_fdotr) virial_fdotr_compute();
}
inline int PairSNAP::equal(double* x,double* y)
{
double dist2 =
(x[0]-y[0])*(x[0]-y[0]) +
(x[1]-y[1])*(x[1]-y[1]) +
(x[2]-y[2])*(x[2]-y[2]);
if (dist2 < 1e-20) return 1;
return 0;
}
inline double PairSNAP::dist2(double* x,double* y)
{
return
(x[0]-y[0])*(x[0]-y[0]) +
(x[1]-y[1])*(x[1]-y[1]) +
(x[2]-y[2])*(x[2]-y[2]);
}
// return extra communication cutoff
// extra_cutoff = max(subdomain_length)
double PairSNAP::extra_cutoff()
{
double sublo[3],subhi[3];
if (domain->triclinic == 0) {
for (int dim = 0 ; dim < 3 ; dim++) {
sublo[dim] = domain->sublo[dim];
subhi[dim] = domain->subhi[dim];
}
} else {
domain->lamda2x(domain->sublo_lamda,sublo);
domain->lamda2x(domain->subhi_lamda,subhi);
}
double sub_size[3];
for (int dim = 0; dim < 3; dim++)
sub_size[dim] = subhi[dim] - sublo[dim];
double max_sub_size = 0;
for (int dim = 0; dim < 3; dim++)
max_sub_size = MAX(max_sub_size,sub_size[dim]);
// note: for triclinic, probably need something different
// see Comm::setup()
return max_sub_size;
}
// micro load_balancer: each MPI process will
// check with each of its 26 neighbors,
// whether an imbalance exists in the number
// of atoms to calculate forces for.
// If it does it will set ilistmask of one of
// its local atoms to zero, and send its Tag
// to the neighbor process. The neighboring process
// will check its ghost list for the
// ghost atom with the same Tag which is closest
// to its domain center, and build a
// neighborlist for this ghost atom. For this to work,
// the communication cutoff has to be
// as large as the neighbor cutoff +
// maximum subdomain length.
// Note that at most one atom is exchanged per processor pair.
// Also note that the local atom assignment
// doesn't change. This load balancer will cause
// some ghost atoms to have full neighborlists
// which are unique to PairSNAP.
// They are not part of the generally accessible neighborlist.
// At the same time corresponding local atoms on
// other MPI processes will not be
// included in the force computation since
// their ilistmask is 0. This does not affect
// any other classes which might
// access the same general neighborlist.
// Reverse communication (newton on) of forces is required.
// Currently the load balancer does two passes,
// since it exchanges atoms upstream and downstream.
void PairSNAP::load_balance()
{
double sublo[3],subhi[3];
if (domain->triclinic == 0) {
double* sublotmp = domain->sublo;
double* subhitmp = domain->subhi;
for (int dim = 0 ; dim<3 ; dim++) {
sublo[dim]=sublotmp[dim];
subhi[dim]=subhitmp[dim];
}
} else {
double* sublotmp = domain->sublo_lamda;
double* subhitmp = domain->subhi_lamda;
domain->lamda2x(sublotmp,sublo);
domain->lamda2x(subhitmp,subhi);
}
//if (list->inum==0) list->grow(atom->nmax);
int nlocal = ghostinum;
for (int i=0; i < list->inum; i++)
if (ilistmask[i]) nlocal++;
int ***grid2proc = comm->grid2proc;
int* procgrid = comm->procgrid;
int nlocal_up,nlocal_down;
MPI_Request request;
double sub_mid[3];
for (int dim=0; dim<3; dim++)
sub_mid[dim] = (subhi[dim] + sublo[dim])/2;
if (comm->cutghostuser <
neighbor->cutneighmax+extra_cutoff())
error->all(FLERR,"Communication cutoff too small for SNAP micro load balancing");
int nrecv = ghostinum;
int totalsend = 0;
int nsend = 0;
int depth = 1;
for (int dx = -depth; dx < depth+1; dx++)
for (int dy = -depth; dy < depth+1; dy++)
for (int dz = -depth; dz < depth+1; dz++) {
if (dx == dy && dy == dz && dz == 0) continue;
int sendloc[3] = {comm->myloc[0],
comm->myloc[1], comm->myloc[2]
};
sendloc[0] += dx;
sendloc[1] += dy;
sendloc[2] += dz;
for (int dim = 0; dim < 3; dim++)
if (sendloc[dim] >= procgrid[dim])
sendloc[dim] = sendloc[dim] - procgrid[dim];
for (int dim = 0; dim < 3; dim++)
if (sendloc[dim] < 0)
sendloc[dim] = procgrid[dim] + sendloc[dim];
int recvloc[3] = {comm->myloc[0],
comm->myloc[1], comm->myloc[2]
};
recvloc[0] -= dx;
recvloc[1] -= dy;
recvloc[2] -= dz;
for (int dim = 0; dim < 3; dim++)
if (recvloc[dim] < 0)
recvloc[dim] = procgrid[dim] + recvloc[dim];
for (int dim = 0; dim < 3; dim++)
if (recvloc[dim] >= procgrid[dim])
recvloc[dim] = recvloc[dim] - procgrid[dim];
int sendproc = grid2proc[sendloc[0]][sendloc[1]][sendloc[2]];
int recvproc = grid2proc[recvloc[0]][recvloc[1]][recvloc[2]];
// two stage process, first upstream movement, then downstream
MPI_Sendrecv(&nlocal,1,MPI_INT,sendproc,0,
&nlocal_up,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
MPI_Sendrecv(&nlocal,1,MPI_INT,recvproc,0,
&nlocal_down,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
nsend = 0;
// send upstream
if (nlocal > nlocal_up+1) {
int i = totalsend++;
while(i < list->inum && ilistmask[i] == 0)
i = totalsend++;
if (i < list->inum)
MPI_Isend(&atom->tag[i],1,MPI_INT,recvproc,0,world,&request);
else {
int j = -1;
MPI_Isend(&j,1,MPI_INT,recvproc,0,world,&request);
}
if (i < list->inum) {
for (int j = 0; j < list->inum; j++)
if (list->ilist[j] == i)
ilistmask[j] = 0;
nsend = 1;
}
}
// recv downstream
if (nlocal < nlocal_down-1) {
nlocal++;
int get_tag = -1;
MPI_Recv(&get_tag,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
// if get_tag is -1 the other process didn't have local atoms to send
if (get_tag >= 0) {
if (ghostinum >= ghostilist_max) {
memory->grow(ghostilist,ghostinum+10,
"PairSnap::ghostilist");
ghostilist_max = ghostinum+10;
}
if (atom->nlocal + atom->nghost >= ghostnumneigh_max) {
ghostnumneigh_max = atom->nlocal+atom->nghost+100;
memory->grow(ghostnumneigh,ghostnumneigh_max,
"PairSnap::ghostnumneigh");
memory->grow(ghostfirstneigh,ghostnumneigh_max,
"PairSnap::ghostfirstneigh");
}
// find closest ghost image of the transferred particle
double mindist = 1e200;
int closestghost = -1;
for (int j = 0; j < atom->nlocal + atom->nghost; j++)
if (atom->tag[j] == get_tag)
if (dist2(sub_mid, atom->x[j]) < mindist) {
closestghost = j;
mindist = dist2(sub_mid, atom->x[j]);
}
// build neighborlist for this particular
// ghost atom, and add it to list->ilist
if (ghostneighs_max - ghostneighs_total <
neighbor->oneatom) {
memory->grow(ghostneighs,
ghostneighs_total + neighbor->oneatom,
"PairSnap::ghostneighs");
ghostneighs_max = ghostneighs_total + neighbor->oneatom;
}
int j = closestghost;
ghostilist[ghostinum] = j;
ghostnumneigh[j] = 0;
ghostfirstneigh[j] = ghostneighs_total;
ghostinum++;
int* jlist = ghostneighs + ghostfirstneigh[j];
// find all neighbors by looping
// over all local and ghost atoms
for (int k = 0; k < atom->nlocal + atom->nghost; k++)
if (dist2(atom->x[j],atom->x[k]) <
neighbor->cutneighmax*neighbor->cutneighmax) {
jlist[ghostnumneigh[j]] = k;
ghostnumneigh[j]++;
ghostneighs_total++;
}
}
if (get_tag >= 0) nrecv++;
}
// decrease nlocal later, so that it is the
// initial number both for receiving and sending
if (nsend) nlocal--;
// second pass through the grid
MPI_Sendrecv(&nlocal,1,MPI_INT,sendproc,0,
&nlocal_up,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
MPI_Sendrecv(&nlocal,1,MPI_INT,recvproc,0,
&nlocal_down,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
// send downstream
nsend=0;
if (nlocal > nlocal_down+1) {
int i = totalsend++;
while(i < list->inum && ilistmask[i]==0) i = totalsend++;
if (i < list->inum)
MPI_Isend(&atom->tag[i],1,MPI_INT,sendproc,0,world,&request);
else {
int j = -1;
MPI_Isend(&j,1,MPI_INT,sendproc,0,world,&request);
}
if (i < list->inum) {
for (int j=0; j<list->inum; j++)
if (list->ilist[j] == i) ilistmask[j] = 0;
nsend = 1;
}
}
// receive upstream
if (nlocal < nlocal_up-1) {
nlocal++;
int get_tag = -1;
MPI_Recv(&get_tag,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
if (get_tag >= 0) {
if (ghostinum >= ghostilist_max) {
memory->grow(ghostilist,ghostinum+10,
"PairSnap::ghostilist");
ghostilist_max = ghostinum+10;
}
if (atom->nlocal + atom->nghost >= ghostnumneigh_max) {
ghostnumneigh_max = atom->nlocal + atom->nghost + 100;
memory->grow(ghostnumneigh,ghostnumneigh_max,
"PairSnap::ghostnumneigh");
memory->grow(ghostfirstneigh,ghostnumneigh_max,
"PairSnap::ghostfirstneigh");
}
// find closest ghost image of the transferred particle
double mindist = 1e200;
int closestghost = -1;
for (int j = 0; j < atom->nlocal + atom->nghost; j++)
if (atom->tag[j] == get_tag)
if (dist2(sub_mid,atom->x[j])<mindist) {
closestghost = j;
mindist = dist2(sub_mid,atom->x[j]);
}
// build neighborlist for this particular ghost atom
if (ghostneighs_max-ghostneighs_total < neighbor->oneatom) {
memory->grow(ghostneighs,ghostneighs_total + neighbor->oneatom,
"PairSnap::ghostneighs");
ghostneighs_max = ghostneighs_total + neighbor->oneatom;
}
int j = closestghost;
ghostilist[ghostinum] = j;
ghostnumneigh[j] = 0;
ghostfirstneigh[j] = ghostneighs_total;
ghostinum++;
int* jlist = ghostneighs + ghostfirstneigh[j];
for (int k = 0; k < atom->nlocal + atom->nghost; k++)
if (dist2(atom->x[j],atom->x[k]) <
neighbor->cutneighmax*neighbor->cutneighmax) {
jlist[ghostnumneigh[j]] = k;
ghostnumneigh[j]++;
ghostneighs_total++;
}
}
if (get_tag >= 0) nrecv++;
}
if (nsend) nlocal--;
}
}
void PairSNAP::set_sna_to_shared(int snaid,int i)
{
sna[snaid]->rij = i_rij[i];
sna[snaid]->inside = i_inside[i];
sna[snaid]->wj = i_wj[i];
sna[snaid]->rcutij = i_rcutij[i];
sna[snaid]->zarray_r = i_zarray_r[i];
sna[snaid]->zarray_i = i_zarray_i[i];
sna[snaid]->uarraytot_r = i_uarraytot_r[i];
sna[snaid]->uarraytot_i = i_uarraytot_i[i];
}
void PairSNAP::build_per_atom_arrays()
{
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
int count = 0;
int neighmax = 0;
for (int ii = 0; ii < list->inum; ii++)
if ((do_load_balance <= 0) || ilistmask[ii]) {
neighmax=MAX(neighmax,list->numneigh[list->ilist[ii]]);
++count;
}
for (int ii = 0; ii < ghostinum; ii++) {
neighmax=MAX(neighmax,ghostnumneigh[ghostilist[ii]]);
++count;
}
if (i_max < count || i_neighmax < neighmax) {
int i_maxt = MAX(count,i_max);
i_neighmax = MAX(neighmax,i_neighmax);
memory->destroy(i_rij);
memory->destroy(i_inside);
memory->destroy(i_wj);
memory->destroy(i_rcutij);
memory->destroy(i_ninside);
memory->destroy(i_pairs);
memory->create(i_rij,i_maxt,i_neighmax,3,"PairSNAP::i_rij");
memory->create(i_inside,i_maxt,i_neighmax,"PairSNAP::i_inside");
memory->create(i_wj,i_maxt,i_neighmax,"PairSNAP::i_wj");
memory->create(i_rcutij,i_maxt,i_neighmax,"PairSNAP::i_rcutij");
memory->create(i_ninside,i_maxt,"PairSNAP::i_ninside");
memory->create(i_pairs,i_maxt*i_neighmax,4,"PairSNAP::i_pairs");
}
if (i_max < count) {
int jdim = sna[0]->twojmax+1;
memory->destroy(i_uarraytot_r);
memory->destroy(i_uarraytot_i);
memory->create(i_uarraytot_r,count,jdim,jdim,jdim,
"PairSNAP::i_uarraytot_r");
memory->create(i_uarraytot_i,count,jdim,jdim,jdim,
"PairSNAP::i_uarraytot_i");
if (i_zarray_r != NULL)
for (int i = 0; i < i_max; i++) {
memory->destroy(i_zarray_r[i]);
memory->destroy(i_zarray_i[i]);
}
delete [] i_zarray_r;
delete [] i_zarray_i;
i_zarray_r = new double*****[count];
i_zarray_i = new double*****[count];
for (int i = 0; i < count; i++) {
memory->create(i_zarray_r[i],jdim,jdim,jdim,jdim,jdim,
"PairSNAP::i_zarray_r");
memory->create(i_zarray_i[i],jdim,jdim,jdim,jdim,jdim,
"PairSNAP::i_zarray_i");
}
}
if (i_max < count)
i_max = count;
count = 0;
i_numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
int* jlist = list->firstneigh[i];
const double xtmp = atom->x[i][0];
const double ytmp = atom->x[i][1];
const double ztmp = atom->x[i][2];
const int itype = atom->type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = atom->x[j][0] - xtmp;
const double dely = atom->x[j][1] - ytmp;
const double delz = atom->x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = atom->type[j];
int jelem = map[jtype];
i_pairs[i_numpairs][0] = i;
i_pairs[i_numpairs][1] = jj;
i_pairs[i_numpairs][2] = -1;
i_pairs[i_numpairs][3] = count;
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
i_rij[count][ninside][0] = delx;
i_rij[count][ninside][1] = dely;
i_rij[count][ninside][2] = delz;
i_inside[count][ninside] = j;
i_wj[count][ninside] = wjelem[jelem];
i_rcutij[count][ninside] = (radi + radelem[jelem])*rcutfac;
// update index list with inside index
i_pairs[i_numpairs][2] = ninside++;
}
i_numpairs++;
}
i_ninside[count] = ninside;
count++;
}
}
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
int* jlist = ghostneighs+ghostfirstneigh[i];
const double xtmp = atom->x[i][0];
const double ytmp = atom->x[i][1];
const double ztmp = atom->x[i][2];
const int itype = atom->type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = atom->x[j][0] - xtmp;
const double dely = atom->x[j][1] - ytmp;
const double delz = atom->x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = atom->type[j];
int jelem = map[jtype];
i_pairs[i_numpairs][0] = i;
i_pairs[i_numpairs][1] = jj;
i_pairs[i_numpairs][2] = -1;
i_pairs[i_numpairs][3] = count;
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
i_rij[count][ninside][0] = delx;
i_rij[count][ninside][1] = dely;
i_rij[count][ninside][2] = delz;
i_inside[count][ninside] = j;
i_wj[count][ninside] = wjelem[jelem];
i_rcutij[count][ninside] = (radi + radelem[jelem])*rcutfac;
// update index list with inside index
i_pairs[i_numpairs][2] = ninside++;
}
i_numpairs++;
}
i_ninside[count] = ninside;
count++;
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
timers[0]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
#if defined(_OPENMP)
#pragma omp parallel for shared(count) default(none)
#endif
for (int ii=0; ii < count; ii++) {
int tid = omp_get_thread_num();
set_sna_to_shared(tid,ii);
//sna[tid]->compute_ui(i_ninside[ii]);
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
sna[tid]->compute_ui_omp(i_ninside[ii],MAX(int(nthreads/count),1));
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
sna[tid]->timers[0]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
for (int ii=0; ii < count; ii++) {
int tid = 0;//omp_get_thread_num();
set_sna_to_shared(tid,ii);
sna[tid]->compute_zi_omp(MAX(int(nthreads/count),1));
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
sna[0]->timers[1]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
timers[1]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairSNAP::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(map,n+1,"pair:map");
}
/* ----------------------------------------------------------------------
global settings
------------------------------------------------------------------------- */
void PairSNAP::settings(int narg, char **arg)
{
// set default values for optional arguments
nthreads = -1;
use_shared_arrays=-1;
do_load_balance = 0;
use_optimized = 1;
// optional arguments
for (int i=0; i < narg; i++) {
if (i+2>narg) error->all(FLERR,"Illegal pair_style command");
if (strcmp(arg[i],"nthreads")==0) {
nthreads=force->inumeric(FLERR,arg[++i]);
#if defined(LMP_USER_OMP)
error->all(FLERR,"Must set number of threads via package omp command");
#else
omp_set_num_threads(nthreads);
comm->nthreads=nthreads;
#endif
continue;
}
if (strcmp(arg[i],"optimized")==0) {
use_optimized=force->inumeric(FLERR,arg[++i]);
continue;
}
if (strcmp(arg[i],"shared")==0) {
use_shared_arrays=force->inumeric(FLERR,arg[++i]);
continue;
}
if (strcmp(arg[i],"loadbalance")==0) {
do_load_balance = force->inumeric(FLERR,arg[++i]);
if (do_load_balance) {
double mincutoff = extra_cutoff() +
rcutmax + neighbor->skin;
if (comm->cutghostuser < mincutoff) {
char buffer[255];
// save the increased cutoff first: mincutoff appears to be clobbered to 0 after the sprintf call below
double tmp = mincutoff + 0.1;
sprintf(buffer, "Communication cutoff is too small "
"for SNAP micro load balancing, increased to %lf",
mincutoff+0.1);
if (comm->me==0)
error->warning(FLERR,buffer);
comm->cutghostuser = tmp;
}
}
continue;
}
if (strcmp(arg[i],"schedule")==0) {
i++;
if (strcmp(arg[i],"static")==0)
schedule_user = 1;
if (strcmp(arg[i],"dynamic")==0)
schedule_user = 2;
if (strcmp(arg[i],"guided")==0)
schedule_user = 3;
if (strcmp(arg[i],"auto")==0)
schedule_user = 4;
if (strcmp(arg[i],"determine")==0)
schedule_user = 5;
if (schedule_user == 0)
error->all(FLERR,"Illegal pair_style command");
continue;
}
error->all(FLERR,"Illegal pair_style command");
}
if (nthreads < 0)
nthreads = comm->nthreads;
if (use_shared_arrays < 0) {
if (nthreads > 1 && atom->nlocal <= 2*nthreads)
use_shared_arrays = 1;
else use_shared_arrays = 0;
}
// check if running non-optimized code with
// optimization flags set
if (!use_optimized)
if (nthreads > 1 ||
use_shared_arrays ||
do_load_balance ||
schedule_user)
error->all(FLERR,"Illegal pair_style command");
}
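/* Illustrative sketch (not part of the original source; the values are made up):
   given the parsing above, each optional keyword takes a single value, so a
   hypothetical pair_style line such as
     pair_style snap nthreads 4 optimized 1 shared 0 loadbalance 1 schedule dynamic
   would be consumed by this loop; any unrecognized keyword, or a keyword
   without a value, triggers the "Illegal pair_style command" error. */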
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairSNAP::coeff(int narg, char **arg)
{
// read SNAP element names between 2 filenames
// nelements = # of SNAP elements
// elements = list of unique element names
if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
if (nelements) {
for (int i = 0; i < nelements; i++)
delete[] elements[i];
delete[] elements;
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(coeffelem);
}
nelements = narg - 4 - atom->ntypes;
if (nelements < 1) error->all(FLERR,"Incorrect args for pair coefficients");
char* type1 = arg[0];
char* type2 = arg[1];
char* coefffilename = arg[2];
char** elemlist = &arg[3];
char* paramfilename = arg[3+nelements];
char** elemtypes = &arg[4+nelements];
// insure I,J args are * *
if (strcmp(type1,"*") != 0 || strcmp(type2,"*") != 0)
error->all(FLERR,"Incorrect args for pair coefficients");
elements = new char*[nelements];
for (int i = 0; i < nelements; i++) {
char* elemname = elemlist[i];
int n = strlen(elemname) + 1;
elements[i] = new char[n];
strcpy(elements[i],elemname);
}
// read snapcoeff and snapparam files
read_files(coefffilename,paramfilename);
// read args that map atom types to SNAP elements
// map[i] = which element the Ith atom type is, -1 if not mapped
// map[0] is not used
for (int i = 1; i <= atom->ntypes; i++) {
char* elemname = elemtypes[i-1];
int jelem;
for (jelem = 0; jelem < nelements; jelem++)
if (strcmp(elemname,elements[jelem]) == 0)
break;
if (jelem < nelements)
map[i] = jelem;
else if (strcmp(elemname,"NULL") == 0) map[i] = -1;
else error->all(FLERR,"Incorrect args for pair coefficients");
}
// clear setflag since coeff() called once with I,J = * *
int n = atom->ntypes;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
// set setflag i,j for type pairs where both are mapped to elements
int count = 0;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
if (map[i] >= 0 && map[j] >= 0) {
setflag[i][j] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
sna = new SNA*[nthreads];
// allocate memory for per OpenMP thread data which
// is wrapped into the sna class
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
sna[tid] = new SNA(lmp,rfac0,twojmax,
diagonalstyle,use_shared_arrays,
rmin0,switchflag,bzeroflag);
if (!use_shared_arrays)
sna[tid]->grow_rij(nmax);
}
if (ncoeff != sna[0]->ncoeff) {
printf("ncoeff = %d snancoeff = %d \n",ncoeff,sna[0]->ncoeff);
error->all(FLERR,"Incorrect SNAP parameter file");
}
// Calculate maximum cutoff for all elements
rcutmax = 0.0;
for (int ielem = 0; ielem < nelements; ielem++)
rcutmax = MAX(2.0*radelem[ielem]*rcutfac,rcutmax);
}
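/* Illustrative sketch (not part of the original source): coeff() above expects
     pair_coeff * * <coeff file> <elem 1> ... <elem N> <param file> <type-to-element map ...>
   For example, a hypothetical system with two atom types, both mapped to a
   single SNAP element Ta, could use
     pair_coeff * * Ta.snapcoeff Ta Ta.snapparam Ta Ta
   where the two trailing names map atom types 1 and 2 to element Ta
   (the keyword NULL leaves a type unmapped). */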
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairSNAP::init_style()
{
if (force->newton_pair == 0)
error->all(FLERR,"Pair style SNAP requires newton pair on");
// need a full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
sna[tid]->init();
}
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairSNAP::init_one(int i, int j)
{
if (setflag[i][j] == 0) error->all(FLERR,"All pair coeffs are not set");
return (radelem[map[i]] +
radelem[map[j]])*rcutfac;
}
/* ---------------------------------------------------------------------- */
void PairSNAP::read_files(char *coefffilename, char *paramfilename)
{
// open SNAP coefficient file on proc 0
FILE *fpcoeff;
if (comm->me == 0) {
fpcoeff = force->open_potential(coefffilename);
if (fpcoeff == NULL) {
char str[128];
sprintf(str,"Cannot open SNAP coefficient file %s",coefffilename);
error->one(FLERR,str);
}
}
char line[MAXLINE],*ptr;
int eof = 0;
int n;
int nwords = 0;
while (nwords == 0) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
}
if (nwords != 2)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
// words = ptrs to all words in line
// strip single and double quotes from words
char* words[MAXWORD];
int iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
iword = 1;
words[iword] = strtok(NULL,"' \t\n\r\f");
int nelemfile = atoi(words[0]);
ncoeff = atoi(words[1])-1;
// Set up element lists
memory->create(radelem,nelements,"pair:radelem");
memory->create(wjelem,nelements,"pair:wjelem");
memory->create(coeffelem,nelements,ncoeff+1,"pair:coeffelem");
int *found = new int[nelements];
for (int ielem = 0; ielem < nelements; ielem++)
found[ielem] = 0;
// Loop over elements in the SNAP coefficient file
for (int ielemfile = 0; ielemfile < nelemfile; ielemfile++) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
nwords = atom->count_words(line);
if (nwords != 3)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
iword = 1;
words[iword] = strtok(NULL,"' \t\n\r\f");
iword = 2;
words[iword] = strtok(NULL,"' \t\n\r\f");
char* elemtmp = words[0];
double radtmp = atof(words[1]);
double wjtmp = atof(words[2]);
// skip if element name isn't in element list
int ielem;
for (ielem = 0; ielem < nelements; ielem++)
if (strcmp(elemtmp,elements[ielem]) == 0) break;
if (ielem == nelements) {
if (comm->me == 0)
for (int icoeff = 0; icoeff <= ncoeff; icoeff++)
ptr = fgets(line,MAXLINE,fpcoeff);
continue;
}
// skip if element already appeared
if (found[ielem]) {
if (comm->me == 0)
for (int icoeff = 0; icoeff <= ncoeff; icoeff++)
ptr = fgets(line,MAXLINE,fpcoeff);
continue;
}
found[ielem] = 1;
radelem[ielem] = radtmp;
wjelem[ielem] = wjtmp;
if (comm->me == 0) {
if (screen) fprintf(screen,"SNAP Element = %s, Radius %g, Weight %g \n",
elements[ielem], radelem[ielem], wjelem[ielem]);
if (logfile) fprintf(logfile,"SNAP Element = %s, Radius %g, Weight %g \n",
elements[ielem], radelem[ielem], wjelem[ielem]);
}
for (int icoeff = 0; icoeff <= ncoeff; icoeff++) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
nwords = atom->count_words(line);
if (nwords != 1)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
coeffelem[ielem][icoeff] = atof(words[0]);
}
}
// set flags for required keywords
rcutfacflag = 0;
twojmaxflag = 0;
// Set defaults for optional keywords
gamma = 1.0;
gammaoneflag = 1;
rfac0 = 0.99363;
rmin0 = 0.0;
diagonalstyle = 3;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+
// open SNAP parameter file on proc 0
FILE *fpparam;
if (comm->me == 0) {
fpparam = force->open_potential(paramfilename);
if (fpparam == NULL) {
char str[128];
sprintf(str,"Cannot open SNAP parameter file %s",paramfilename);
error->one(FLERR,str);
}
}
eof = 0;
while (1) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpparam);
if (ptr == NULL) {
eof = 1;
fclose(fpparam);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
if (nwords == 0) continue;
if (nwords != 2)
error->all(FLERR,"Incorrect format in SNAP parameter file");
// words = ptrs to all words in line
// strip single and double quotes from words
char* keywd = strtok(line,"' \t\n\r\f");
char* keyval = strtok(NULL,"' \t\n\r\f");
if (comm->me == 0) {
if (screen) fprintf(screen,"SNAP keyword %s %s \n",keywd,keyval);
if (logfile) fprintf(logfile,"SNAP keyword %s %s \n",keywd,keyval);
}
if (strcmp(keywd,"rcutfac") == 0) {
rcutfac = atof(keyval);
rcutfacflag = 1;
} else if (strcmp(keywd,"twojmax") == 0) {
twojmax = atoi(keyval);
twojmaxflag = 1;
} else if (strcmp(keywd,"gamma") == 0)
gamma = atof(keyval);
else if (strcmp(keywd,"rfac0") == 0)
rfac0 = atof(keyval);
else if (strcmp(keywd,"rmin0") == 0)
rmin0 = atof(keyval);
else if (strcmp(keywd,"diagonalstyle") == 0)
diagonalstyle = atoi(keyval);
else if (strcmp(keywd,"switchflag") == 0)
switchflag = atoi(keyval);
else if (strcmp(keywd,"bzeroflag") == 0)
bzeroflag = atoi(keyval);
else
error->all(FLERR,"Incorrect SNAP parameter file");
}
if (rcutfacflag == 0 || twojmaxflag == 0)
error->all(FLERR,"Incorrect SNAP parameter file");
if (gamma == 1.0) gammaoneflag = 1;
else gammaoneflag = 0;
delete[] found;
}
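/* Illustrative sketch (not part of the original source; all numbers are made-up
   placeholders): read_files() above expects a coefficient file of the form

     # comments and blank lines are skipped while reading the header
     1 31                    <- nelements, ncoeff+1
     Ta 0.5 1.0              <- element name, radius, weight
     -2.92477                <- ncoeff+1 coefficients, one per line
     ...

   and a parameter file made of keyword/value lines, e.g.

     rcutfac 4.67637         <- required
     twojmax 6               <- required
     rfac0 0.99363           <- optional (defaults are set in the code above)
     rmin0 0.0
     bzeroflag 1

   Unknown keywords, or a missing rcutfac/twojmax, abort with
   "Incorrect SNAP parameter file". */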
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double PairSNAP::memory_usage()
{
double bytes = Pair::memory_usage();
int n = atom->ntypes+1;
bytes += n*n*sizeof(int);
bytes += n*n*sizeof(double);
bytes += 3*nmax*sizeof(double);
bytes += nmax*sizeof(int);
bytes += (2*ncoeff+1)*sizeof(double);
bytes += (ncoeff*3)*sizeof(double);
bytes += sna[0]->memory_usage()*nthreads;
return bytes;
}
diff --git a/src/USER-CG-CMM/Install.sh b/src/USER-CGSDK/Install.sh
similarity index 100%
rename from src/USER-CG-CMM/Install.sh
rename to src/USER-CGSDK/Install.sh
diff --git a/src/USER-CG-CMM/README b/src/USER-CGSDK/README
similarity index 58%
rename from src/USER-CG-CMM/README
rename to src/USER-CGSDK/README
index b37fbd376..535bd43ac 100644
--- a/src/USER-CG-CMM/README
+++ b/src/USER-CGSDK/README
@@ -1,46 +1,38 @@
This package implements 3 commands which can be used in a LAMMPS input
script:
pair_style lj/sdk
pair_style lj/sdk/coul/long
angle_style sdk
These styles allow coarse grained MD simulations with the
parametrization of Shinoda, DeVane, Klein, Mol Sim, 33, 27 (2007)
(SDK), with extensions to simulate ionic liquids, electrolytes,
lipids and charged amino acids.
See the doc pages for these commands for details.
There are example scripts for using this package in
-examples/USER/cg-cmm.
+examples/USER/cgsdk
This is the second generation implementation reducing the clutter
of the previous version. For many systems with long range
electrostatics, it will be faster to use pair_style hybrid/overlay
with lj/sdk and coul/long instead of the combined lj/sdk/coul/long
-style, since the number of charged atom types is usually small. To
-exploit this property, the use of the kspace_style pppm/cg is
-recommended over regular pppm. For all new styles, input file backward
-compatibility is provided. The old implementation is still available
-through appending the /old suffix. These will be discontinued and
-removed after the new implementation has been fully validated.
-
-The current version of this package should be considered beta
-quality. The CG potentials work correctly for "normal" situations, but
-have not been testing with all kinds of potential parameters and
-simuation systems.
+style, since the number of charged atom types is usually small.
+To exploit this property, the use of the kspace_style pppm/cg is
+recommended over regular pppm.
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
---------------------------------
Thanks for contributions, support and testing goes to
-Wataru Shinoda (AIST, Tsukuba)
+Wataru Shinoda (Nagoya University)
Russell DeVane (Procter & Gamble)
-Michael L. Klein (CMM / U Penn, Philadelphia)
+Michael L. Klein (Temple University, Philadelphia)
Balasubramanian Sundaram (JNCASR, Bangalore)
-version: 0.99 / 2011-11-29
+version: 1.0 / 2017-04-26
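As a sketch only (not part of the original README; the coefficients below are
made-up placeholders), an input fragment using these styles could look like:

  pair_style lj/sdk 15.0
  pair_coeff 1 1 lj9_6 0.40 4.5
  angle_style sdk
  angle_coeff 1 4.0 120.0

See the lj/sdk and angle_style sdk doc pages for the exact pair_coeff and
angle_coeff argument lists.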
diff --git a/src/USER-CG-CMM/angle_sdk.cpp b/src/USER-CGSDK/angle_sdk.cpp
similarity index 100%
rename from src/USER-CG-CMM/angle_sdk.cpp
rename to src/USER-CGSDK/angle_sdk.cpp
diff --git a/src/USER-CG-CMM/angle_sdk.h b/src/USER-CGSDK/angle_sdk.h
similarity index 98%
rename from src/USER-CG-CMM/angle_sdk.h
rename to src/USER-CGSDK/angle_sdk.h
index fbd546118..a5d917e57 100644
--- a/src/USER-CG-CMM/angle_sdk.h
+++ b/src/USER-CGSDK/angle_sdk.h
@@ -1,63 +1,62 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef ANGLE_CLASS
AngleStyle(sdk,AngleSDK)
-AngleStyle(cg/cmm,AngleSDK)
#else
#ifndef LMP_ANGLE_SDK_H
#define LMP_ANGLE_SDK_H
#include <stdio.h>
#include "angle.h"
namespace LAMMPS_NS {
class AngleSDK : public Angle {
public:
AngleSDK(class LAMMPS *);
virtual ~AngleSDK();
virtual void compute(int, int);
void coeff(int, char **);
void init_style();
double equilibrium_angle(int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_data(FILE *);
double single(int, int, int, int);
protected:
double *k,*theta0;
// scaling factor for repulsive 1-3 interaction
double *repscale;
// parameters from SDK pair style
int **lj_type;
double **lj1,**lj2, **lj3, **lj4;
double **rminsq,**emin;
int repflag; // 1 if we have to handle 1-3 repulsion
void ev_tally13(int, int, int, int, double, double,
double, double, double);
void allocate();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/lj_sdk_common.h b/src/USER-CGSDK/lj_sdk_common.h
similarity index 100%
rename from src/USER-CG-CMM/lj_sdk_common.h
rename to src/USER-CGSDK/lj_sdk_common.h
diff --git a/src/USER-CG-CMM/pair_lj_sdk.cpp b/src/USER-CGSDK/pair_lj_sdk.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk.cpp
rename to src/USER-CGSDK/pair_lj_sdk.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk.h b/src/USER-CGSDK/pair_lj_sdk.h
similarity index 98%
rename from src/USER-CG-CMM/pair_lj_sdk.h
rename to src/USER-CGSDK/pair_lj_sdk.h
index de27485c1..ef0263c06 100644
--- a/src/USER-CG-CMM/pair_lj_sdk.h
+++ b/src/USER-CGSDK/pair_lj_sdk.h
@@ -1,76 +1,75 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk,PairLJSDK)
-PairStyle(cg/cmm,PairLJSDK)
#else
#ifndef LMP_PAIR_LJ_SDK_H
#define LMP_PAIR_LJ_SDK_H
#include "pair.h"
namespace LAMMPS_NS {
class LAMMPS;
class PairLJSDK : public Pair {
public:
PairLJSDK(LAMMPS *);
virtual ~PairLJSDK();
virtual void compute(int, int);
virtual void settings(int, char **);
virtual void coeff(int, char **);
virtual double init_one(int, int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_restart_settings(FILE *);
void read_restart_settings(FILE *);
void write_data(FILE *);
void write_data_all(FILE *);
double single(int, int, int, int, double, double, double, double &);
void *extract(const char *, int &);
virtual double memory_usage();
protected:
int **lj_type; // type of lennard jones potential
double **cut;
double **epsilon,**sigma;
double **lj1,**lj2,**lj3,**lj4,**offset;
// cutoff and offset for minimum of LJ potential
// to be used in SDK angle potential, which
// uses only the repulsive part of the potential
double **rminsq, **emin;
double cut_global;
virtual void allocate();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_long.cpp b/src/USER-CGSDK/pair_lj_sdk_coul_long.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_long.cpp
rename to src/USER-CGSDK/pair_lj_sdk_coul_long.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_long.h b/src/USER-CGSDK/pair_lj_sdk_coul_long.h
similarity index 97%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_long.h
rename to src/USER-CGSDK/pair_lj_sdk_coul_long.h
index 508ffe5e6..57779cc0b 100644
--- a/src/USER-CG-CMM/pair_lj_sdk_coul_long.h
+++ b/src/USER-CGSDK/pair_lj_sdk_coul_long.h
@@ -1,77 +1,76 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long,PairLJSDKCoulLong)
-PairStyle(cg/cmm/coul/long,PairLJSDKCoulLong)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_H
#include "pair.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLong : public Pair {
public:
PairLJSDKCoulLong(class LAMMPS *);
virtual ~PairLJSDKCoulLong();
virtual void compute(int, int);
virtual void settings(int, char **);
void coeff(int, char **);
void init_style();
double init_one(int, int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_data(FILE *);
void write_data_all(FILE *);
virtual void write_restart_settings(FILE *);
virtual void read_restart_settings(FILE *);
virtual double single(int, int, int, int, double, double, double, double &);
virtual void *extract(const char *, int &);
virtual double memory_usage();
protected:
double **cut_lj,**cut_ljsq;
double cut_coul,cut_coulsq;
double **epsilon,**sigma;
double **lj1,**lj2,**lj3,**lj4,**offset;
int **lj_type;
// cutoff and offset for minimum of LJ potential
// to be used in SDK angle potential, which
// uses only the repulsive part of the potential
double **rminsq, **emin;
double cut_lj_global;
double g_ewald;
void allocate();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.cpp b/src/USER-CGSDK/pair_lj_sdk_coul_msm.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_msm.cpp
rename to src/USER-CGSDK/pair_lj_sdk_coul_msm.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.h b/src/USER-CGSDK/pair_lj_sdk_coul_msm.h
similarity index 97%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_msm.h
rename to src/USER-CGSDK/pair_lj_sdk_coul_msm.h
index be56c0cec..8438ced66 100644
--- a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.h
+++ b/src/USER-CGSDK/pair_lj_sdk_coul_msm.h
@@ -1,57 +1,56 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/msm,PairLJSDKCoulMSM)
-PairStyle(cg/cmm/coul/msm,PairLJSDKCoulMSM)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_MSM_H
#define LMP_PAIR_LJ_SDK_COUL_MSM_H
#include "pair_lj_sdk_coul_long.h"
namespace LAMMPS_NS {
class PairLJSDKCoulMSM : public PairLJSDKCoulLong {
public:
PairLJSDKCoulMSM(class LAMMPS *);
virtual ~PairLJSDKCoulMSM() {};
virtual void compute(int, int);
virtual double single(int, int, int, int, double, double, double, double &);
virtual void *extract(const char *, int &);
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval_msm();
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Must use 'kspace_modify pressure/scalar no' with Pair style
The kspace scalar pressure option is not (yet) compatible with at least one of
the defined Pair styles.
*/
diff --git a/src/USER-MISC/fix_srp.cpp b/src/USER-MISC/fix_srp.cpp
index fbd8473cb..f3dec42a8 100644
--- a/src/USER-MISC/fix_srp.cpp
+++ b/src/USER-MISC/fix_srp.cpp
@@ -1,631 +1,638 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Timothy Sirk (ARL), Pieter in't Veld (BASF)
------------------------------------------------------------------------- */
#include <string.h>
#include <stdlib.h>
#include "fix_srp.h"
#include "atom.h"
#include "force.h"
#include "domain.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "neighbor.h"
#include "atom_vec.h"
#include "modify.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixSRP::FixSRP(LAMMPS *lmp, int narg, char **arg) : Fix(lmp, narg, arg)
{
// settings
nevery=1;
peratom_freq = 1;
time_integrate = 0;
create_attribute = 0;
comm_border = 2;
// restart settings
restart_global = 1;
restart_peratom = 1;
restart_pbc = 1;
// per-atom array width 2
peratom_flag = 1;
size_peratom_cols = 2;
// initial allocation of atom-based array
// register with Atom class
array = NULL;
grow_arrays(atom->nmax);
// extends pack_exchange()
atom->add_callback(0);
atom->add_callback(1); // restart
atom->add_callback(2);
// initialize to illegal values so we can catch them if they are never set
btype = -1;
bptype = -1;
// zero
for (int i = 0; i < atom->nmax; i++)
for (int m = 0; m < 2; m++)
array[i][m] = 0.0;
}
/* ---------------------------------------------------------------------- */
FixSRP::~FixSRP()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
atom->delete_callback(id,1);
atom->delete_callback(id,2);
memory->destroy(array);
}
/* ---------------------------------------------------------------------- */
int FixSRP::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
mask |= PRE_EXCHANGE;
mask |= POST_RUN;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixSRP::init()
{
if (force->pair_match("hybrid",1) == NULL)
error->all(FLERR,"Cannot use pair srp without pair_style hybrid");
+ int has_rigid = 0;
+ for (int i = 0; i < modify->nfix; i++)
+ if (strncmp(modify->fix[i]->style,"rigid",5) == 0) ++has_rigid;
+
+ if (has_rigid > 0)
+ error->all(FLERR,"Pair srp is not compatible with rigid fixes.");
+
if ((bptype < 1) || (bptype > atom->ntypes))
error->all(FLERR,"Illegal bond particle type");
// fix SRP should be the first fix running at the PRE_EXCHANGE step.
// Otherwise it might conflict with, e.g. fix deform
if (modify->n_pre_exchange > 1) {
char *first = modify->fix[modify->list_pre_exchange[0]]->id;
if ((comm->me == 0) && (strcmp(id,first) != 0))
error->warning(FLERR,"Internal fix for pair srp defined too late."
" May lead to incorrect behavior.");
}
// setup neigh exclusions for diff atom types
// bond particles do not interact with other types
// type bptype only interacts with itself
char* arg1[4];
arg1[0] = (char *) "exclude";
arg1[1] = (char *) "type";
char c0[20];
char c1[20];
for(int z = 1; z < atom->ntypes; z++) {
if(z == bptype)
continue;
sprintf(c0, "%d", z);
arg1[2] = c0;
sprintf(c1, "%d", bptype);
arg1[3] = c1;
neighbor->modify_params(4, arg1);
}
}
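/* Illustrative equivalent (not part of the original source): the exclusion
   setup above amounts to issuing, for each other atom type z, the
   input-script command
     neigh_modify exclude type z bptype
   so that bond particles (type bptype) only interact with their own type. */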
/* ----------------------------------------------------------------------
insert bond particles
------------------------------------------------------------------------- */
void FixSRP::setup_pre_force(int zz)
{
double **x = atom->x;
double **xold;
tagint *tag = atom->tag;
tagint *tagold;
int *type = atom->type;
int* dlist;
AtomVec *avec = atom->avec;
int **bondlist = neighbor->bondlist;
int nlocal, nlocal_old;
nlocal = nlocal_old = atom->nlocal;
bigint nall = atom->nlocal + atom->nghost;
int nbondlist = neighbor->nbondlist;
int i,j,n;
// make a copy of all coordinates and tags
// that is consistent with the bond list as
// atom->x will be affected by creating/deleting atoms.
// also compile list of local atoms to be deleted.
memory->create(xold,nall,3,"fix_srp:xold");
memory->create(tagold,nall,"fix_srp:tagold");
memory->create(dlist,nall,"fix_srp:dlist");
for (i = 0; i < nall; i++){
xold[i][0] = x[i][0];
xold[i][1] = x[i][1];
xold[i][2] = x[i][2];
tagold[i]=tag[i];
dlist[i] = (type[i] == bptype) ? 1 : 0;
for (n = 0; n < 2; n++)
array[i][n] = 0.0;
}
// delete local atoms flagged in dlist
i = 0;
int ndel = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
ndel++;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
int nadd = 0;
double rsqold = 0.0;
double delx, dely, delz, rmax, rsq, rsqmax;
double xone[3];
for (n = 0; n < nbondlist; n++) {
// consider only the user defined bond type
// btype of zero considers all bonds
if(btype > 0 && bondlist[n][2] != btype)
continue;
i = bondlist[n][0];
j = bondlist[n][1];
// position of bond i
xone[0] = (xold[i][0] + xold[j][0])*0.5;
xone[1] = (xold[i][1] + xold[j][1])*0.5;
xone[2] = (xold[i][2] + xold[j][2])*0.5;
// record longest bond
// this is used to set the ghost cutoff
delx = xold[j][0] - xold[i][0];
dely = xold[j][1] - xold[i][1];
delz = xold[j][2] - xold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if(rsq > rsqold) rsqold = rsq;
// make one particle for each bond
// i is local
// if newton bond, always make particle
// if j is local, always make particle
// if j is ghost, decide from tag
if ((force->newton_bond) || (j < nlocal_old) || (tagold[i] > tagold[j])) {
atom->natoms++;
avec->create_atom(bptype,xone);
// pack tag i/j into buffer for comm
array[atom->nlocal-1][0] = static_cast<double>(tagold[i]);
array[atom->nlocal-1][1] = static_cast<double>(tagold[j]);
nadd++;
}
}
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
// free temporary storage
memory->destroy(xold);
memory->destroy(tagold);
char str[128];
int nadd_all = 0, ndel_all = 0;
MPI_Allreduce(&ndel,&ndel_all,1,MPI_INT,MPI_SUM,world);
MPI_Allreduce(&nadd,&nadd_all,1,MPI_INT,MPI_SUM,world);
if(comm->me == 0){
sprintf(str, "Removed/inserted %d/%d bond particles.", ndel_all,nadd_all);
error->message(FLERR,str);
}
// check ghost comm distances
// warn and change if shorter from estimate
// ghost atoms must be present for bonds on edge of neighbor cutoff
// extend cutghost slightly more than half of the longest bond
MPI_Allreduce(&rsqold,&rsqmax,1,MPI_DOUBLE,MPI_MAX,world);
rmax = sqrt(rsqmax);
double cutneighmax_srp = neighbor->cutneighmax + 0.51*rmax;
// find smallest cutghost
double cutghostmin = comm->cutghost[0];
if (cutghostmin > comm->cutghost[1])
cutghostmin = comm->cutghost[1];
if (cutghostmin > comm->cutghost[2])
cutghostmin = comm->cutghost[2];
// stop if cutghost is insufficient
if (cutneighmax_srp > cutghostmin){
sprintf(str, "Communication cutoff too small for fix srp. "
"Need %f, current %f.", cutneighmax_srp, cutghostmin);
error->all(FLERR,str);
}
// assign tags for new atoms, update map
atom->tag_extend();
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// put new particles in the box before exchange
// move owned to new procs
// get ghosts
// build neigh lists again
// if triclinic, lambda coords needed for pbc, exchange, borders
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// back to box coords
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
// new atom counts
nlocal = atom->nlocal;
nall = atom->nlocal + atom->nghost;
// zero all forces
for(i = 0; i < nall; i++)
atom->f[i][0] = atom->f[i][1] = atom->f[i][2] = 0.0;
// do not include bond particles in thermo output
// remove them from all groups. set their velocity to zero.
for(i=0; i< nlocal; i++)
if(atom->type[i] == bptype) {
atom->mask[i] = 0;
atom->v[i][0] = atom->v[i][1] = atom->v[i][2] = 0.0;
}
}
/* ----------------------------------------------------------------------
set position of bond particles
------------------------------------------------------------------------- */
void FixSRP::pre_exchange()
{
// update ghosts
comm->forward_comm();
// reassign bond particle coordinates to midpoint of bonds
// only need to do this before neigh rebuild
double **x=atom->x;
int i,j;
int nlocal = atom->nlocal;
for(int ii = 0; ii < nlocal; ii++){
if(atom->type[ii] != bptype) continue;
i = atom->map(static_cast<tagint>(array[ii][0]));
if(i < 0) error->all(FLERR,"Fix SRP failed to map atom");
i = domain->closest_image(ii,i);
j = atom->map(static_cast<tagint>(array[ii][1]));
if(j < 0) error->all(FLERR,"Fix SRP failed to map atom");
j = domain->closest_image(ii,j);
// position of bond particle ii
atom->x[ii][0] = (x[i][0] + x[j][0])*0.5;
atom->x[ii][1] = (x[i][1] + x[j][1])*0.5;
atom->x[ii][2] = (x[i][2] + x[j][2])*0.5;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double FixSRP::memory_usage()
{
double bytes = atom->nmax*2 * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
allocate atom-based array
------------------------------------------------------------------------- */
void FixSRP::grow_arrays(int nmax)
{
memory->grow(array,nmax,2,"fix_srp:array");
array_atom = array;
}
/* ----------------------------------------------------------------------
copy values within local atom-based array
called when move to new proc
------------------------------------------------------------------------- */
void FixSRP::copy_arrays(int i, int j, int delflag)
{
for (int m = 0; m < 2; m++)
array[j][m] = array[i][m];
}
/* ----------------------------------------------------------------------
initialize one atom's array values
called when atom is created
------------------------------------------------------------------------- */
void FixSRP::set_arrays(int i)
{
array[i][0] = -1;
array[i][1] = -1;
}
/* ----------------------------------------------------------------------
pack values in local atom-based array for exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::pack_exchange(int i, double *buf)
{
for (int m = 0; m < 2; m++) buf[m] = array[i][m];
return 2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based array from exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::unpack_exchange(int nlocal, double *buf)
{
for (int m = 0; m < 2; m++) array[nlocal][m] = buf[m];
return 2;
}
/* ----------------------------------------------------------------------
pack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::pack_border(int n, int *list, double *buf)
{
// pack buf for border comm
int i,j;
int m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = array[j][0];
buf[m++] = array[j][1];
}
return m;
}
/* ----------------------------------------------------------------------
unpack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::unpack_border(int n, int first, double *buf)
{
// unpack buf into array
int i,last;
int m = 0;
last = first + n;
for (i = first; i < last; i++){
array[i][0] = buf[m++];
array[i][1] = buf[m++];
}
return m;
}
/* ----------------------------------------------------------------------
remove particles after run
------------------------------------------------------------------------- */
void FixSRP::post_run()
{
// all bond particles are removed after each run
// useful for write_data and write_restart commands
// since those commands occur between runs
bigint natoms_previous = atom->natoms;
int nlocal = atom->nlocal;
int* dlist;
memory->create(dlist,nlocal,"fix_srp:dlist");
for (int i = 0; i < nlocal; i++){
if(atom->type[i] == bptype)
dlist[i] = 1;
else
dlist[i] = 0;
}
// delete local atoms flagged in dlist
// reset nlocal
AtomVec *avec = atom->avec;
int i = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
// reset atom->natoms
// reset atom->map if it exists
// set nghost to 0 so old ghosts won't be mapped
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// print before and after atom count
bigint ndelete = natoms_previous - atom->natoms;
if (comm->me == 0) {
if (screen) fprintf(screen,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
if (logfile) fprintf(logfile,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
}
// verlet calls box_too_small_check() in post_run
// this check maps all bond partners
// therefore need ghosts
// need to convert to lambda coords before apply pbc
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// change back to box coordinates
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for restart file
------------------------------------------------------------------------- */
int FixSRP::pack_restart(int i, double *buf)
{
int m = 0;
buf[m++] = 3;
buf[m++] = array[i][0];
buf[m++] = array[i][1];
return m;
}
/* ----------------------------------------------------------------------
unpack values from atom->extra array to restart the fix
------------------------------------------------------------------------- */
void FixSRP::unpack_restart(int nlocal, int nth)
{
double **extra = atom->extra;
// skip to Nth set of extra values
int m = 0;
for (int i = 0; i < nth; i++){
m += extra[nlocal][m];
}
m++;
array[nlocal][0] = extra[nlocal][m++];
array[nlocal][1] = extra[nlocal][m++];
}
/* ----------------------------------------------------------------------
maxsize of any atom's restart data
------------------------------------------------------------------------- */
int FixSRP::maxsize_restart()
{
return 3;
}
/* ----------------------------------------------------------------------
size of atom nlocal's restart data
------------------------------------------------------------------------- */
int FixSRP::size_restart(int nlocal)
{
return 3;
}
/* ----------------------------------------------------------------------
pack global state of Fix
------------------------------------------------------------------------- */
void FixSRP::write_restart(FILE *fp)
{
int n = 0;
double list[3];
list[n++] = comm->cutghostuser;
list[n++] = btype;
list[n++] = bptype;
if (comm->me == 0) {
int size = n * sizeof(double);
fwrite(&size,sizeof(int),1,fp);
fwrite(list,sizeof(double),n,fp);
}
}
/* ----------------------------------------------------------------------
use info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixSRP::restart(char *buf)
{
int n = 0;
double *list = (double *) buf;
comm->cutghostuser = static_cast<double> (list[n++]);
btype = static_cast<int> (list[n++]);
bptype = static_cast<int> (list[n++]);
}
/* ----------------------------------------------------------------------
interface with pair class
pair srp sets the bond type in this fix
------------------------------------------------------------------------- */
int FixSRP::modify_param(int narg, char **arg)
{
if (strcmp(arg[0],"btype") == 0) {
btype = atoi(arg[1]);
return 2;
}
if (strcmp(arg[0],"bptype") == 0) {
bptype = atoi(arg[1]);
return 2;
}
return 0;
}
diff --git a/src/USER-MISC/improper_ring.cpp b/src/USER-MISC/improper_ring.cpp
index 5a7937e4e..adf17ed1d 100644
--- a/src/USER-MISC/improper_ring.cpp
+++ b/src/USER-MISC/improper_ring.cpp
@@ -1,339 +1,339 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Georgios G. Vogiatzis (CoMSE, NTU Athens),
gvog@chemeng.ntua.gr
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Description: This file implements the improper potential introduced
by Destree et al., in Equation 9 of:
- M. Destree, F. Laupretre, A. Lyulin, and J.-P.
Ryckaert, J. Chem. Phys. 112, 9632 (2000),
and subsequently referred to in:
- A.V. Lyulin, M.A.J Michels, Macromolecules, 35, 1463,
(2002)
This potential does not affect small amplitude vibrations
but is used in an ad hoc way to prevent the onset of
accidentally large amplitude fluctuations leading to
the occurrence of a planar conformation of the three
bonds i, i + 1 and i', an intermediate conformation
toward the chiral inversion of a methine carbon.
In the "Impropers" section of data file four atoms:
i, j, k and l are specified with i,j and l lying on the
backbone of the chain and k specifying the chirality
of j.
------------------------------------------------------------------------- */
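/* Sketch of the implemented energy (inferred from compute() below; not part of
   the original description): with theta_1, theta_2, theta_3 the bending angles
   of the triads (1-2-9), (1-2-3) and (9-2-3), and chi the equilibrium angle,

     E = (K/6) * [ (cos(theta_1)-cos(chi)) + (cos(theta_2)-cos(chi)) + (cos(theta_3)-cos(chi)) ]^6

   i.e. "angle_summer" in compute() accumulates cos(theta_i) - cos(chi), and
   coeff() stores cos(chi) rather than chi itself. */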
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include "improper_ring.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "domain.h"
#include "force.h"
#include "update.h"
#include "math_const.h"
#include "math_special.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
using namespace MathSpecial;
#define TOLERANCE 0.05
#define SMALL 0.001
/* ---------------------------------------------------------------------- */
ImproperRing::ImproperRing(LAMMPS *lmp) : Improper(lmp) {}
/* ---------------------------------------------------------------------- */
ImproperRing::~ImproperRing()
{
if (allocated) {
memory->destroy(setflag);
memory->destroy(k);
memory->destroy(chi);
}
}
/* ---------------------------------------------------------------------- */
void ImproperRing::compute(int eflag, int vflag)
{
/* Be careful!: "chi" stores the cosine of the equilibrium angle (see coeff()). */
int i1,i2,i3,i4,n,type;
double eimproper ;
/* Compatibility variables. */
double vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z;
double f1[3], f3[3], f4[3];
/* Actual computation variables. */
int at1[3], at2[3], at3[3], icomb;
double bvec1x[3], bvec1y[3], bvec1z[3],
bvec2x[3], bvec2y[3], bvec2z[3],
bvec1n[3], bvec2n[3], bend_angle[3];
double angle_summer, angfac, cfact1, cfact2, cfact3;
double cjiji, ckjji, ckjkj, fix, fiy, fiz, fjx, fjy, fjz, fkx, fky, fkz;
eimproper = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
/* References to simulation data. */
double **x = atom->x;
double **f = atom->f;
int **improperlist = neighbor->improperlist;
int nimproperlist = neighbor->nimproperlist;
int nlocal = atom->nlocal;
int newton_bond = force->newton_bond;
/* A description of the potential can be found in
Macromolecules 35, pp. 1463-1472 (2002). */
for (n = 0; n < nimproperlist; n++)
{
/* Take the ids of the atoms contributing to the improper potential. */
i1 = improperlist[n][0]; /* Atom "1" of Figure 1 from the above reference.*/
i2 = improperlist[n][1]; /* Atom "2" ... */
i3 = improperlist[n][2]; /* Atom "3" ... */
i4 = improperlist[n][3]; /* Atom "9" ... */
type = improperlist[n][4];
/* Calculate the necessary variables for LAMMPS implementation.
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
Although they are irrelevant to the calculation of the potential, we keep
them for maximal compatibility. */
vb1x = x[i1][0] - x[i2][0]; vb1y = x[i1][1] - x[i2][1]; vb1z = x[i1][2] - x[i2][2];
vb2x = x[i3][0] - x[i2][0]; vb2y = x[i3][1] - x[i2][1]; vb2z = x[i3][2] - x[i2][2];
vb3x = x[i4][0] - x[i3][0]; vb3y = x[i4][1] - x[i3][1]; vb3z = x[i4][2] - x[i3][2];
/* Pass the atom tags to form the necessary combinations. */
at1[0] = i1; at2[0] = i2; at3[0] = i4; /* ids: 1-2-9 */
at1[1] = i1; at2[1] = i2; at3[1] = i3; /* ids: 1-2-3 */
at1[2] = i4; at2[2] = i2; at3[2] = i3; /* ids: 9-2-3 */
/* Initialize the sum of the angles differences. */
angle_summer = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Bond vector connecting the first and the second atom. */
bvec1x[icomb] = x[at2[icomb]][0] - x[at1[icomb]][0];
bvec1y[icomb] = x[at2[icomb]][1] - x[at1[icomb]][1];
bvec1z[icomb] = x[at2[icomb]][2] - x[at1[icomb]][2];
/* also calculate the norm of the vector: */
bvec1n[icomb] = sqrt( bvec1x[icomb]*bvec1x[icomb]
+ bvec1y[icomb]*bvec1y[icomb]
+ bvec1z[icomb]*bvec1z[icomb]);
/* Bond vector connecting the second and the third atom. */
bvec2x[icomb] = x[at3[icomb]][0] - x[at2[icomb]][0];
bvec2y[icomb] = x[at3[icomb]][1] - x[at2[icomb]][1];
bvec2z[icomb] = x[at3[icomb]][2] - x[at2[icomb]][2];
/* also calculate the norm of the vector: */
bvec2n[icomb] = sqrt( bvec2x[icomb]*bvec2x[icomb]
+ bvec2y[icomb]*bvec2y[icomb]
+ bvec2z[icomb]*bvec2z[icomb]);
/* Calculate the bending angle of the atom triad: */
bend_angle[icomb] = ( bvec2x[icomb]*bvec1x[icomb]
+ bvec2y[icomb]*bvec1y[icomb]
+ bvec2z[icomb]*bvec1z[icomb]);
bend_angle[icomb] /= (bvec1n[icomb] * bvec2n[icomb]);
if (bend_angle[icomb] > 1.0) bend_angle[icomb] -= SMALL;
if (bend_angle[icomb] < -1.0) bend_angle[icomb] += SMALL;
/* Append the current angle to the sum of angle differences. */
angle_summer += (bend_angle[icomb] - chi[type]);
}
if (eflag) eimproper = (1.0/6.0) *k[type] * powint(angle_summer,6);
/*
printf("The tags: %d-%d-%d-%d, of type %d .\n",atom->tag[i1],atom->tag[i2],atom->tag[i3],atom->tag[i4],type);
// printf("The coordinates of the first: %f, %f, %f.\n", x[i1][0], x[i1][1], x[i1][2]);
// printf("The coordinates of the second: %f, %f, %f.\n", x[i2][0], x[i2][1], x[i2][2]);
// printf("The coordinates of the third: %f, %f, %f.\n", x[i3][0], x[i3][1], x[i3][2]);
// printf("The coordinates of the fourth: %f, %f, %f.\n", x[i4][0], x[i4][1], x[i4][2]);
printf("The angles are: %f / %f / %f equilibrium: %f.\n", bend_angle[0], bend_angle[1], bend_angle[2],chi[type]);
printf("The energy of the improper: %f with prefactor %f.\n", eimproper,(1.0/6.0)*k[type]);
printf("The sum of the angles: %f.\n", angle_summer);
*/
/* Force calculation acting on all atoms.
Calculate the derivatives of the potential. */
angfac = k[type] * powint(angle_summer,5);
f1[0] = 0.0; f1[1] = 0.0; f1[2] = 0.0;
f3[0] = 0.0; f3[1] = 0.0; f3[2] = 0.0;
f4[0] = 0.0; f4[1] = 0.0; f4[2] = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Calculate the squares of the distances. */
cjiji = bvec1n[icomb] * bvec1n[icomb]; ckjkj = bvec2n[icomb] * bvec2n[icomb];
ckjji = bvec2x[icomb] * bvec1x[icomb]
+ bvec2y[icomb] * bvec1y[icomb]
+ bvec2z[icomb] * bvec1z[icomb] ;
cfact1 = angfac / (sqrt(ckjkj * cjiji));
cfact2 = ckjji / ckjkj;
cfact3 = ckjji / cjiji;
- /* Calculate the force acted on the thrid atom of the angle. */
+ /* Calculate the force acted on the third atom of the angle. */
fkx = cfact2 * bvec2x[icomb] - bvec1x[icomb];
fky = cfact2 * bvec2y[icomb] - bvec1y[icomb];
fkz = cfact2 * bvec2z[icomb] - bvec1z[icomb];
/* Calculate the force acted on the first atom of the angle. */
fix = bvec2x[icomb] - cfact3 * bvec1x[icomb];
fiy = bvec2y[icomb] - cfact3 * bvec1y[icomb];
fiz = bvec2z[icomb] - cfact3 * bvec1z[icomb];
/* Finally, calculate the force acted on the middle atom of the angle.*/
fjx = - fix - fkx; fjy = - fiy - fky; fjz = - fiz - fkz;
/* Consider the appropriate scaling of the forces: */
fix *= cfact1; fiy *= cfact1; fiz *= cfact1;
fjx *= cfact1; fjy *= cfact1; fjz *= cfact1;
fkx *= cfact1; fky *= cfact1; fkz *= cfact1;
if (at1[icomb] == i1) {f1[0] += fix; f1[1] += fiy; f1[2] += fiz;}
else if (at2[icomb] == i1) {f1[0] += fjx; f1[1] += fjy; f1[2] += fjz;}
else if (at3[icomb] == i1) {f1[0] += fkx; f1[1] += fky; f1[2] += fkz;}
if (at1[icomb] == i3) {f3[0] += fix; f3[1] += fiy; f3[2] += fiz;}
else if (at2[icomb] == i3) {f3[0] += fjx; f3[1] += fjy; f3[2] += fjz;}
else if (at3[icomb] == i3) {f3[0] += fkx; f3[1] += fky; f3[2] += fkz;}
if (at1[icomb] == i4) {f4[0] += fix; f4[1] += fiy; f4[2] += fiz;}
else if (at2[icomb] == i4) {f4[0] += fjx; f4[1] += fjy; f4[2] += fjz;}
else if (at3[icomb] == i4) {f4[0] += fkx; f4[1] += fky; f4[2] += fkz;}
/* Store the contribution to the global arrays: */
/* Take the id of the atom from the at1[icomb] element, i1 = at1[icomb]. */
if (newton_bond || at1[icomb] < nlocal) {
f[at1[icomb]][0] += fix;
f[at1[icomb]][1] += fiy;
f[at1[icomb]][2] += fiz;
}
/* Take the id of the atom from the at2[icomb] element, i2 = at2[icomb]. */
if (newton_bond || at2[icomb] < nlocal) {
f[at2[icomb]][0] += fjx;
f[at2[icomb]][1] += fjy;
f[at2[icomb]][2] += fjz;
}
/* Take the id of the atom from the at3[icomb] element, i3 = at3[icomb]. */
if (newton_bond || at3[icomb] < nlocal) {
f[at3[icomb]][0] += fkx;
f[at3[icomb]][1] += fky;
f[at3[icomb]][2] += fkz;
}
}
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
}
}
/* ---------------------------------------------------------------------- */
void ImproperRing::allocate()
{
allocated = 1;
int n = atom->nimpropertypes;
memory->create(k,n+1,"improper:k");
memory->create(chi,n+1,"improper:chi");
memory->create(setflag,n+1,"improper:setflag");
for (int i = 1; i <= n; i++) setflag[i] = 0;
}
/* ----------------------------------------------------------------------
set coeffs for one type
------------------------------------------------------------------------- */
void ImproperRing ::coeff(int narg, char **arg)
{
/* Check whether there exist sufficient number of arguments.
0: type of improper to be applied to
1: energetic constant
2: equilibrium angle in degrees */
if (narg != 3) error->all(FLERR,"Incorrect args for RING improper coefficients");
if (!allocated) allocate();
int ilo,ihi;
force->bounds(FLERR,arg[0],atom->nimpropertypes,ilo,ihi);
double k_one = force->numeric(FLERR,arg[1]);
double chi_one = force->numeric(FLERR,arg[2]);
int count = 0;
for (int i = ilo; i <= ihi; i++) {
/* Read the k parameter in kcal/mol. */
k[i] = k_one;
/* "chi_one" stores the equilibrium angle in degrees.
Convert it to radians and store its cosine. */
chi[i] = cos((chi_one/180.0)*MY_PI);
setflag[i] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for improper coefficients");
}
/* ----------------------------------------------------------------------
proc 0 writes out coeffs to restart file
------------------------------------------------------------------------- */
void ImproperRing ::write_restart(FILE *fp)
{
fwrite(&k[1],sizeof(double),atom->nimpropertypes,fp);
fwrite(&chi[1],sizeof(double),atom->nimpropertypes,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads coeffs from restart file, bcasts them
------------------------------------------------------------------------- */
void ImproperRing::read_restart(FILE *fp)
{
allocate();
if (comm->me == 0) {
fread(&k[1],sizeof(double),atom->nimpropertypes,fp);
fread(&chi[1],sizeof(double),atom->nimpropertypes,fp);
}
MPI_Bcast(&k[1],atom->nimpropertypes,MPI_DOUBLE,0,world);
MPI_Bcast(&chi[1],atom->nimpropertypes,MPI_DOUBLE,0,world);
for (int i = 1; i <= atom->nimpropertypes; i++) setflag[i] = 1;
}
diff --git a/src/USER-MOLFILE/README b/src/USER-MOLFILE/README
index f6defed6a..4437b587e 100644
--- a/src/USER-MOLFILE/README
+++ b/src/USER-MOLFILE/README
@@ -1,35 +1,22 @@
This package provides a C++ interface class to the VMD molfile
plugins, http://www.ks.uiuc.edu/Research/vmd/plugins/molfile, and a
set of LAMMPS classes that use this interface.
-Molfile plugins provide a consistent programming interface to read and
-write file formats commonly used in molecular simulations. This
+Molfile plugins provide a consistent programming interface to read
+and write file formats commonly used in molecular simulations. This
package only provides the interface code, not the plugins; these can
be taken as precompiled binaries directly from a VMD installation that
matches the platform of your LAMMPS executable. Using the plugin
interface one can add support for additional file formats to LAMMPS
simply by telling LAMMPS where to find a suitable plugin without
having to recompile or change LAMMPS directly. The plugins bundled
with VMD are usually installed in a directory inside the VMD
installation tree named "plugins/<VMDARCH>/molfile".
To be able to dynamically load and execute the plugins from inside
LAMMPS, you need to link with an appropriate system library, which
is done using the settings in lib/molfile/Makefile.lammps. See
that file and the lib/molfile/README file for more details.
-NOTE: while the programming interface (API) to the molfile plugins is
-backward compatible (i.e. you can expect to be able to compile this
-package for plugins from newer VMD packages), the binary interface
-(ABI) is not. So it is necessary to compile this package with the
-molfile plugin header files (vmdplugin.h and molfile_plugin.h) taken
-from the _same_ VMD installation that the (binary) plugin files are
-taken from. These header files can be found inside the VMD
-installation tree under: "plugins/include".
-
-For convenience, this package includes a set of header files that is
-compatible with VMD 1.9 and 1.9.1 (the current version in June 2012)
-and should be compilable with VMD versions back to about version 1.8.4
-
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
diff --git a/src/USER-NC-DUMP/Install.sh b/src/USER-NETCDF/Install.sh
similarity index 100%
rename from src/USER-NC-DUMP/Install.sh
rename to src/USER-NETCDF/Install.sh
diff --git a/src/USER-NC-DUMP/README b/src/USER-NETCDF/README
similarity index 95%
rename from src/USER-NC-DUMP/README
rename to src/USER-NETCDF/README
index c02e879c6..57dec5e4c 100644
--- a/src/USER-NC-DUMP/README
+++ b/src/USER-NETCDF/README
@@ -1,39 +1,39 @@
-USER-NC-DUMP
+USER-NETCDF
============
-This package provides the nc and (optionally) the nc/mpiio dump styles.
+This package provides the netcdf and netcdf/mpiio dump styles.
See the doc pages for the dump netcdf and dump netcdf/mpiio commands for how to use them.
Compiling these dump styles requires having the netCDF library installed
on your system. See lib/netcdf/README for additional details.
PACKAGE DESCRIPTION
-------------------
This is a LAMMPS (http://lammps.sandia.gov/) dump style for output into a NetCDF
database. The database format follows the AMBER NetCDF trajectory convention
(http://ambermd.org/netcdf/nctraj.xhtml), but includes extensions to this
convention. These extensions are:
* A variable "cell_origin" (of dimension "frame", "cell_spatial") that contains
the bottom left corner of the simulation cell.
* Any number of additional variables corresponding to per atom scalar, vector
or tensor quantities available within LAMMPS. Tensor quantities are written in
Voigt notation. An additional dimension "Voigt" of length 6 is created for
this purpose.
* Possibility to output to an HDF5 database.
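As a minimal illustration (not part of this package), the extension
variables can be read back with the plain netCDF C library; the file name
"traj.nc" below is a placeholder and error handling is reduced to a bare
minimum.

  /* sketch: read the "cell_origin" extension variable for the first frame */
  #include <stdio.h>
  #include <netcdf.h>

  int main(void) {
    int ncid, varid;
    double origin[3];
    size_t start[2] = {0, 0};   /* frame 0, first component */
    size_t count[2] = {1, 3};   /* one frame, three components */
    if (nc_open("traj.nc", NC_NOWRITE, &ncid) != NC_NOERR) return 1;
    if (nc_inq_varid(ncid, "cell_origin", &varid) != NC_NOERR) return 1;
    if (nc_get_vara_double(ncid, varid, start, count, origin) != NC_NOERR) return 1;
    printf("cell origin: %g %g %g\n", origin[0], origin[1], origin[2]);
    nc_close(ncid);
    return 0;
  }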
NetCDF files can be directly visualized with the following tools:
* Ovito (http://www.ovito.org/). Ovito supports the AMBER convention and all of
the above extensions.
* VMD (http://www.ks.uiuc.edu/Research/vmd/).
* AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye contains
a NetCDF reader that is not present in the standard distribution of AtomEye.
The person who created these files is Lars Pastewka at
Karlsruhe Institute of Technology (lars.pastewka@kit.edu).
Contact him directly if you have questions.
Lars Pastewka
Institute for Applied Materials (IAM)
Karlsruhe Institute of Technology (KIT)
Kaiserstrasse 12, 76131 Karlsruhe
e-mail: lars.pastewka@kit.edu
diff --git a/src/USER-NC-DUMP/dump_nc.cpp b/src/USER-NETCDF/dump_netcdf.cpp
similarity index 97%
rename from src/USER-NC-DUMP/dump_nc.cpp
rename to src/USER-NETCDF/dump_netcdf.cpp
index 7a66eb022..bad90bdef 100644
--- a/src/USER-NC-DUMP/dump_nc.cpp
+++ b/src/USER-NETCDF/dump_netcdf.cpp
@@ -1,1144 +1,1141 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_NETCDF)
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
-
#include <netcdf.h>
-
+#include "dump_netcdf.h"
#include "atom.h"
#include "comm.h"
#include "compute.h"
#include "domain.h"
#include "error.h"
#include "fix.h"
#include "group.h"
#include "input.h"
#include "math_const.h"
#include "memory.h"
#include "modify.h"
#include "update.h"
#include "universe.h"
#include "variable.h"
#include "force.h"
-#include "dump_nc.h"
-
using namespace LAMMPS_NS;
using namespace MathConst;
enum{INT,DOUBLE}; // same as in dump_custom.cpp
const char NC_FRAME_STR[] = "frame";
const char NC_SPATIAL_STR[] = "spatial";
const char NC_VOIGT_STR[] = "Voigt";
const char NC_ATOM_STR[] = "atom";
const char NC_CELL_SPATIAL_STR[] = "cell_spatial";
const char NC_CELL_ANGULAR_STR[] = "cell_angular";
const char NC_LABEL_STR[] = "label";
const char NC_TIME_STR[] = "time";
const char NC_CELL_ORIGIN_STR[] = "cell_origin";
const char NC_CELL_LENGTHS_STR[] = "cell_lengths";
const char NC_CELL_ANGLES_STR[] = "cell_angles";
const char NC_UNITS_STR[] = "units";
const char NC_SCALE_FACTOR_STR[] = "scale_factor";
const int THIS_IS_A_FIX = -1;
const int THIS_IS_A_COMPUTE = -2;
const int THIS_IS_A_VARIABLE = -3;
const int THIS_IS_A_BIGINT = -4;
/* ---------------------------------------------------------------------- */
#define NCERR(x) ncerr(x, NULL, __LINE__)
#define NCERRX(x, descr) ncerr(x, descr, __LINE__)
/* ---------------------------------------------------------------------- */
-DumpNC::DumpNC(LAMMPS *lmp, int narg, char **arg) :
+DumpNetCDF::DumpNetCDF(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
// arrays for data rearrangement
sort_flag = 1;
sortcol = 0;
binary = 1;
flush_flag = 0;
if (multiproc)
error->all(FLERR,"Multi-processor writes are not supported.");
if (multifile)
error->all(FLERR,"Multiple files are not supported.");
perat = new nc_perat_t[nfield];
for (int i = 0; i < nfield; i++) {
perat[i].dims = 0;
}
n_perat = 0;
for (int iarg = 5; iarg < narg; iarg++) {
int i = iarg-5;
int idim = 0;
int ndims = 1;
char mangled[1024];
bool constant = false;
strcpy(mangled, arg[iarg]);
// name mangling
// in the AMBER specification
if (!strcmp(mangled, "x") || !strcmp(mangled, "y") ||
!strcmp(mangled, "z")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "coordinates");
}
else if (!strcmp(mangled, "vx") || !strcmp(mangled, "vy") ||
!strcmp(mangled, "vz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "velocities");
}
// extensions to the AMBER specification
else if (!strcmp(mangled, "type")) {
strcpy(mangled, "atom_types");
}
else if (!strcmp(mangled, "xs") || !strcmp(mangled, "ys") ||
!strcmp(mangled, "zs")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "scaled_coordinates");
}
else if (!strcmp(mangled, "xu") || !strcmp(mangled, "yu") ||
!strcmp(mangled, "zu")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "unwrapped_coordinates");
}
else if (!strcmp(mangled, "fx") || !strcmp(mangled, "fy") ||
!strcmp(mangled, "fz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "forces");
}
else if (!strcmp(mangled, "mux") || !strcmp(mangled, "muy") ||
!strcmp(mangled, "muz")) {
idim = mangled[2] - 'x';
ndims = 3;
strcpy(mangled, "mu");
}
else if (!strncmp(mangled, "c_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_COMPUTE;
}
}
else if (!strncmp(mangled, "f_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_FIX;
}
}
// find mangled name
int inc = -1;
for (int j = 0; j < n_perat && inc < 0; j++) {
if (!strcmp(perat[j].name, mangled)) {
inc = j;
}
}
if (inc < 0) {
// this has not yet been defined
inc = n_perat;
perat[inc].dims = ndims;
if (ndims < 0) ndims = DUMP_NC_MAX_DIMS;
for (int j = 0; j < DUMP_NC_MAX_DIMS; j++) {
perat[inc].field[j] = -1;
}
strcpy(perat[inc].name, mangled);
n_perat++;
}
perat[inc].constant = constant;
perat[inc].ndumped = 0;
perat[inc].field[idim] = i;
}
n_perframe = 0;
perframe = NULL;
n_buffer = 0;
int_buffer = NULL;
double_buffer = NULL;
double_precision = false;
framei = 0;
}
/* ---------------------------------------------------------------------- */
-DumpNC::~DumpNC()
+DumpNetCDF::~DumpNetCDF()
{
closefile();
delete [] perat;
if (n_perframe > 0)
delete [] perframe;
if (int_buffer) memory->sfree(int_buffer);
if (double_buffer) memory->sfree(double_buffer);
}
/* ---------------------------------------------------------------------- */
-void DumpNC::openfile()
+void DumpNetCDF::openfile()
{
// now the computes and fixes have been initialized, so we can query
// for the size of vector quantities
for (int i = 0; i < n_perat; i++) {
if (perat[i].dims == THIS_IS_A_COMPUTE) {
int j = -1;
for (int k = 0; k < DUMP_NC_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!compute[j]->peratom_flag)
error->all(FLERR,"compute does not provide per atom data");
perat[i].dims = compute[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MAX_DIMS");
}
else if (perat[i].dims == THIS_IS_A_FIX) {
int j = -1;
for (int k = 0; k < DUMP_NC_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!fix[j]->peratom_flag)
error->all(FLERR,"fix does not provide per atom data");
perat[i].dims = fix[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MAX_DIMS");
}
}
// get total number of atoms
ntotalgr = group->count(igroup);
if (filewriter) {
if (append_flag && access(filename, F_OK) != -1) {
// Fixme! Check whether dimensions and variables conform to the
// data structure standard.
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( nc_open(filename, NC_WRITE, &ncid), filename );
// dimensions
NCERRX( nc_inq_dimid(ncid, NC_FRAME_STR, &frame_dim), NC_FRAME_STR );
NCERRX( nc_inq_dimid(ncid, NC_SPATIAL_STR, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( nc_inq_dimid(ncid, NC_VOIGT_STR, &Voigt_dim), NC_VOIGT_STR );
NCERRX( nc_inq_dimid(ncid, NC_ATOM_STR, &atom_dim), NC_ATOM_STR );
NCERRX( nc_inq_dimid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( nc_inq_dimid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( nc_inq_dimid(ncid, NC_LABEL_STR, &label_dim), NC_LABEL_STR );
// default variables
NCERRX( nc_inq_varid(ncid, NC_SPATIAL_STR, &spatial_var),
NC_SPATIAL_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_var),
NC_CELL_SPATIAL_STR);
NCERRX( nc_inq_varid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_var),
NC_CELL_ANGULAR_STR);
NCERRX( nc_inq_varid(ncid, NC_TIME_STR, &time_var), NC_TIME_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_ORIGIN_STR, &cell_origin_var),
NC_CELL_ORIGIN_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_LENGTHS_STR, &cell_lengths_var),
NC_CELL_LENGTHS_STR);
NCERRX( nc_inq_varid(ncid, NC_CELL_ANGLES_STR, &cell_angles_var),
NC_CELL_ANGLES_STR);
// variables specified in the input file
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
NCERRX( nc_inq_varid(ncid, perat[i].name, &perat[i].var),
perat[i].name );
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
NCERRX( nc_inq_varid(ncid, perframe[i].name, &perframe[i].var),
perframe[i].name );
}
size_t nframes;
NCERR( nc_inq_dimlen(ncid, frame_dim, &nframes) );
// framei == -1 means append to file, == -2 means override last frame
// Note that in the input file this translates to 'yes', '-1', etc.
if (framei < 0 || (append_flag && framei == 0)) framei = nframes+framei+1;
if (framei < 1) framei = 1;
}
else {
int dims[NC_MAX_VAR_DIMS];
size_t index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( nc_create(filename, NC_64BIT_OFFSET, &ncid),
filename );
// dimensions
NCERRX( nc_def_dim(ncid, NC_FRAME_STR, NC_UNLIMITED, &frame_dim),
NC_FRAME_STR );
NCERRX( nc_def_dim(ncid, NC_SPATIAL_STR, 3, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( nc_def_dim(ncid, NC_VOIGT_STR, 6, &Voigt_dim),
NC_VOIGT_STR );
NCERRX( nc_def_dim(ncid, NC_ATOM_STR, ntotalgr, &atom_dim),
NC_ATOM_STR );
NCERRX( nc_def_dim(ncid, NC_CELL_SPATIAL_STR, 3, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( nc_def_dim(ncid, NC_CELL_ANGULAR_STR, 3, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( nc_def_dim(ncid, NC_LABEL_STR, 10, &label_dim),
NC_LABEL_STR );
// default variables
dims[0] = spatial_dim;
NCERRX( nc_def_var(ncid, NC_SPATIAL_STR, NC_CHAR, 1, dims, &spatial_var),
NC_SPATIAL_STR );
NCERRX( nc_def_var(ncid, NC_CELL_SPATIAL_STR, NC_CHAR, 1, dims,
&cell_spatial_var), NC_CELL_SPATIAL_STR );
dims[0] = spatial_dim;
dims[1] = label_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ANGULAR_STR, NC_CHAR, 2, dims,
&cell_angular_var), NC_CELL_ANGULAR_STR );
dims[0] = frame_dim;
NCERRX( nc_def_var(ncid, NC_TIME_STR, NC_DOUBLE, 1, dims, &time_var),
NC_TIME_STR);
dims[0] = frame_dim;
dims[1] = cell_spatial_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ORIGIN_STR, NC_DOUBLE, 2, dims,
&cell_origin_var), NC_CELL_ORIGIN_STR );
NCERRX( nc_def_var(ncid, NC_CELL_LENGTHS_STR, NC_DOUBLE, 2, dims,
&cell_lengths_var), NC_CELL_LENGTHS_STR );
dims[0] = frame_dim;
dims[1] = cell_angular_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ANGLES_STR, NC_DOUBLE, 2, dims,
&cell_angles_var), NC_CELL_ANGLES_STR );
// variables specified in the input file
dims[0] = frame_dim;
dims[1] = atom_dim;
dims[2] = spatial_dim;
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
if (perat[i].constant) {
// this quantity will only be written once
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims+1,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims+1,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 1, dims+1,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
else {
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
NCERRX( nc_def_var(ncid, perframe[i].name, NC_LONG, 1, dims,
&perframe[i].var), perframe[i].name );
}
else {
NCERRX( nc_def_var(ncid, perframe[i].name, NC_DOUBLE, 1, dims,
&perframe[i].var), perframe[i].name );
}
}
// attributes
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "Conventions",
5, "AMBER") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "ConventionVersion",
3, "1.0") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "program",
6, "LAMMPS") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "programVersion",
strlen(universe->version), universe->version) );
// units
if (!strcmp(update->unit_style, "lj")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
2, "lj") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
2, "lj") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
2, "lj") );
}
else if (!strcmp(update->unit_style, "real")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "metal")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
10, "picosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "si")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
5, "meter") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
5, "meter") );
}
else if (!strcmp(update->unit_style, "cgs")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
10, "centimeter") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
10, "centimeter") );
}
else if (!strcmp(update->unit_style, "electron")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
4, "Bohr") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
4, "Bohr") );
}
else {
char errstr[1024];
sprintf(errstr, "Unsupported unit style '%s'", update->unit_style);
error->all(FLERR,errstr);
}
NCERR( nc_put_att_text(ncid, cell_angles_var, NC_UNITS_STR,
6, "degree") );
d[0] = update->dt;
NCERR( nc_put_att_double(ncid, time_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( nc_put_att_double(ncid, cell_origin_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( nc_put_att_double(ncid, cell_lengths_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
/*
* Finished with definition
*/
NCERR( nc_enddef(ncid) );
/*
* Write label variables
*/
NCERR( nc_put_var_text(ncid, spatial_var, "xyz") );
NCERR( nc_put_var_text(ncid, cell_spatial_var, "abc") );
index[0] = 0;
index[1] = 0;
count[0] = 1;
count[1] = 5;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "alpha") );
index[0] = 1;
count[1] = 4;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "beta") );
index[0] = 2;
count[1] = 5;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "gamma") );
framei = 1;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpNC::closefile()
+void DumpNetCDF::closefile()
{
if (filewriter && singlefile_opened) {
NCERR( nc_close(ncid) );
singlefile_opened = 0;
- // append next time DumpNC::openfile is called
+ // append next time DumpNetCDF::openfile is called
append_flag = 1;
// write to next frame upon next open
framei++;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write()
+void DumpNetCDF::write()
{
// open file
openfile();
// need to write per-frame (global) properties here since they may come
// from computes. write_header below is only called from the writing
// processes, but modify->compute[j]->compute_* must be called from all
// processes.
size_t start[2];
start[0] = framei-1;
start[1] = 0;
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
bigint data;
(this->*perframe[i].compute)((void*) &data);
if (filewriter)
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
NCERR( nc_put_var1_long(ncid, perframe[i].var, start, &data) );
#else
NCERR( nc_put_var1_int(ncid, perframe[i].var, start, &data) );
#endif
}
else {
double data;
int j = perframe[i].index;
int idim = perframe[i].dim;
if (perframe[i].type == THIS_IS_A_COMPUTE) {
if (idim >= 0) {
modify->compute[j]->compute_vector();
data = modify->compute[j]->vector[idim];
}
else
data = modify->compute[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_FIX) {
if (idim >= 0) {
data = modify->fix[j]->compute_vector(idim);
}
else
data = modify->fix[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_VARIABLE) {
j = input->variable->find(perframe[i].id);
data = input->variable->compute_equal(j);
}
if (filewriter)
NCERR( nc_put_var1_double(ncid, perframe[i].var, start, &data) );
}
}
// call write of superclass
Dump::write();
// close file. this ensures data is flushed and minimizes data corruption
closefile();
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write_header(bigint n)
+void DumpNetCDF::write_header(bigint n)
{
size_t start[2];
start[0] = framei-1;
start[1] = 0;
if (filewriter) {
size_t count[2];
double time, cell_origin[3], cell_lengths[3], cell_angles[3];
time = update->ntimestep;
if (domain->triclinic == 0) {
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = domain->yprd;
cell_lengths[2] = domain->zprd;
cell_angles[0] = 90;
cell_angles[1] = 90;
cell_angles[2] = 90;
}
else {
double cosalpha, cosbeta, cosgamma;
double *h = domain->h;
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = sqrt(h[1]*h[1]+h[5]*h[5]);
cell_lengths[2] = sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosalpha = (h[5]*h[4]+h[1]*h[3])/
sqrt((h[1]*h[1]+h[5]*h[5])*(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]));
cosbeta = h[4]/sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosgamma = h[5]/sqrt(h[1]*h[1]+h[5]*h[5]);
cell_angles[0] = acos(cosalpha)*180.0/MY_PI;
cell_angles[1] = acos(cosbeta)*180.0/MY_PI;
cell_angles[2] = acos(cosgamma)*180.0/MY_PI;
}
// Recent AMBER conventions say that nonperiodic boundaries should have
// 'cell_lengths' set to zero.
for (int dim = 0; dim < 3; dim++) {
if (!domain->periodicity[dim])
cell_lengths[dim] = 0.0;
}
count[0] = 1;
count[1] = 3;
NCERR( nc_put_var1_double(ncid, time_var, start, &time) );
NCERR( nc_put_vara_double(ncid, cell_origin_var, start, count,
cell_origin) );
NCERR( nc_put_vara_double(ncid, cell_lengths_var, start, count,
cell_lengths) );
NCERR( nc_put_vara_double(ncid, cell_angles_var, start, count,
cell_angles) );
}
ndata = n;
blocki = 0;
}
/* ----------------------------------------------------------------------
write data lines to file in a block-by-block style
write head of block (mass & element name) only if it has atoms of the type
------------------------------------------------------------------------- */
-void DumpNC::write_data(int n, double *mybuf)
+void DumpNetCDF::write_data(int n, double *mybuf)
{
size_t start[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
ptrdiff_t stride[NC_MAX_VAR_DIMS];
if (!int_buffer) {
n_buffer = n;
int_buffer = (int *)
- memory->smalloc(n*sizeof(int), "DumpNC::int_buffer");
+ memory->smalloc(n*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->smalloc(n*sizeof(double), "DumpNC::double_buffer");
+ memory->smalloc(n*sizeof(double),"dump::double_buffer");
}
if (n > n_buffer) {
n_buffer = n;
int_buffer = (int *)
- memory->srealloc(int_buffer, n*sizeof(int), "DumpNC::int_buffer");
+ memory->srealloc(int_buffer, n*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->srealloc(double_buffer, n*sizeof(double),
- "DumpNC::double_buffer");
+ memory->srealloc(double_buffer, n*sizeof(double),"dump::double_buffer");
}
start[0] = framei-1;
start[1] = blocki;
start[2] = 0;
count[0] = 1;
count[1] = n;
count[2] = 1;
stride[0] = 1;
stride[1] = 1;
stride[2] = 3;
for (int i = 0; i < n_perat; i++) {
int iaux = perat[i].field[0];
if (vtype[iaux] == INT) {
// integers
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
start[2] = idim;
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vars_int(ncid, perat[i].var,
start+1, count+1, stride+1,
int_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vars_int(ncid, perat[i].var, start, count, stride,
int_buffer) );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vara_int(ncid, perat[i].var, start+1, count+1,
int_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vara_int(ncid, perat[i].var, start, count,
int_buffer) );
}
}
else {
// doubles
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
start[2] = idim;
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vars_double(ncid, perat[i].var,
start+1, count+1, stride+1,
double_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vars_double(ncid, perat[i].var, start, count,
stride, double_buffer) );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vara_double(ncid, perat[i].var, start+1, count+1,
double_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vara_double(ncid, perat[i].var, start, count,
double_buffer) );
}
}
}
blocki += n;
}
/* ---------------------------------------------------------------------- */
-int DumpNC::modify_param(int narg, char **arg)
+int DumpNetCDF::modify_param(int narg, char **arg)
{
int iarg = 0;
if (strcmp(arg[iarg],"double") == 0) {
iarg++;
if (iarg >= narg)
error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
if (strcmp(arg[iarg],"yes") == 0) {
double_precision = true;
}
else if (strcmp(arg[iarg],"no") == 0) {
double_precision = false;
}
else error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"at") == 0) {
iarg++;
framei = force->inumeric(FLERR,arg[iarg]);
if (framei < 0) framei--;
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"global") == 0) {
// "perframe" quantities, i.e. not per-atom stuff
iarg++;
n_perframe = narg-iarg;
perframe = new nc_perframe_t[n_perframe];
for (int i = 0; iarg < narg; iarg++, i++) {
int n;
char *suffix=NULL;
if (!strcmp(arg[iarg],"step")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_step;
+ perframe[i].compute = &DumpNetCDF::compute_step;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elapsed")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_elapsed;
+ perframe[i].compute = &DumpNetCDF::compute_elapsed;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elaplong")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_elapsed_long;
+ perframe[i].compute = &DumpNetCDF::compute_elapsed_long;
strcpy(perframe[i].name, arg[iarg]);
}
else {
n = strlen(arg[iarg]);
if (n > 2) {
suffix = new char[n-1];
strcpy(suffix, arg[iarg]+2);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must thermo quantity or "
"compute, fix or variable", arg[iarg]);
error->all(FLERR,errstr);
}
if (!strncmp(arg[iarg], "c_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_compute(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify compute ID computes per-atom info");
if (idim >= 0 && modify->compute[n]->vector_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute vector");
if (idim < 0 && modify->compute[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute scalar");
perframe[i].type = THIS_IS_A_COMPUTE;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "f_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_fix(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify fix ID computes per-atom info");
if (idim >= 0 && modify->fix[n]->vector_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
if (idim < 0 && modify->fix[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
perframe[i].type = THIS_IS_A_FIX;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "v_", 2)) {
n = input->variable->find(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify variable ID");
if (!input->variable->equalstyle(n))
error->all(FLERR,"Dump modify variable must be of style equal");
perframe[i].type = THIS_IS_A_VARIABLE;
perframe[i].dim = 1;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
strcpy(perframe[i].id, suffix);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must be compute, fix or "
"variable", arg[iarg]);
error->all(FLERR,errstr);
}
delete [] suffix;
}
}
return narg;
} else return 0;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write_prmtop()
+void DumpNetCDF::write_prmtop()
{
char fn[1024];
char tmp[81];
FILE *f;
strcpy(fn, filename);
strcat(fn, ".prmtop");
f = fopen(fn, "w");
fprintf(f, "%%VERSION LAMMPS\n");
fprintf(f, "%%FLAG TITLE\n");
fprintf(f, "%%FORMAT(20a4)\n");
memset(tmp, ' ', 76);
tmp[76] = '\0';
fprintf(f, "NASN%s\n", tmp);
fprintf(f, "%%FLAG POINTERS\n");
fprintf(f, "%%FORMAT(10I8)\n");
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
fprintf(f, "%8li", ntotalgr);
#else
fprintf(f, "%8i", ntotalgr);
#endif
for (int i = 0; i < 11; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
for (int i = 0; i < 12; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
for (int i = 0; i < 6; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
fprintf(f, "%%FLAG ATOM_NAME\n");
fprintf(f, "%%FORMAT(20a4)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%4s", "He");
if ((i+1) % 20 == 0)
fprintf(f, "\n");
}
fprintf(f, "%%FLAG CHARGE\n");
fprintf(f, "%%FORMAT(5E16.5)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%16.5e", 0.0);
if ((i+1) % 5 == 0)
fprintf(f, "\n");
}
fprintf(f, "%%FLAG MASS\n");
fprintf(f, "%%FORMAT(5E16.5)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%16.5e", 1.0);
if ((i+1) % 5 == 0)
fprintf(f, "\n");
}
fclose(f);
}
/* ---------------------------------------------------------------------- */
-void DumpNC::ncerr(int err, const char *descr, int line)
+void DumpNetCDF::ncerr(int err, const char *descr, int line)
{
if (err != NC_NOERR) {
char errstr[1024];
if (descr) {
sprintf(errstr, "NetCDF failed with error '%s' (while accessing '%s') "
" in line %i of %s.", nc_strerror(err), descr, line, __FILE__);
}
else {
sprintf(errstr, "NetCDF failed with error '%s' in line %i of %s.",
nc_strerror(err), line, __FILE__);
}
error->one(FLERR,errstr);
}
}
/* ----------------------------------------------------------------------
one method for every keyword thermo can output
called by compute() or evaluate_keyword()
compute will have already been called
set ivalue/dvalue/bivalue if value is int/double/bigint
customize a new keyword by adding a method
------------------------------------------------------------------------- */
-void DumpNC::compute_step(void *r)
+void DumpNetCDF::compute_step(void *r)
{
*((bigint *) r) = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::compute_elapsed(void *r)
+void DumpNetCDF::compute_elapsed(void *r)
{
*((bigint *) r) = update->ntimestep - update->firststep;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::compute_elapsed_long(void *r)
+void DumpNetCDF::compute_elapsed_long(void *r)
{
*((bigint *) r) = update->ntimestep - update->beginstep;
}
#endif /* defined(LMP_HAS_NETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc.h b/src/USER-NETCDF/dump_netcdf.h
similarity index 94%
rename from src/USER-NC-DUMP/dump_nc.h
rename to src/USER-NETCDF/dump_netcdf.h
index 788a9368f..daf4e9d0d 100644
--- a/src/USER-NC-DUMP/dump_nc.h
+++ b/src/USER-NETCDF/dump_netcdf.h
@@ -1,143 +1,144 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_NETCDF)
#ifdef DUMP_CLASS
-DumpStyle(nc,DumpNC)
+DumpStyle(netcdf,DumpNetCDF)
#else
-#ifndef LMP_DUMP_NC_H
-#define LMP_DUMP_NC_H
+#ifndef LMP_DUMP_NETCDF_H
+#define LMP_DUMP_NETCDF_H
#include "dump_custom.h"
namespace LAMMPS_NS {
const int NC_FIELD_NAME_MAX = 100;
const int DUMP_NC_MAX_DIMS = 100;
-class DumpNC : public DumpCustom {
+class DumpNetCDF : public DumpCustom {
public:
- DumpNC(class LAMMPS *, int, char **);
- virtual ~DumpNC();
+ DumpNetCDF(class LAMMPS *, int, char **);
+ virtual ~DumpNetCDF();
virtual void write();
private:
// per-atoms quantities (positions, velocities, etc.)
struct nc_perat_t {
int dims; // number of dimensions
int field[DUMP_NC_MAX_DIMS]; // field indices corresponding to the dim.
char name[NC_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
bool constant; // is this property per file (not per frame)
int ndumped; // number of entries written for this prop.
};
- typedef void (DumpNC::*funcptr_t)(void *);
+ typedef void (DumpNetCDF::*funcptr_t)(void *);
// per-frame quantities (variables, fixes or computes)
struct nc_perframe_t {
char name[NC_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
int type; // variable, fix, compute or callback
int index; // index in fix/compute list
funcptr_t compute; // compute function
int dim; // dimension
char id[NC_FIELD_NAME_MAX]; // variable id
bigint bigint_data; // actual data
double double_data; // actual data
};
int framei; // current frame index
int blocki; // current block index
int ndata; // number of data blocks to expect
bigint ntotalgr; // # of atoms
int n_perat; // # of netcdf per-atom properties
nc_perat_t *perat; // per-atom properties
int n_perframe; // # of global netcdf (not per-atom) fix props
nc_perframe_t *perframe; // global properties
bool double_precision; // write everything as double precision
bigint n_buffer; // size of buffer
int *int_buffer; // buffer for passing data to netcdf
double *double_buffer; // buffer for passing data to netcdf
int ncid;
int frame_dim;
int spatial_dim;
int Voigt_dim;
int atom_dim;
int cell_spatial_dim;
int cell_angular_dim;
int label_dim;
int spatial_var;
int cell_spatial_var;
int cell_angular_var;
int time_var;
int cell_origin_var;
int cell_lengths_var;
int cell_angles_var;
virtual void openfile();
void closefile();
virtual void write_header(bigint);
virtual void write_data(int, double *);
void write_prmtop();
virtual int modify_param(int, char **);
void ncerr(int, const char *, int);
void compute_step(void *);
void compute_elapsed(void *);
void compute_elapsed_long(void *);
};
}
#endif
#endif
#endif /* defined(LMP_HAS_NETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc_mpiio.cpp b/src/USER-NETCDF/dump_netcdf_mpiio.cpp
similarity index 96%
rename from src/USER-NC-DUMP/dump_nc_mpiio.cpp
rename to src/USER-NETCDF/dump_netcdf_mpiio.cpp
index 6b2601403..2e9ec274a 100644
--- a/src/USER-NC-DUMP/dump_nc_mpiio.cpp
+++ b/src/USER-NETCDF/dump_netcdf_mpiio.cpp
@@ -1,1077 +1,1074 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_PNETCDF)
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
-
#include <pnetcdf.h>
-
+#include "dump_netcdf_mpiio.h"
#include "atom.h"
#include "comm.h"
#include "compute.h"
#include "domain.h"
#include "error.h"
#include "fix.h"
#include "group.h"
#include "input.h"
#include "math_const.h"
#include "memory.h"
#include "modify.h"
#include "update.h"
#include "universe.h"
#include "variable.h"
#include "force.h"
-#include "dump_nc_mpiio.h"
-
using namespace LAMMPS_NS;
using namespace MathConst;
enum{INT,DOUBLE}; // same as in dump_custom.cpp
const char NC_FRAME_STR[] = "frame";
const char NC_SPATIAL_STR[] = "spatial";
const char NC_VOIGT_STR[] = "Voigt";
const char NC_ATOM_STR[] = "atom";
const char NC_CELL_SPATIAL_STR[] = "cell_spatial";
const char NC_CELL_ANGULAR_STR[] = "cell_angular";
const char NC_LABEL_STR[] = "label";
const char NC_TIME_STR[] = "time";
const char NC_CELL_ORIGIN_STR[] = "cell_origin";
const char NC_CELL_LENGTHS_STR[] = "cell_lengths";
const char NC_CELL_ANGLES_STR[] = "cell_angles";
const char NC_UNITS_STR[] = "units";
const char NC_SCALE_FACTOR_STR[] = "scale_factor";
const int THIS_IS_A_FIX = -1;
const int THIS_IS_A_COMPUTE = -2;
const int THIS_IS_A_VARIABLE = -3;
const int THIS_IS_A_BIGINT = -4;
/* ---------------------------------------------------------------------- */
#define NCERR(x) ncerr(x, NULL, __LINE__)
#define NCERRX(x, descr) ncerr(x, descr, __LINE__)
/* ---------------------------------------------------------------------- */
-DumpNCMPIIO::DumpNCMPIIO(LAMMPS *lmp, int narg, char **arg) :
+DumpNetCDFMPIIO::DumpNetCDFMPIIO(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
// arrays for data rearrangement
sort_flag = 1;
sortcol = 0;
binary = 1;
flush_flag = 0;
if (multiproc)
error->all(FLERR,"Multi-processor writes are not supported.");
if (multifile)
error->all(FLERR,"Multiple files are not supported.");
perat = new nc_perat_t[nfield];
for (int i = 0; i < nfield; i++) {
perat[i].dims = 0;
}
n_perat = 0;
for (int iarg = 5; iarg < narg; iarg++) {
int i = iarg-5;
int idim = 0;
int ndims = 1;
char mangled[1024];
strcpy(mangled, arg[iarg]);
// name mangling
// in the AMBER specification
if (!strcmp(mangled, "x") || !strcmp(mangled, "y") ||
!strcmp(mangled, "z")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "coordinates");
}
else if (!strcmp(mangled, "vx") || !strcmp(mangled, "vy") ||
!strcmp(mangled, "vz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "velocities");
}
else if (!strcmp(mangled, "xs") || !strcmp(mangled, "ys") ||
!strcmp(mangled, "zs")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "scaled_coordinates");
}
else if (!strcmp(mangled, "xu") || !strcmp(mangled, "yu") ||
!strcmp(mangled, "zu")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "unwrapped_coordinates");
}
else if (!strcmp(mangled, "fx") || !strcmp(mangled, "fy") ||
!strcmp(mangled, "fz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "forces");
}
else if (!strcmp(mangled, "mux") || !strcmp(mangled, "muy") ||
!strcmp(mangled, "muz")) {
idim = mangled[2] - 'x';
ndims = 3;
strcpy(mangled, "mu");
}
else if (!strncmp(mangled, "c_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_COMPUTE;
}
}
else if (!strncmp(mangled, "f_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_FIX;
}
}
// find mangled name
int inc = -1;
for (int j = 0; j < n_perat && inc < 0; j++) {
if (!strcmp(perat[j].name, mangled)) {
inc = j;
}
}
if (inc < 0) {
// this has not yet been defined
inc = n_perat;
perat[inc].dims = ndims;
if (ndims < 0) ndims = DUMP_NC_MPIIO_MAX_DIMS;
for (int j = 0; j < DUMP_NC_MPIIO_MAX_DIMS; j++) {
perat[inc].field[j] = -1;
}
strcpy(perat[inc].name, mangled);
n_perat++;
}
perat[inc].field[idim] = i;
}
n_perframe = 0;
perframe = NULL;
n_buffer = 0;
int_buffer = NULL;
double_buffer = NULL;
double_precision = false;
framei = 0;
}
/* ---------------------------------------------------------------------- */
-DumpNCMPIIO::~DumpNCMPIIO()
+DumpNetCDFMPIIO::~DumpNetCDFMPIIO()
{
closefile();
delete [] perat;
if (n_perframe > 0)
delete [] perframe;
if (int_buffer) memory->sfree(int_buffer);
if (double_buffer) memory->sfree(double_buffer);
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::openfile()
+void DumpNetCDFMPIIO::openfile()
{
// now the computes and fixes have been initialized, so we can query
// for the size of vector quantities
for (int i = 0; i < n_perat; i++) {
if (perat[i].dims == THIS_IS_A_COMPUTE) {
int j = -1;
for (int k = 0; k < DUMP_NC_MPIIO_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!compute[j]->peratom_flag)
error->all(FLERR,"compute does not provide per atom data");
perat[i].dims = compute[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS");
}
else if (perat[i].dims == THIS_IS_A_FIX) {
int j = -1;
for (int k = 0; k < DUMP_NC_MPIIO_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!fix[j]->peratom_flag)
error->all(FLERR,"fix does not provide per atom data");
perat[i].dims = fix[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS");
}
}
// get total number of atoms
ntotalgr = group->count(igroup);
if (append_flag && access(filename, F_OK) != -1) {
// Fixme! Check whether dimensions and variables conform to the
// data structure standard.
MPI_Offset index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( ncmpi_open(MPI_COMM_WORLD, filename, NC_WRITE, MPI_INFO_NULL,
&ncid), filename );
// dimensions
NCERRX( ncmpi_inq_dimid(ncid, NC_FRAME_STR, &frame_dim), NC_FRAME_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_SPATIAL_STR, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_VOIGT_STR, &Voigt_dim), NC_VOIGT_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_ATOM_STR, &atom_dim), NC_ATOM_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_LABEL_STR, &label_dim), NC_LABEL_STR );
// default variables
NCERRX( ncmpi_inq_varid(ncid, NC_SPATIAL_STR, &spatial_var),
NC_SPATIAL_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_var),
NC_CELL_SPATIAL_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_var),
NC_CELL_ANGULAR_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_TIME_STR, &time_var), NC_TIME_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ORIGIN_STR, &cell_origin_var),
NC_CELL_ORIGIN_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_LENGTHS_STR, &cell_lengths_var),
NC_CELL_LENGTHS_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ANGLES_STR, &cell_angles_var),
NC_CELL_ANGLES_STR);
// variables specified in the input file
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
NCERRX( ncmpi_inq_varid(ncid, perat[i].name, &perat[i].var),
perat[i].name );
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
NCERRX( ncmpi_inq_varid(ncid, perframe[i].name, &perframe[i].var),
perframe[i].name );
}
MPI_Offset nframes;
NCERR( ncmpi_inq_dimlen(ncid, frame_dim, &nframes) );
// framei == -1 means append to file, == -2 means override last frame
// Note that in the input file this translates to 'yes', '-1', etc.
if (framei < 0 || (append_flag && framei == 0)) framei = nframes+framei+1;
if (framei < 1) framei = 1;
}
else {
int dims[NC_MAX_VAR_DIMS];
MPI_Offset index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( ncmpi_create(MPI_COMM_WORLD, filename, NC_64BIT_OFFSET,
MPI_INFO_NULL, &ncid), filename );
// dimensions
NCERRX( ncmpi_def_dim(ncid, NC_FRAME_STR, NC_UNLIMITED, &frame_dim),
NC_FRAME_STR );
NCERRX( ncmpi_def_dim(ncid, NC_SPATIAL_STR, 3, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( ncmpi_def_dim(ncid, NC_VOIGT_STR, 6, &Voigt_dim),
NC_VOIGT_STR );
NCERRX( ncmpi_def_dim(ncid, NC_ATOM_STR, ntotalgr, &atom_dim),
NC_ATOM_STR );
NCERRX( ncmpi_def_dim(ncid, NC_CELL_SPATIAL_STR, 3, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( ncmpi_def_dim(ncid, NC_CELL_ANGULAR_STR, 3, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( ncmpi_def_dim(ncid, NC_LABEL_STR, 10, &label_dim),
NC_LABEL_STR );
// default variables
dims[0] = spatial_dim;
NCERRX( ncmpi_def_var(ncid, NC_SPATIAL_STR, NC_CHAR, 1, dims, &spatial_var),
NC_SPATIAL_STR );
NCERRX( ncmpi_def_var(ncid, NC_CELL_SPATIAL_STR, NC_CHAR, 1, dims,
&cell_spatial_var), NC_CELL_SPATIAL_STR );
dims[0] = spatial_dim;
dims[1] = label_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ANGULAR_STR, NC_CHAR, 2, dims,
&cell_angular_var), NC_CELL_ANGULAR_STR );
dims[0] = frame_dim;
NCERRX( ncmpi_def_var(ncid, NC_TIME_STR, NC_DOUBLE, 1, dims, &time_var),
NC_TIME_STR);
dims[0] = frame_dim;
dims[1] = cell_spatial_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ORIGIN_STR, NC_DOUBLE, 2, dims,
&cell_origin_var), NC_CELL_ORIGIN_STR );
NCERRX( ncmpi_def_var(ncid, NC_CELL_LENGTHS_STR, NC_DOUBLE, 2, dims,
&cell_lengths_var), NC_CELL_LENGTHS_STR );
dims[0] = frame_dim;
dims[1] = cell_angular_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ANGLES_STR, NC_DOUBLE, 2, dims,
&cell_angles_var), NC_CELL_ANGLES_STR );
// variables specified in the input file
dims[0] = frame_dim;
dims[1] = atom_dim;
dims[2] = spatial_dim;
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 2, dims,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
NCERRX( ncmpi_def_var(ncid, perframe[i].name, NC_INT, 1, dims,
&perframe[i].var), perframe[i].name );
}
else {
NCERRX( ncmpi_def_var(ncid, perframe[i].name, NC_DOUBLE, 1, dims,
&perframe[i].var), perframe[i].name );
}
}
// attributes
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "Conventions",
5, "AMBER") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "ConventionVersion",
3, "1.0") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "program",
6, "LAMMPS") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "programVersion",
strlen(universe->version), universe->version) );
// units
if (!strcmp(update->unit_style, "lj")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
2, "lj") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
2, "lj") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
2, "lj") );
}
else if (!strcmp(update->unit_style, "real")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "metal")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
10, "picosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "si")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
5, "meter") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
5, "meter") );
}
else if (!strcmp(update->unit_style, "cgs")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
10, "centimeter") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
10, "centimeter") );
}
else if (!strcmp(update->unit_style, "electron")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
4, "Bohr") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
4, "Bohr") );
}
else {
char errstr[1024];
sprintf(errstr, "Unsupported unit style '%s'", update->unit_style);
error->all(FLERR,errstr);
}
NCERR( ncmpi_put_att_text(ncid, cell_angles_var, NC_UNITS_STR,
6, "degree") );
d[0] = update->dt;
NCERR( ncmpi_put_att_double(ncid, time_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( ncmpi_put_att_double(ncid, cell_origin_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( ncmpi_put_att_double(ncid, cell_lengths_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
/*
* Finished with definition
*/
NCERR( ncmpi_enddef(ncid) );
/*
* Write label variables
*/
NCERR( ncmpi_begin_indep_data(ncid) );
if (filewriter) {
NCERR( ncmpi_put_var_text(ncid, spatial_var, "xyz") );
NCERR( ncmpi_put_var_text(ncid, cell_spatial_var, "abc") );
index[0] = 0;
index[1] = 0;
count[0] = 1;
count[1] = 5;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"alpha") );
index[0] = 1;
count[1] = 4;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"beta") );
index[0] = 2;
count[1] = 5;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"gamma") );
}
NCERR( ncmpi_end_indep_data(ncid) );
framei = 1;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::closefile()
+void DumpNetCDFMPIIO::closefile()
{
if (singlefile_opened) {
NCERR( ncmpi_close(ncid) );
singlefile_opened = 0;
- // append next time DumpNCMPIIO::openfile is called
+ // append next time DumpNetCDFMPIIO::openfile is called
append_flag = 1;
// write to next frame upon next open
framei++;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::write()
+void DumpNetCDFMPIIO::write()
{
// open file
openfile();
// need to write per-frame (global) properties here since they may come
// from computes. write_header below is only called from the writing
// processes, but modify->compute[j]->compute_* must be called from all
// processes.
MPI_Offset start[2];
start[0] = framei-1;
start[1] = 0;
NCERR( ncmpi_begin_indep_data(ncid) );
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
bigint data;
(this->*perframe[i].compute)((void*) &data);
if (filewriter)
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
NCERR( ncmpi_put_var1_long(ncid, perframe[i].var, start, &data) );
#else
NCERR( ncmpi_put_var1_int(ncid, perframe[i].var, start, &data) );
#endif
}
else {
double data;
int j = perframe[i].index;
int idim = perframe[i].dim;
if (perframe[i].type == THIS_IS_A_COMPUTE) {
if (idim >= 0) {
modify->compute[j]->compute_vector();
data = modify->compute[j]->vector[idim];
}
else
data = modify->compute[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_FIX) {
if (idim >= 0) {
data = modify->fix[j]->compute_vector(idim);
}
else
data = modify->fix[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_VARIABLE) {
j = input->variable->find(perframe[i].id);
data = input->variable->compute_equal(j);
}
if (filewriter)
NCERR( ncmpi_put_var1_double(ncid, perframe[i].var, start, &data) );
}
}
// write timestep header
write_time_and_cell();
NCERR( ncmpi_end_indep_data(ncid) );
// nme = # of dump lines this proc contributes to dump
nme = count();
int *block_sizes = new int[comm->nprocs];
MPI_Allgather(&nme, 1, MPI_INT, block_sizes, 1, MPI_INT, MPI_COMM_WORLD);
blocki = 0;
for (int i = 0; i < comm->me; i++) blocki += block_sizes[i];
delete [] block_sizes;
// insure buf is sized for packing and communicating
// use nme to insure filewriter proc can receive info from others
// limit nme*size_one to int since used as arg in MPI calls
if (nme > maxbuf) {
if ((bigint) nme * size_one > MAXSMALLINT)
error->all(FLERR,"Too much per-proc info for dump");
maxbuf = nme;
memory->destroy(buf);
memory->create(buf,maxbuf*size_one,"dump:buf");
}
// pack my data into buf
pack(NULL);
// each process writes its data
write_data(nme, buf);
// close file. this ensures data is flushed and minimizes data corruption
closefile();
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::write_time_and_cell()
+void DumpNetCDFMPIIO::write_time_and_cell()
{
MPI_Offset start[2];
start[0] = framei-1;
start[1] = 0;
MPI_Offset count[2];
double time, cell_origin[3], cell_lengths[3], cell_angles[3];
time = update->ntimestep;
if (domain->triclinic == 0) {
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = domain->yprd;
cell_lengths[2] = domain->zprd;
cell_angles[0] = 90;
cell_angles[1] = 90;
cell_angles[2] = 90;
}
else {
double cosalpha, cosbeta, cosgamma;
double *h = domain->h;
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = sqrt(h[1]*h[1]+h[5]*h[5]);
cell_lengths[2] = sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosalpha = (h[5]*h[4]+h[1]*h[3])/
sqrt((h[1]*h[1]+h[5]*h[5])*(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]));
cosbeta = h[4]/sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosgamma = h[5]/sqrt(h[1]*h[1]+h[5]*h[5]);
cell_angles[0] = acos(cosalpha)*180.0/MY_PI;
cell_angles[1] = acos(cosbeta)*180.0/MY_PI;
cell_angles[2] = acos(cosgamma)*180.0/MY_PI;
}
// Recent AMBER conventions say that nonperiodic boundaries should have
// 'cell_lengths' set to zero.
for (int dim = 0; dim < 3; dim++) {
if (!domain->periodicity[dim])
cell_lengths[dim] = 0.0;
}
count[0] = 1;
count[1] = 3;
if (filewriter) {
NCERR( ncmpi_put_var1_double(ncid, time_var, start, &time) );
NCERR( ncmpi_put_vara_double(ncid, cell_origin_var, start, count,
cell_origin) );
NCERR( ncmpi_put_vara_double(ncid, cell_lengths_var, start, count,
cell_lengths) );
NCERR( ncmpi_put_vara_double(ncid, cell_angles_var, start, count,
cell_angles) );
}
}
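For a triclinic box the expressions above are the standard conversion from the LAMMPS upper-triangular cell matrix to crystallographic cell parameters. A compact restatement (a sketch, assuming the usual ordering h = {lx, ly, lz, yz, xz, xy}):

  a = lx
  b = sqrt(ly^2 + xy^2)
  c = sqrt(lz^2 + xz^2 + yz^2)
  cos(alpha) = (xy*xz + ly*yz) / (b*c)
  cos(beta)  = xz / c
  cos(gamma) = xy / b

with the angles then converted to degrees via acos()*180.0/MY_PI, matching write_time_and_cell() above.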
/* ----------------------------------------------------------------------
write data lines to file in a block-by-block style
   write head of block (mass & element name) only if it has atoms of that type
------------------------------------------------------------------------- */
-void DumpNCMPIIO::write_data(int n, double *mybuf)
+void DumpNetCDFMPIIO::write_data(int n, double *mybuf)
{
MPI_Offset start[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
MPI_Offset stride[NC_MAX_VAR_DIMS];
if (!int_buffer) {
n_buffer = std::max(1, n);
int_buffer = (int *)
- memory->smalloc(n_buffer*sizeof(int), "DumpNCMPIIO::int_buffer");
+ memory->smalloc(n_buffer*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->smalloc(n_buffer*sizeof(double), "DumpNCMPIIO::double_buffer");
+ memory->smalloc(n_buffer*sizeof(double),"dump::double_buffer");
}
if (n > n_buffer) {
n_buffer = std::max(1, n);
int_buffer = (int *)
- memory->srealloc(int_buffer, n_buffer*sizeof(int),
- "DumpNCMPIIO::int_buffer");
+ memory->srealloc(int_buffer, n_buffer*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
memory->srealloc(double_buffer, n_buffer*sizeof(double),
- "DumpNCMPIIO::double_buffer");
+ "dump::double_buffer");
}
start[0] = framei-1;
start[1] = blocki;
start[2] = 0;
if (n == 0) {
/* If there is no data, we need to make sure the start values don't exceed
dimension bounds. Just set them to zero. */
start[1] = 0;
}
count[0] = 1;
count[1] = n;
count[2] = 1;
stride[0] = 1;
stride[1] = 1;
stride[2] = 3;
for (int i = 0; i < n_perat; i++) {
int iaux = perat[i].field[0];
if (iaux < 0 || iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
if (vtype[iaux] == INT) {
// integers
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
if (iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
start[2] = idim;
NCERRX( ncmpi_put_vars_int_all(ncid, perat[i].var, start, count,
stride, int_buffer), perat[i].name );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
NCERRX( ncmpi_put_vara_int_all(ncid, perat[i].var, start, count,
int_buffer), perat[i].name );
}
}
else {
// doubles
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
if (iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
start[2] = idim;
NCERRX( ncmpi_put_vars_double_all(ncid, perat[i].var, start, count,
stride, double_buffer), perat[i].name );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
NCERRX( ncmpi_put_vara_double_all(ncid, perat[i].var, start, count,
double_buffer), perat[i].name );
}
}
}
}
/* ---------------------------------------------------------------------- */
-int DumpNCMPIIO::modify_param(int narg, char **arg)
+int DumpNetCDFMPIIO::modify_param(int narg, char **arg)
{
int iarg = 0;
if (strcmp(arg[iarg],"double") == 0) {
iarg++;
if (iarg >= narg)
error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
if (strcmp(arg[iarg],"yes") == 0) {
double_precision = true;
}
else if (strcmp(arg[iarg],"no") == 0) {
double_precision = false;
}
else error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"at") == 0) {
iarg++;
framei = force->inumeric(FLERR,arg[iarg]);
if (framei < 0) framei--;
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"global") == 0) {
// "perframe" quantities, i.e. not per-atom stuff
iarg++;
n_perframe = narg-iarg;
perframe = new nc_perframe_t[n_perframe];
for (int i = 0; iarg < narg; iarg++, i++) {
int n;
char *suffix;
if (!strcmp(arg[iarg],"step")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_step;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_step;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elapsed")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_elapsed;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_elapsed;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elaplong")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_elapsed_long;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_elapsed_long;
strcpy(perframe[i].name, arg[iarg]);
}
else {
n = strlen(arg[iarg]);
if (n > 2) {
suffix = new char[n-1];
strcpy(suffix, arg[iarg]+2);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must thermo quantity or "
"compute, fix or variable", arg[iarg]);
error->all(FLERR,errstr);
}
if (!strncmp(arg[iarg], "c_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_compute(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify compute ID computes per-atom info");
if (idim >= 0 && modify->compute[n]->vector_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute vector");
if (idim < 0 && modify->compute[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute scalar");
perframe[i].type = THIS_IS_A_COMPUTE;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "f_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_fix(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify fix ID computes per-atom info");
if (idim >= 0 && modify->fix[n]->vector_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
if (idim < 0 && modify->fix[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
perframe[i].type = THIS_IS_A_FIX;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "v_", 2)) {
n = input->variable->find(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify variable ID");
if (!input->variable->equalstyle(n))
error->all(FLERR,"Dump modify variable must be of style equal");
perframe[i].type = THIS_IS_A_VARIABLE;
perframe[i].dim = 1;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
strcpy(perframe[i].id, suffix);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must be compute, fix or "
"variable", arg[iarg]);
error->all(FLERR,errstr);
}
delete [] suffix;
}
}
return narg;
} else return 0;
}
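The keywords handled above map onto dump_modify options of the netcdf/mpiio dump style. A hypothetical input-script fragment exercising them (the dump ID, file name, and the IDs c_mypress and v_myvar are placeholders; v_myvar must be an equal-style variable and c_mypress a global scalar or vector compute):

  dump        nc1 all netcdf/mpiio 100 traj.nc id type x y z
  dump_modify nc1 double yes
  dump_modify nc1 at -1
  dump_modify nc1 global step elapsed c_mypress v_myvar

Here 'double yes/no' selects double-precision output, 'at N' selects the frame index to write to (negative values appear to be interpreted relative to the last frame of an existing file), and 'global' registers per-frame quantities taken from thermo keywords, computes, fixes or equal-style variables.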
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::ncerr(int err, const char *descr, int line)
+void DumpNetCDFMPIIO::ncerr(int err, const char *descr, int line)
{
if (err != NC_NOERR) {
char errstr[1024];
if (descr) {
sprintf(errstr, "NetCDF failed with error '%s' (while accessing '%s') "
" in line %i of %s.", ncmpi_strerror(err), descr, line, __FILE__);
}
else {
sprintf(errstr, "NetCDF failed with error '%s' in line %i of %s.",
ncmpi_strerror(err), line, __FILE__);
}
error->one(FLERR,errstr);
}
}
/* ----------------------------------------------------------------------
one method for every keyword thermo can output
called by compute() or evaluate_keyword()
compute will have already been called
set ivalue/dvalue/bivalue if value is int/double/bigint
customize a new keyword by adding a method
------------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_step(void *r)
+void DumpNetCDFMPIIO::compute_step(void *r)
{
*((bigint *) r) = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_elapsed(void *r)
+void DumpNetCDFMPIIO::compute_elapsed(void *r)
{
*((bigint *) r) = update->ntimestep - update->firststep;
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_elapsed_long(void *r)
+void DumpNetCDFMPIIO::compute_elapsed_long(void *r)
{
*((bigint *) r) = update->ntimestep - update->beginstep;
}
#endif /* defined(LMP_HAS_PNETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc_mpiio.h b/src/USER-NETCDF/dump_netcdf_mpiio.h
similarity index 95%
rename from src/USER-NC-DUMP/dump_nc_mpiio.h
rename to src/USER-NETCDF/dump_netcdf_mpiio.h
index 5e36335e6..6f5b00b03 100644
--- a/src/USER-NC-DUMP/dump_nc_mpiio.h
+++ b/src/USER-NETCDF/dump_netcdf_mpiio.h
@@ -1,140 +1,141 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_PNETCDF)
#ifdef DUMP_CLASS
-DumpStyle(nc/mpiio,DumpNCMPIIO)
+DumpStyle(netcdf/mpiio,DumpNetCDFMPIIO)
#else
-#ifndef LMP_DUMP_NC_MPIIO_H
-#define LMP_DUMP_NC_MPIIO_H
+#ifndef LMP_DUMP_NETCDF_MPIIO_H
+#define LMP_DUMP_NETCDF_MPIIO_H
#include "dump_custom.h"
namespace LAMMPS_NS {
const int NC_MPIIO_FIELD_NAME_MAX = 100;
const int DUMP_NC_MPIIO_MAX_DIMS = 100;
-class DumpNCMPIIO : public DumpCustom {
+class DumpNetCDFMPIIO : public DumpCustom {
public:
- DumpNCMPIIO(class LAMMPS *, int, char **);
- virtual ~DumpNCMPIIO();
+ DumpNetCDFMPIIO(class LAMMPS *, int, char **);
+ virtual ~DumpNetCDFMPIIO();
virtual void write();
private:
  // per-atom quantities (positions, velocities, etc.)
struct nc_perat_t {
int dims; // number of dimensions
int field[DUMP_NC_MPIIO_MAX_DIMS]; // field indices corresponding to the dim.
char name[NC_MPIIO_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
};
-  typedef void (DumpNCMPIIO::*funcptr_t)(void *);
+  typedef void (DumpNetCDFMPIIO::*funcptr_t)(void *);
// per-frame quantities (variables, fixes or computes)
struct nc_perframe_t {
char name[NC_MPIIO_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
int type; // variable, fix, compute or callback
int index; // index in fix/compute list
funcptr_t compute; // compute function
int dim; // dimension
char id[NC_MPIIO_FIELD_NAME_MAX]; // variable id
bigint bigint_data; // actual data
double double_data; // actual data
};
int framei; // current frame index
int blocki; // current block index
int ndata; // number of data blocks to expect
bigint ntotalgr; // # of atoms
int n_perat; // # of netcdf per-atom properties
nc_perat_t *perat; // per-atom properties
int n_perframe; // # of global netcdf (not per-atom) fix props
nc_perframe_t *perframe; // global properties
bool double_precision; // write everything as double precision
bigint n_buffer; // size of buffer
int *int_buffer; // buffer for passing data to netcdf
double *double_buffer; // buffer for passing data to netcdf
int ncid;
int frame_dim;
int spatial_dim;
int Voigt_dim;
int atom_dim;
int cell_spatial_dim;
int cell_angular_dim;
int label_dim;
int spatial_var;
int cell_spatial_var;
int cell_angular_var;
int time_var;
int cell_origin_var;
int cell_lengths_var;
int cell_angles_var;
virtual void openfile();
void closefile();
void write_time_and_cell();
virtual void write_data(int, double *);
void write_prmtop();
virtual int modify_param(int, char **);
void ncerr(int, const char *, int);
void compute_step(void *);
void compute_elapsed(void *);
void compute_elapsed_long(void *);
};
}
#endif
#endif
#endif /* defined(LMP_HAS_PNETCDF) */
diff --git a/src/USER-OMP/angle_sdk_omp.h b/src/USER-OMP/angle_sdk_omp.h
index 9ab75904c..c041c2ecc 100644
--- a/src/USER-OMP/angle_sdk_omp.h
+++ b/src/USER-OMP/angle_sdk_omp.h
@@ -1,47 +1,46 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef ANGLE_CLASS
AngleStyle(sdk/omp,AngleSDKOMP)
-AngleStyle(cg/cmm/omp,AngleSDKOMP)
#else
#ifndef LMP_ANGLE_SDK_OMP_H
#define LMP_ANGLE_SDK_OMP_H
#include "angle_sdk.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class AngleSDKOMP : public AngleSDK, public ThrOMP {
public:
AngleSDKOMP(class LAMMPS *lmp);
virtual void compute(int, int);
private:
template <int EVFLAG, int EFLAG, int NEWTON_BOND>
void eval(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-OMP/improper_ring_omp.cpp b/src/USER-OMP/improper_ring_omp.cpp
index bd7593c51..4eadc8318 100644
--- a/src/USER-OMP/improper_ring_omp.cpp
+++ b/src/USER-OMP/improper_ring_omp.cpp
@@ -1,266 +1,266 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#include <math.h>
#include "improper_ring_omp.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "domain.h"
#include "force.h"
#include "update.h"
#include "error.h"
#include "math_special.h"
#include "suffix.h"
using namespace LAMMPS_NS;
using namespace MathSpecial;
#define TOLERANCE 0.05
#define SMALL 0.001
/* ---------------------------------------------------------------------- */
ImproperRingOMP::ImproperRingOMP(class LAMMPS *lmp)
: ImproperRing(lmp), ThrOMP(lmp,THR_IMPROPER)
{
suffix_flag |= Suffix::OMP;
}
/* ---------------------------------------------------------------------- */
void ImproperRingOMP::compute(int eflag, int vflag)
{
if (eflag || vflag) {
ev_setup(eflag,vflag);
} else evflag = 0;
const int nall = atom->nlocal + atom->nghost;
const int nthreads = comm->nthreads;
const int inum = neighbor->nimproperlist;
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(eflag,vflag)
#endif
{
int ifrom, ito, tid;
loop_setup_thr(ifrom, ito, tid, inum, nthreads);
ThrData *thr = fix->get_thr(tid);
thr->timer(Timer::START);
ev_setup_thr(eflag, vflag, nall, eatom, vatom, thr);
if (inum > 0) {
if (evflag) {
if (eflag) {
if (force->newton_bond) eval<1,1,1>(ifrom, ito, thr);
else eval<1,1,0>(ifrom, ito, thr);
} else {
if (force->newton_bond) eval<1,0,1>(ifrom, ito, thr);
else eval<1,0,0>(ifrom, ito, thr);
}
} else {
if (force->newton_bond) eval<0,0,1>(ifrom, ito, thr);
else eval<0,0,0>(ifrom, ito, thr);
}
thr->timer(Timer::BOND);
reduce_thr(this, eflag, vflag, thr);
}
} // end of omp parallel region
}
template <int EVFLAG, int EFLAG, int NEWTON_BOND>
void ImproperRingOMP::eval(int nfrom, int nto, ThrData * const thr)
{
/* Be careful!: "chi" is the equilibrium angle in radians. */
int i1,i2,i3,i4,n,type;
double eimproper;
/* Compatibility variables. */
double vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z;
double f1[3], f3[3], f4[3];
/* Actual computation variables. */
int at1[3], at2[3], at3[3], icomb;
double bvec1x[3], bvec1y[3], bvec1z[3],
bvec2x[3], bvec2y[3], bvec2z[3],
bvec1n[3], bvec2n[3], bend_angle[3];
double angle_summer, angfac, cfact1, cfact2, cfact3;
double cjiji, ckjji, ckjkj, fix, fiy, fiz, fjx, fjy, fjz, fkx, fky, fkz;
eimproper = 0.0;
const double * const * const x = atom->x;
double * const * const f = thr->get_f();
const int * const * const improperlist = neighbor->improperlist;
const int nlocal = atom->nlocal;
/* A description of the potential can be found in
Macromolecules 35, pp. 1463-1472 (2002). */
for (n = nfrom; n < nto; n++) {
/* Take the ids of the atoms contributing to the improper potential. */
i1 = improperlist[n][0]; /* Atom "1" of Figure 1 from the above reference.*/
i2 = improperlist[n][1]; /* Atom "2" ... */
i3 = improperlist[n][2]; /* Atom "3" ... */
i4 = improperlist[n][3]; /* Atom "9" ... */
type = improperlist[n][4];
/* Calculate the necessary variables for LAMMPS implementation.
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
       Although they are irrelevant to the calculation of the potential, we keep
them for maximal compatibility. */
vb1x = x[i1][0] - x[i2][0]; vb1y = x[i1][1] - x[i2][1]; vb1z = x[i1][2] - x[i2][2];
vb2x = x[i3][0] - x[i2][0]; vb2y = x[i3][1] - x[i2][1]; vb2z = x[i3][2] - x[i2][2];
vb3x = x[i4][0] - x[i3][0]; vb3y = x[i4][1] - x[i3][1]; vb3z = x[i4][2] - x[i3][2];
/* Pass the atom tags to form the necessary combinations. */
at1[0] = i1; at2[0] = i2; at3[0] = i4; /* ids: 1-2-9 */
at1[1] = i1; at2[1] = i2; at3[1] = i3; /* ids: 1-2-3 */
at1[2] = i4; at2[2] = i2; at3[2] = i3; /* ids: 9-2-3 */
/* Initialize the sum of the angles differences. */
angle_summer = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++) {
/* Bond vector connecting the first and the second atom. */
bvec1x[icomb] = x[at2[icomb]][0] - x[at1[icomb]][0];
bvec1y[icomb] = x[at2[icomb]][1] - x[at1[icomb]][1];
bvec1z[icomb] = x[at2[icomb]][2] - x[at1[icomb]][2];
/* also calculate the norm of the vector: */
bvec1n[icomb] = sqrt( bvec1x[icomb]*bvec1x[icomb]
+ bvec1y[icomb]*bvec1y[icomb]
+ bvec1z[icomb]*bvec1z[icomb]);
/* Bond vector connecting the second and the third atom. */
bvec2x[icomb] = x[at3[icomb]][0] - x[at2[icomb]][0];
bvec2y[icomb] = x[at3[icomb]][1] - x[at2[icomb]][1];
bvec2z[icomb] = x[at3[icomb]][2] - x[at2[icomb]][2];
/* also calculate the norm of the vector: */
bvec2n[icomb] = sqrt( bvec2x[icomb]*bvec2x[icomb]
+ bvec2y[icomb]*bvec2y[icomb]
+ bvec2z[icomb]*bvec2z[icomb]);
/* Calculate the bending angle of the atom triad: */
bend_angle[icomb] = ( bvec2x[icomb]*bvec1x[icomb]
+ bvec2y[icomb]*bvec1y[icomb]
+ bvec2z[icomb]*bvec1z[icomb]);
bend_angle[icomb] /= (bvec1n[icomb] * bvec2n[icomb]);
if (bend_angle[icomb] > 1.0) bend_angle[icomb] -= SMALL;
if (bend_angle[icomb] < -1.0) bend_angle[icomb] += SMALL;
/* Append the current angle to the sum of angle differences. */
angle_summer += (bend_angle[icomb] - chi[type]);
}
if (EFLAG) eimproper = (1.0/6.0) *k[type] * powint(angle_summer,6);
/*
printf("The tags: %d-%d-%d-%d, of type %d .\n",atom->tag[i1],atom->tag[i2],atom->tag[i3],atom->tag[i4],type);
// printf("The coordinates of the first: %f, %f, %f.\n", x[i1][0], x[i1][1], x[i1][2]);
// printf("The coordinates of the second: %f, %f, %f.\n", x[i2][0], x[i2][1], x[i2][2]);
// printf("The coordinates of the third: %f, %f, %f.\n", x[i3][0], x[i3][1], x[i3][2]);
// printf("The coordinates of the fourth: %f, %f, %f.\n", x[i4][0], x[i4][1], x[i4][2]);
printf("The angles are: %f / %f / %f equilibrium: %f.\n", bend_angle[0], bend_angle[1], bend_angle[2],chi[type]);
printf("The energy of the improper: %f with prefactor %f.\n", eimproper,(1.0/6.0)*k[type]);
printf("The sum of the angles: %f.\n", angle_summer);
*/
/* Force calculation acting on all atoms.
Calculate the derivatives of the potential. */
angfac = k[type] * powint(angle_summer,5);
f1[0] = 0.0; f1[1] = 0.0; f1[2] = 0.0;
f3[0] = 0.0; f3[1] = 0.0; f3[2] = 0.0;
f4[0] = 0.0; f4[1] = 0.0; f4[2] = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Calculate the squares of the distances. */
cjiji = bvec1n[icomb] * bvec1n[icomb]; ckjkj = bvec2n[icomb] * bvec2n[icomb];
ckjji = bvec2x[icomb] * bvec1x[icomb]
+ bvec2y[icomb] * bvec1y[icomb]
+ bvec2z[icomb] * bvec1z[icomb] ;
cfact1 = angfac / (sqrt(ckjkj * cjiji));
cfact2 = ckjji / ckjkj;
cfact3 = ckjji / cjiji;
- /* Calculate the force acted on the thrid atom of the angle. */
+ /* Calculate the force acted on the third atom of the angle. */
fkx = cfact2 * bvec2x[icomb] - bvec1x[icomb];
fky = cfact2 * bvec2y[icomb] - bvec1y[icomb];
fkz = cfact2 * bvec2z[icomb] - bvec1z[icomb];
/* Calculate the force acted on the first atom of the angle. */
fix = bvec2x[icomb] - cfact3 * bvec1x[icomb];
fiy = bvec2y[icomb] - cfact3 * bvec1y[icomb];
fiz = bvec2z[icomb] - cfact3 * bvec1z[icomb];
/* Finally, calculate the force acted on the middle atom of the angle.*/
fjx = - fix - fkx; fjy = - fiy - fky; fjz = - fiz - fkz;
/* Consider the appropriate scaling of the forces: */
fix *= cfact1; fiy *= cfact1; fiz *= cfact1;
fjx *= cfact1; fjy *= cfact1; fjz *= cfact1;
fkx *= cfact1; fky *= cfact1; fkz *= cfact1;
if (at1[icomb] == i1) {f1[0] += fix; f1[1] += fiy; f1[2] += fiz;}
else if (at2[icomb] == i1) {f1[0] += fjx; f1[1] += fjy; f1[2] += fjz;}
else if (at3[icomb] == i1) {f1[0] += fkx; f1[1] += fky; f1[2] += fkz;}
if (at1[icomb] == i3) {f3[0] += fix; f3[1] += fiy; f3[2] += fiz;}
else if (at2[icomb] == i3) {f3[0] += fjx; f3[1] += fjy; f3[2] += fjz;}
else if (at3[icomb] == i3) {f3[0] += fkx; f3[1] += fky; f3[2] += fkz;}
if (at1[icomb] == i4) {f4[0] += fix; f4[1] += fiy; f4[2] += fiz;}
else if (at2[icomb] == i4) {f4[0] += fjx; f4[1] += fjy; f4[2] += fjz;}
else if (at3[icomb] == i4) {f4[0] += fkx; f4[1] += fky; f4[2] += fkz;}
/* Store the contribution to the global arrays: */
/* Take the id of the atom from the at1[icomb] element, i1 = at1[icomb]. */
if (NEWTON_BOND || at1[icomb] < nlocal) {
f[at1[icomb]][0] += fix;
f[at1[icomb]][1] += fiy;
f[at1[icomb]][2] += fiz;
}
/* Take the id of the atom from the at2[icomb] element, i2 = at2[icomb]. */
if (NEWTON_BOND || at2[icomb] < nlocal) {
f[at2[icomb]][0] += fjx;
f[at2[icomb]][1] += fjy;
f[at2[icomb]][2] += fjz;
}
/* Take the id of the atom from the at3[icomb] element, i3 = at3[icomb]. */
if (NEWTON_BOND || at3[icomb] < nlocal) {
f[at3[icomb]][0] += fkx;
f[at3[icomb]][1] += fky;
f[at3[icomb]][2] += fkz;
}
}
if (EVFLAG)
ev_tally_thr(this,i1,i2,i3,i4,nlocal,NEWTON_BOND,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z,thr);
}
}
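Read off the code above, the quantity accumulated per improper is A = sum over the three triads of (bend_angle[icomb] - chi[type]), where bend_angle holds the cosine of each triad's bending angle. The energy and the force prefactor are then

  E = (k/6) * A^6,   angfac = dE/dA = k * A^5

consistent with the (1.0/6.0)*k[type]*powint(angle_summer,6) and k[type]*powint(angle_summer,5) expressions used here; see the reference cited in the comment above for the full derivation of the per-atom forces.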
diff --git a/src/USER-OMP/pair_lj_sdk_coul_long_omp.h b/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
index a615efb50..1886d2c7b 100644
--- a/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
@@ -1,49 +1,48 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long/omp,PairLJSDKCoulLongOMP)
-PairStyle(cg/cmm/coul/long/omp,PairLJSDKCoulLongOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_OMP_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_OMP_H
#include "pair_lj_sdk_coul_long.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLongOMP : public PairLJSDKCoulLong, public ThrOMP {
public:
PairLJSDKCoulLongOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h b/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
index 9e4a922c3..9841408b8 100644
--- a/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
@@ -1,57 +1,56 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/msm/omp,PairLJSDKCoulMSMOMP)
-PairStyle(cg/cmm/coul/msm/omp,PairLJSDKCoulMSMOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_MSM_OMP_H
#define LMP_PAIR_LJ_SDK_COUL_MSM_OMP_H
#include "pair_lj_sdk_coul_msm.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKCoulMSMOMP : public PairLJSDKCoulMSM, public ThrOMP {
public:
PairLJSDKCoulMSMOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_msm_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Must use 'kspace_modify pressure/scalar no' with OMP MSM Pair styles
The kspace scalar pressure option is not (yet) compatible with OMP MSM Pair styles.
-*/
\ No newline at end of file
+*/
diff --git a/src/USER-OMP/pair_lj_sdk_omp.h b/src/USER-OMP/pair_lj_sdk_omp.h
index c3837fb68..36c913252 100644
--- a/src/USER-OMP/pair_lj_sdk_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_omp.h
@@ -1,49 +1,48 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/omp,PairLJSDKOMP)
-PairStyle(cg/cmm/omp,PairLJSDKOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_OMP_H
#define LMP_PAIR_LJ_SDK_OMP_H
#include "pair_lj_sdk.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKOMP : public PairLJSDK, public ThrOMP {
public:
PairLJSDKOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-REAXC/compute_spec_atom.cpp b/src/USER-REAXC/compute_spec_atom.cpp
index 4af8efcae..164ce8720 100644
--- a/src/USER-REAXC/compute_spec_atom.cpp
+++ b/src/USER-REAXC/compute_spec_atom.cpp
@@ -1,648 +1,648 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
   http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <string.h>
#include "compute_spec_atom.h"
#include "math_extra.h"
#include "atom.h"
#include "update.h"
#include "force.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "reaxc_defs.h"
#include "reaxc_types.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
using namespace LAMMPS_NS;
enum{KEYWORD,COMPUTE,FIX,VARIABLE};
/* ---------------------------------------------------------------------- */
ComputeSpecAtom::ComputeSpecAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute reax/c/atom command");
peratom_flag = 1;
nvalues = narg - 3;
if (nvalues == 1) size_peratom_cols = 0;
else size_peratom_cols = nvalues;
  // Initialize reaxc
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
pack_choice = new FnPtrPack[nvalues];
int i;
for (int iarg = 3; iarg < narg; iarg++) {
i = iarg-3;
// standard lammps attributes
if (strcmp(arg[iarg],"q") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_q;
} else if (strcmp(arg[iarg],"x") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_x;
} else if (strcmp(arg[iarg],"y") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_y;
} else if (strcmp(arg[iarg],"z") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_z;
} else if (strcmp(arg[iarg],"vx") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vx;
} else if (strcmp(arg[iarg],"vy") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vy;
} else if (strcmp(arg[iarg],"vz") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vz;
- // from pair_reax_c
+ // from pair_reaxc
} else if (strcmp(arg[iarg],"abo01") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo01;
} else if (strcmp(arg[iarg],"abo02") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo02;
} else if (strcmp(arg[iarg],"abo03") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo03;
} else if (strcmp(arg[iarg],"abo04") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo04;
} else if (strcmp(arg[iarg],"abo05") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo05;
} else if (strcmp(arg[iarg],"abo06") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo06;
} else if (strcmp(arg[iarg],"abo07") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo07;
} else if (strcmp(arg[iarg],"abo08") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo08;
} else if (strcmp(arg[iarg],"abo09") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo09;
} else if (strcmp(arg[iarg],"abo10") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo10;
} else if (strcmp(arg[iarg],"abo11") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo11;
} else if (strcmp(arg[iarg],"abo12") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo12;
} else if (strcmp(arg[iarg],"abo13") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo13;
} else if (strcmp(arg[iarg],"abo14") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo14;
} else if (strcmp(arg[iarg],"abo15") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo15;
} else if (strcmp(arg[iarg],"abo16") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo16;
} else if (strcmp(arg[iarg],"abo17") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo17;
} else if (strcmp(arg[iarg],"abo18") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo18;
} else if (strcmp(arg[iarg],"abo19") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo19;
} else if (strcmp(arg[iarg],"abo20") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo20;
} else if (strcmp(arg[iarg],"abo21") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo21;
} else if (strcmp(arg[iarg],"abo22") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo22;
} else if (strcmp(arg[iarg],"abo23") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo23;
} else if (strcmp(arg[iarg],"abo24") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo24;
} else error->all(FLERR,"Invalid keyword in compute reax/c/atom command");
}
nmax = 0;
vector = NULL;
array = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSpecAtom::~ComputeSpecAtom()
{
delete [] pack_choice;
memory->destroy(vector);
memory->destroy(array);
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::compute_peratom()
{
invoked_peratom = update->ntimestep;
// grow vector or array if necessary
if (atom->nmax > nmax) {
nmax = atom->nmax;
if (nvalues == 1) {
memory->destroy(vector);
memory->create(vector,nmax,"property/atom:vector");
vector_atom = vector;
} else {
memory->destroy(array);
memory->create(array,nmax,nvalues,"property/atom:array");
array_atom = array;
}
}
// fill vector or array with per-atom values
if (nvalues == 1) {
buf = vector;
(this->*pack_choice[0])(0);
} else {
if (nmax > 0) {
buf = &array[0][0];
for (int n = 0; n < nvalues; n++)
(this->*pack_choice[n])(n);
}
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeSpecAtom::memory_usage()
{
double bytes = nmax*nvalues * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
one method for every keyword compute property/atom can output
the atom property is packed into buf starting at n with stride nvalues
customize a new keyword by adding a method
------------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_q(int n)
{
double *q = atom->q;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = q[i];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_x(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_y(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_z(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vx(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vy(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vz(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo01(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo02(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo03(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo04(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][3];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo05(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][4];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo06(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][5];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo07(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][6];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo08(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][7];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo09(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][8];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo10(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][9];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo11(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][10];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo12(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][11];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo13(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][12];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo14(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][13];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo15(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][14];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo16(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][15];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo17(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][16];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo18(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][17];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo19(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][18];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo20(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][19];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo21(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][20];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo22(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][21];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo23(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][22];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo24(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][23];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
diff --git a/src/USER-REAXC/fix_qeq_reax.cpp b/src/USER-REAXC/fix_qeq_reax.cpp
index 26cf03f60..01ecd9d39 100644
--- a/src/USER-REAXC/fix_qeq_reax.cpp
+++ b/src/USER-REAXC/fix_qeq_reax.cpp
@@ -1,1041 +1,1041 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Hybrid and sub-group capabilities: Ray Shan (Sandia)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_reax.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "update.h"
#include "force.h"
#include "group.h"
#include "pair.h"
#include "respa.h"
#include "memory.h"
#include "citeme.h"
#include "error.h"
#include "reaxc_defs.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define EV_TO_KCAL_PER_MOL 14.4
//#define DANGER_ZONE 0.95
//#define LOOSE_ZONE 0.7
#define SQR(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))
#define MIN_NBRS 100
static const char cite_fix_qeq_reax[] =
"fix qeq/reax command:\n\n"
"@Article{Aktulga12,\n"
" author = {H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama},\n"
" title = {Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques},\n"
" journal = {Parallel Computing},\n"
" year = 2012,\n"
" volume = 38,\n"
" pages = {245--259}\n"
"}\n\n";
/* ---------------------------------------------------------------------- */
FixQEqReax::FixQEqReax(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (lmp->citeme) lmp->citeme->add(cite_fix_qeq_reax);
if (narg != 8) error->all(FLERR,"Illegal fix qeq/reax command");
nevery = force->inumeric(FLERR,arg[3]);
if (nevery <= 0) error->all(FLERR,"Illegal fix qeq/reax command");
swa = force->numeric(FLERR,arg[4]);
swb = force->numeric(FLERR,arg[5]);
tolerance = force->numeric(FLERR,arg[6]);
pertype_parameters(arg[7]);
shld = NULL;
n = n_cap = 0;
N = nmax = 0;
m_fill = m_cap = 0;
pack_flag = 0;
s = NULL;
t = NULL;
nprev = 5;
Hdia_inv = NULL;
b_s = NULL;
b_t = NULL;
b_prc = NULL;
b_prm = NULL;
// CG
p = NULL;
q = NULL;
r = NULL;
d = NULL;
// H matrix
H.firstnbr = NULL;
H.numnbrs = NULL;
H.jlist = NULL;
H.val = NULL;
comm_forward = comm_reverse = 1;
// perform initial allocation of atom-based arrays
// register with Atom class
s_hist = t_hist = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
for( int i = 0; i < atom->nmax; i++ )
for (int j = 0; j < nprev; ++j )
s_hist[i][j] = t_hist[i][j] = 0;
reaxc = NULL;
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
}
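For reference, the argument layout parsed above corresponds to an input line of the form (fix ID, group and numeric values are placeholders):

  fix qeq all qeq/reax 1 0.0 10.0 1.0e-6 reax/c

i.e. arg[3] = Nevery, arg[4]/arg[5] = lower/upper Taper radii (swa/swb), arg[6] = solver tolerance, and arg[7] is either the literal 'reax/c' (chi, eta and gamma are extracted from the pair style) or the name of a file with per-type parameters.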
/* ---------------------------------------------------------------------- */
FixQEqReax::~FixQEqReax()
{
// unregister callbacks to this fix from Atom class
if (copymode) return;
atom->delete_callback(id,0);
memory->destroy(s_hist);
memory->destroy(t_hist);
deallocate_storage();
deallocate_matrix();
memory->destroy(shld);
if (!reaxflag) {
memory->destroy(chi);
memory->destroy(eta);
memory->destroy(gamma);
}
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
mask |= PRE_FORCE_RESPA;
mask |= MIN_PRE_FORCE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::pertype_parameters(char *arg)
{
if (strcmp(arg,"reax/c") == 0) {
reaxflag = 1;
Pair *pair = force->pair_match("reax/c",1);
if (pair == NULL)
pair = force->pair_match("reax/c/kk",1);
if (pair == NULL) error->all(FLERR,"No pair reax/c for fix qeq/reax");
int tmp;
chi = (double *) pair->extract("chi",tmp);
eta = (double *) pair->extract("eta",tmp);
gamma = (double *) pair->extract("gamma",tmp);
if (chi == NULL || eta == NULL || gamma == NULL)
error->all(FLERR,
"Fix qeq/reax could not extract params from pair reax/c");
return;
}
int i,itype,ntypes;
double v1,v2,v3;
FILE *pf;
reaxflag = 0;
ntypes = atom->ntypes;
memory->create(chi,ntypes+1,"qeq/reax:chi");
memory->create(eta,ntypes+1,"qeq/reax:eta");
memory->create(gamma,ntypes+1,"qeq/reax:gamma");
if (comm->me == 0) {
if ((pf = fopen(arg,"r")) == NULL)
error->one(FLERR,"Fix qeq/reax parameter file could not be found");
for (i = 1; i <= ntypes && !feof(pf); i++) {
fscanf(pf,"%d %lg %lg %lg",&itype,&v1,&v2,&v3);
if (itype < 1 || itype > ntypes)
error->one(FLERR,"Fix qeq/reax invalid atom type in param file");
chi[itype] = v1;
eta[itype] = v2;
gamma[itype] = v3;
}
if (i <= ntypes) error->one(FLERR,"Invalid param file for fix qeq/reax");
fclose(pf);
}
MPI_Bcast(&chi[1],ntypes,MPI_DOUBLE,0,world);
MPI_Bcast(&eta[1],ntypes,MPI_DOUBLE,0,world);
MPI_Bcast(&gamma[1],ntypes,MPI_DOUBLE,0,world);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::allocate_storage()
{
nmax = atom->nmax;
memory->create(s,nmax,"qeq:s");
memory->create(t,nmax,"qeq:t");
memory->create(Hdia_inv,nmax,"qeq:Hdia_inv");
memory->create(b_s,nmax,"qeq:b_s");
memory->create(b_t,nmax,"qeq:b_t");
memory->create(b_prc,nmax,"qeq:b_prc");
memory->create(b_prm,nmax,"qeq:b_prm");
memory->create(p,nmax,"qeq:p");
memory->create(q,nmax,"qeq:q");
memory->create(r,nmax,"qeq:r");
memory->create(d,nmax,"qeq:d");
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::deallocate_storage()
{
memory->destroy(s);
memory->destroy(t);
memory->destroy( Hdia_inv );
memory->destroy( b_s );
memory->destroy( b_t );
memory->destroy( b_prc );
memory->destroy( b_prm );
memory->destroy( p );
memory->destroy( q );
memory->destroy( r );
memory->destroy( d );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::reallocate_storage()
{
deallocate_storage();
allocate_storage();
init_storage();
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::allocate_matrix()
{
int i,ii,inum,m;
int *ilist, *numneigh;
int mincap;
double safezone;
if( reaxflag ) {
mincap = reaxc->system->mincap;
safezone = reaxc->system->safezone;
} else {
mincap = MIN_CAP;
safezone = SAFE_ZONE;
}
n = atom->nlocal;
n_cap = MAX( (int)(n * safezone), mincap );
// determine the total space for the H matrix
if (reaxc) {
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
numneigh = reaxc->list->numneigh;
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
}
m = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
m += numneigh[i];
}
m_cap = MAX( (int)(m * safezone), mincap * MIN_NBRS );
H.n = n_cap;
H.m = m_cap;
memory->create(H.firstnbr,n_cap,"qeq:H.firstnbr");
memory->create(H.numnbrs,n_cap,"qeq:H.numnbrs");
memory->create(H.jlist,m_cap,"qeq:H.jlist");
memory->create(H.val,m_cap,"qeq:H.val");
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::deallocate_matrix()
{
memory->destroy( H.firstnbr );
memory->destroy( H.numnbrs );
memory->destroy( H.jlist );
memory->destroy( H.val );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::reallocate_matrix()
{
deallocate_matrix();
allocate_matrix();
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init()
{
if (!atom->q_flag) error->all(FLERR,"Fix qeq/reax requires atom attribute q");
ngroup = group->count(igroup);
if (ngroup == 0) error->all(FLERR,"Fix qeq/reax group has no atoms");
/*
if (reaxc)
if (ngroup != reaxc->ngroup)
error->all(FLERR,"Fix qeq/reax group and pair reax/c have "
"different numbers of atoms");
*/
// need a half neighbor list w/ Newton off and ghost neighbors
// built whenever re-neighboring occurs
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->newton = 2;
neighbor->requests[irequest]->ghost = 1;
init_shielding();
init_taper();
if (strstr(update->integrate_style,"respa"))
nlevels_respa = ((Respa *) update->integrate)->nlevels;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_shielding()
{
int i,j;
int ntypes;
ntypes = atom->ntypes;
if (shld == NULL)
- memory->create(shld,ntypes+1,ntypes+1,"qeq:shileding");
+ memory->create(shld,ntypes+1,ntypes+1,"qeq:shielding");
for( i = 1; i <= ntypes; ++i )
for( j = 1; j <= ntypes; ++j )
shld[i][j] = pow( gamma[i] * gamma[j], -1.5 );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_taper()
{
double d7, swa2, swa3, swb2, swb3;
if (fabs(swa) > 0.01 && comm->me == 0)
error->warning(FLERR,"Fix qeq/reax has non-zero lower Taper radius cutoff");
if (swb < 0)
error->all(FLERR, "Fix qeq/reax has negative upper Taper radius cutoff");
else if (swb < 5 && comm->me == 0)
error->warning(FLERR,"Fix qeq/reax has very low Taper radius cutoff");
d7 = pow( swb - swa, 7 );
swa2 = SQR( swa );
swa3 = CUBE( swa );
swb2 = SQR( swb );
swb3 = CUBE( swb );
Tap[7] = 20.0 / d7;
Tap[6] = -70.0 * (swa + swb) / d7;
Tap[5] = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
Tap[4] = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
Tap[3] = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
Tap[2] =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
Tap[1] = 140.0 * swa3 * swb3 / d7;
Tap[0] = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
}
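As a quick check on these coefficients (stated with the usual ReaxFF 7th-order taper in mind, which switches smoothly from 1 at r = swa to 0 at r = swb with vanishing first, second and third derivatives at both ends): for the common case swa = 0 the expressions reduce to

  Tap[7..0] = { 20/swb^7, -70/swb^6, 84/swb^5, -35/swb^4, 0, 0, 0, 1 }

so Tap(0) = 1 and Tap(swb) = 20 - 70 + 84 - 35 + 1 = 0, as expected.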
/* ---------------------------------------------------------------------- */
void FixQEqReax::setup_pre_force(int vflag)
{
// should not be needed
// neighbor->build_one(list);
deallocate_storage();
allocate_storage();
init_storage();
deallocate_matrix();
allocate_matrix();
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::setup_pre_force_respa(int vflag, int ilevel)
{
if (ilevel < nlevels_respa-1) return;
setup_pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_storage()
{
int NN;
if (reaxc)
NN = reaxc->list->inum + reaxc->list->gnum;
else
NN = list->inum + list->gnum;
for( int i = 0; i < NN; i++ ) {
Hdia_inv[i] = 1. / eta[atom->type[i]];
b_s[i] = -chi[atom->type[i]];
b_t[i] = -1.0;
b_prc[i] = 0;
b_prm[i] = 0;
s[i] = t[i] = 0;
}
}
/* ---------------------------------------------------------------------- */
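// Entry point invoked before each force evaluation, every nevery steps:
// grow per-atom storage and the sparse matrix if needed, rebuild H and the
// right-hand sides, solve the two CG systems for the s and t vectors, and
// combine them into atomic charges; the wall time of the whole QEq solve is
// stored in qeq_time on rank 0.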
void FixQEqReax::pre_force(int vflag)
{
double t_start, t_end;
if (update->ntimestep % nevery) return;
if( comm->me == 0 ) t_start = MPI_Wtime();
n = atom->nlocal;
N = atom->nlocal + atom->nghost;
// grow arrays if necessary
// need to be atom->nmax in length
if( atom->nmax > nmax ) reallocate_storage();
if( n > n_cap*DANGER_ZONE || m_fill > m_cap*DANGER_ZONE )
reallocate_matrix();
init_matvec();
matvecs = CG(b_s, s); // CG on s - parallel
matvecs += CG(b_t, t); // CG on t - parallel
calculate_Q();
if( comm->me == 0 ) {
t_end = MPI_Wtime();
qeq_time = t_end - t_start;
}
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::pre_force_respa(int vflag, int ilevel, int iloop)
{
if (ilevel == nlevels_respa-1) pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::min_pre_force(int vflag)
{
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
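// Set up one QEq solve: fill the H matrix, load the diagonal (Jacobi)
// preconditioner 1/eta and the right-hand sides b_s = -chi, b_t = -1 for
// atoms in the group, and build initial guesses for s and t by
// extrapolating from the stored history (cubic for s, quadratic for t in
// this version); the forward communications below copy the guesses to
// ghost atoms.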
void FixQEqReax::init_matvec()
{
/* fill-in H matrix */
compute_H();
int nn, ii, i;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
/* init pre-conditioner for H and init solution vectors */
Hdia_inv[i] = 1. / eta[ atom->type[i] ];
b_s[i] = -chi[ atom->type[i] ];
b_t[i] = -1.0;
/* linear extrapolation for s & t from previous solutions */
//s[i] = 2 * s_hist[i][0] - s_hist[i][1];
//t[i] = 2 * t_hist[i][0] - t_hist[i][1];
/* quadratic extrapolation for s & t from previous solutions */
//s[i] = s_hist[i][2] + 3 * ( s_hist[i][0] - s_hist[i][1] );
t[i] = t_hist[i][2] + 3 * ( t_hist[i][0] - t_hist[i][1] );
/* cubic extrapolation for s & t from previous solutions */
s[i] = 4*(s_hist[i][0]+s_hist[i][2])-(6*s_hist[i][1]+s_hist[i][3]);
//t[i] = 4*(t_hist[i][0]+t_hist[i][2])-(6*t_hist[i][1]+t_hist[i][3]);
}
}
pack_flag = 2;
comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 3;
comm->forward_comm_fix(this); //Dist_vector( t );
}
/* ---------------------------------------------------------------------- */
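// Build the sparse QEq matrix H in a CSR-like layout (firstnbr / numnbrs /
// jlist / val) from the half neighbor list. The tag and coordinate tests
// below act as a tie-breaker so that each pair within the cutoff swb is
// stored only once, including pairs involving ghost atoms and periodic
// images that share the same tag.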
void FixQEqReax::compute_H()
{
int inum, jnum, *ilist, *jlist, *numneigh, **firstneigh;
int i, j, ii, jj, flag;
double **x, SMALL = 0.0001;
double dx, dy, dz, r_sqr;
int *type = atom->type;
tagint *tag = atom->tag;
x = atom->x;
int *mask = atom->mask;
if (reaxc) {
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
numneigh = reaxc->list->numneigh;
firstneigh = reaxc->list->firstneigh;
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
}
// fill in the H matrix
m_fill = 0;
r_sqr = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
if (mask[i] & groupbit) {
jlist = firstneigh[i];
jnum = numneigh[i];
H.firstnbr[i] = m_fill;
for( jj = 0; jj < jnum; jj++ ) {
j = jlist[jj];
dx = x[j][0] - x[i][0];
dy = x[j][1] - x[i][1];
dz = x[j][2] - x[i][2];
r_sqr = SQR(dx) + SQR(dy) + SQR(dz);
flag = 0;
if (r_sqr <= SQR(swb)) {
if (j < n) flag = 1;
else if (tag[i] < tag[j]) flag = 1;
else if (tag[i] == tag[j]) {
if (dz > SMALL) flag = 1;
else if (fabs(dz) < SMALL) {
if (dy > SMALL) flag = 1;
else if (fabs(dy) < SMALL && dx > SMALL)
flag = 1;
}
}
}
if( flag ) {
H.jlist[m_fill] = j;
H.val[m_fill] = calculate_H( sqrt(r_sqr), shld[type[i]][type[j]] );
m_fill++;
}
}
H.numnbrs[i] = m_fill - H.firstnbr[i];
}
}
if (m_fill >= H.m) {
char str[128];
sprintf(str,"H matrix size has been exceeded: m_fill=%d H.m=%d\n",
m_fill, H.m );
error->warning(FLERR,str);
error->all(FLERR,"Fix qeq/reax has insufficient QEq matrix size");
}
}
/* ---------------------------------------------------------------------- */
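// Matrix element for one pair: the tapered, shielded Coulomb kernel
// Tap(r) / (r^3 + gamma_ij^-3)^(1/3), where the gamma passed in is
// shld[type_i][type_j] = (gamma_i*gamma_j)^(-3/2); the shielding removes
// the 1/r singularity at short range, and the result is converted from eV
// to kcal/mol.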
double FixQEqReax::calculate_H( double r, double gamma )
{
double Taper, denom;
Taper = Tap[7] * r + Tap[6];
Taper = Taper * r + Tap[5];
Taper = Taper * r + Tap[4];
Taper = Taper * r + Tap[3];
Taper = Taper * r + Tap[2];
Taper = Taper * r + Tap[1];
Taper = Taper * r + Tap[0];
denom = r * r * r + gamma;
denom = pow(denom,0.3333333333333);
return Taper * EV_TO_KCAL_PER_MOL / denom;
}
/* ---------------------------------------------------------------------- */
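// Jacobi-preconditioned conjugate gradient solve of H x = b, restricted to
// atoms in the fix group. Iterates until the relative residual
// sqrt(sig_new)/|b| drops below the user tolerance or imax = 200 iterations
// are reached; forward communication distributes the search direction d to
// ghosts before each sparse_matvec(), and reverse communication collects
// ghost contributions to q afterwards.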
int FixQEqReax::CG( double *b, double *x )
{
int i, j, imax;
double tmp, alpha, beta, b_norm;
double sig_old, sig_new;
int nn, jj;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
imax = 200;
pack_flag = 1;
sparse_matvec( &H, x, q );
comm->reverse_comm_fix( this ); //Coll_Vector( q );
vector_sum( r , 1., b, -1., q, nn );
for( jj = 0; jj < nn; ++jj ) {
j = ilist[jj];
if (atom->mask[j] & groupbit)
d[j] = r[j] * Hdia_inv[j]; //pre-condition
}
b_norm = parallel_norm( b, nn );
sig_new = parallel_dot( r, d, nn);
for( i = 1; i < imax && sqrt(sig_new) / b_norm > tolerance; ++i ) {
comm->forward_comm_fix(this); //Dist_vector( d );
sparse_matvec( &H, d, q );
comm->reverse_comm_fix(this); //Coll_vector( q );
tmp = parallel_dot( d, q, nn);
alpha = sig_new / tmp;
vector_add( x, alpha, d, nn );
vector_add( r, -alpha, q, nn );
// pre-conditioning
for( jj = 0; jj < nn; ++jj ) {
j = ilist[jj];
if (atom->mask[j] & groupbit)
p[j] = r[j] * Hdia_inv[j];
}
sig_old = sig_new;
sig_new = parallel_dot( r, p, nn);
beta = sig_new / sig_old;
vector_sum( d, 1., p, beta, d, nn );
}
if (i >= imax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax CG convergence failed after %d iterations "
"at " BIGINT_FORMAT " step",i,update->ntimestep);
error->warning(FLERR,str);
}
return i;
}
/* ---------------------------------------------------------------------- */
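// b = A*x for the half-stored sparse matrix: the diagonal contribution is
// eta[type[i]]*x[i] for owned atoms, ghost entries of b are zeroed, and
// each stored off-diagonal element A_ij is applied symmetrically to both
// b[i] and b[j]; the caller's reverse communication then adds the ghost
// rows back onto their owners.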
void FixQEqReax::sparse_matvec( sparse_matrix *A, double *x, double *b )
{
int i, j, itr_j;
int nn, NN, ii;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
NN = reaxc->list->inum + reaxc->list->gnum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
NN = list->inum + list->gnum;
ilist = list->ilist;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
b[i] = eta[ atom->type[i] ] * x[i];
}
for( ii = nn; ii < NN; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
b[i] = 0;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
for( itr_j=A->firstnbr[i]; itr_j<A->firstnbr[i]+A->numnbrs[i]; itr_j++) {
j = A->jlist[itr_j];
b[i] += A->val[itr_j] * x[j];
b[j] += A->val[itr_j] * x[i];
}
}
}
}
/* ---------------------------------------------------------------------- */
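// Combine the two CG solutions into charges: q_i = s_i - u*t_i with
// u = sum(s)/sum(t), which makes the group's net charge sum to zero.
// The s and t histories are then shifted so init_matvec() can extrapolate
// the initial guesses for the next solve, and the new charges are pushed
// to ghost atoms.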
void FixQEqReax::calculate_Q()
{
int i, k;
double u, s_sum, t_sum;
double *q = atom->q;
int nn, ii;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
s_sum = parallel_vector_acc( s, nn );
t_sum = parallel_vector_acc( t, nn);
u = s_sum / t_sum;
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
q[i] = s[i] - u * t[i];
/* backup s & t */
for( k = 4; k > 0; --k ) {
s_hist[i][k] = s_hist[i][k-1];
t_hist[i][k] = t_hist[i][k-1];
}
s_hist[i][0] = s[i];
t_hist[i][0] = t[i];
}
}
pack_flag = 4;
comm->forward_comm_fix( this ); //Dist_vector( atom->q );
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int m;
if( pack_flag == 1)
for(m = 0; m < n; m++) buf[m] = d[list[m]];
else if( pack_flag == 2 )
for(m = 0; m < n; m++) buf[m] = s[list[m]];
else if( pack_flag == 3 )
for(m = 0; m < n; m++) buf[m] = t[list[m]];
else if( pack_flag == 4 )
for(m = 0; m < n; m++) buf[m] = atom->q[list[m]];
return n;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::unpack_forward_comm(int n, int first, double *buf)
{
int i, m;
if( pack_flag == 1)
for(m = 0, i = first; m < n; m++, i++) d[i] = buf[m];
else if( pack_flag == 2)
for(m = 0, i = first; m < n; m++, i++) s[i] = buf[m];
else if( pack_flag == 3)
for(m = 0, i = first; m < n; m++, i++) t[i] = buf[m];
else if( pack_flag == 4)
for(m = 0, i = first; m < n; m++, i++) atom->q[i] = buf[m];
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::pack_reverse_comm(int n, int first, double *buf)
{
int i, m;
for(m = 0, i = first; m < n; m++, i++) buf[m] = q[i];
return n;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::unpack_reverse_comm(int n, int *list, double *buf)
{
for(int m = 0; m < n; m++) q[list[m]] += buf[m];
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixQEqReax::memory_usage()
{
double bytes;
bytes = atom->nmax*nprev*2 * sizeof(double); // s_hist & t_hist
bytes += atom->nmax*11 * sizeof(double); // storage
bytes += n_cap*2 * sizeof(int); // H.firstnbr & H.numnbrs
bytes += m_cap * sizeof(int); // H.jlist
bytes += m_cap * sizeof(double); // H.val
return bytes;
}
/* ----------------------------------------------------------------------
allocate fictitious charge arrays
------------------------------------------------------------------------- */
void FixQEqReax::grow_arrays(int nmax)
{
memory->grow(s_hist,nmax,nprev,"qeq:s_hist");
memory->grow(t_hist,nmax,nprev,"qeq:t_hist");
}
/* ----------------------------------------------------------------------
copy values within fictitious charge arrays
------------------------------------------------------------------------- */
void FixQEqReax::copy_arrays(int i, int j, int delflag)
{
for (int m = 0; m < nprev; m++) {
s_hist[j][m] = s_hist[i][m];
t_hist[j][m] = t_hist[i][m];
}
}
/* ----------------------------------------------------------------------
pack values in local atom-based array for exchange with another proc
------------------------------------------------------------------------- */
int FixQEqReax::pack_exchange(int i, double *buf)
{
for (int m = 0; m < nprev; m++) buf[m] = s_hist[i][m];
for (int m = 0; m < nprev; m++) buf[nprev+m] = t_hist[i][m];
return nprev*2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based array from exchange with another proc
------------------------------------------------------------------------- */
int FixQEqReax::unpack_exchange(int nlocal, double *buf)
{
for (int m = 0; m < nprev; m++) s_hist[nlocal][m] = buf[m];
for (int m = 0; m < nprev; m++) t_hist[nlocal][m] = buf[nprev+m];
return nprev*2;
}
/* ---------------------------------------------------------------------- */
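// The helpers below (parallel_norm, parallel_dot, parallel_vector_acc)
// accumulate group-masked sums over owned atoms and reduce them with
// MPI_Allreduce over the world communicator, so every rank sees the same
// scalar result.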
double FixQEqReax::parallel_norm( double *v, int n )
{
int i;
double my_sum, norm_sqr;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_sum = 0.0;
norm_sqr = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_sum += SQR( v[i] );
}
MPI_Allreduce( &my_sum, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
return sqrt( norm_sqr );
}
/* ---------------------------------------------------------------------- */
double FixQEqReax::parallel_dot( double *v1, double *v2, int n)
{
int i;
double my_dot, res;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_dot = 0.0;
res = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_dot += v1[i] * v2[i];
}
MPI_Allreduce( &my_dot, &res, 1, MPI_DOUBLE, MPI_SUM, world );
return res;
}
/* ---------------------------------------------------------------------- */
double FixQEqReax::parallel_vector_acc( double *v, int n )
{
int i;
double my_acc, res;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_acc = 0.0;
res = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_acc += v[i];
}
MPI_Allreduce( &my_acc, &res, 1, MPI_DOUBLE, MPI_SUM, world );
return res;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::vector_sum( double* dest, double c, double* v,
double d, double* y, int k )
{
int kk;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
for( --k; k>=0; --k ) {
kk = ilist[k];
if (atom->mask[kk] & groupbit)
dest[kk] = c * v[kk] + d * y[kk];
}
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::vector_add( double* dest, double c, double* v, int k )
{
int kk;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
for( --k; k>=0; --k ) {
kk = ilist[k];
if (atom->mask[kk] & groupbit)
dest[kk] += c * v[kk];
}
}
diff --git a/src/USER-REAXC/fix_reax_c.cpp b/src/USER-REAXC/fix_reaxc.cpp
similarity index 99%
rename from src/USER-REAXC/fix_reax_c.cpp
rename to src/USER-REAXC/fix_reaxc.cpp
index e1cc4e340..df0621799 100644
--- a/src/USER-REAXC/fix_reax_c.cpp
+++ b/src/USER-REAXC/fix_reaxc.cpp
@@ -1,161 +1,161 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
------------------------------------------------------------------------- */
-#include "fix_reax_c.h"
+#include "fix_reaxc.h"
#include "atom.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define MAX_REAX_BONDS 30
#define MIN_REAX_BONDS 15
#define MIN_REAX_HBONDS 25
/* ---------------------------------------------------------------------- */
FixReaxC::FixReaxC(LAMMPS *lmp,int narg, char **arg) :
Fix(lmp, narg, arg)
{
// perform initial allocation of atom-based arrays
// register with atom class
num_bonds = NULL;
num_hbonds = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
// initialize arrays to MIN so atom migration is OK the 1st time
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
num_bonds[i] = num_hbonds[i] = MIN_REAX_BONDS;
// set comm sizes needed by this fix
comm_forward = 1;
}
/* ---------------------------------------------------------------------- */
FixReaxC::~FixReaxC()
{
// unregister this fix so atom class doesn't invoke it any more
atom->delete_callback(id,0);
// delete locally stored arrays
memory->destroy(num_bonds);
memory->destroy(num_hbonds);
}
/* ---------------------------------------------------------------------- */
int FixReaxC::setmask()
{
int mask = 0;
return mask;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixReaxC::memory_usage()
{
int nmax = atom->nmax;
double bytes = nmax * 2 * sizeof(int);
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixReaxC::grow_arrays(int nmax)
{
memory->grow(num_bonds,nmax,"reaxc:num_bonds");
memory->grow(num_hbonds,nmax,"reaxc:num_hbonds");
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixReaxC::copy_arrays(int i, int j, int delflag)
{
num_bonds[j] = num_bonds[i];
num_hbonds[j] = num_hbonds[i];
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixReaxC::pack_exchange(int i, double *buf)
{
buf[0] = num_bonds[i];
buf[1] = num_hbonds[i];
return 2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based arrays from exchange with another proc
------------------------------------------------------------------------- */
int FixReaxC::unpack_exchange(int nlocal, double *buf)
{
num_bonds[nlocal] = static_cast<int> (buf[0]);
num_hbonds[nlocal] = static_cast<int> (buf[1]);
return 2;
}
/* ---------------------------------------------------------------------- */
int FixReaxC::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = num_bonds[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixReaxC::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
num_bonds[i] = static_cast<int> (buf[m++]);
}
diff --git a/src/USER-REAXC/fix_reax_c.h b/src/USER-REAXC/fix_reaxc.h
similarity index 100%
rename from src/USER-REAXC/fix_reax_c.h
rename to src/USER-REAXC/fix_reaxc.h
diff --git a/src/USER-REAXC/fix_reaxc_bonds.cpp b/src/USER-REAXC/fix_reaxc_bonds.cpp
index 543669de7..cf9e4789c 100644
--- a/src/USER-REAXC/fix_reaxc_bonds.cpp
+++ b/src/USER-REAXC/fix_reaxc_bonds.cpp
@@ -1,359 +1,359 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (Sandia, tnshan@sandia.gov)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_bonds.h"
#include "atom.h"
#include "update.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCBonds::FixReaxCBonds(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (narg != 5) error->all(FLERR,"Illegal fix reax/c/bonds command");
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
ntypes = atom->ntypes;
nmax = atom->nmax;
nevery = force->inumeric(FLERR,arg[3]);
if (nevery <= 0 )
error->all(FLERR,"Illegal fix reax/c/bonds command");
if (me == 0) {
fp = fopen(arg[4],"w");
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open fix reax/c/bonds file %s",arg[4]);
error->one(FLERR,str);
}
}
if (atom->tag_consecutive() == 0)
error->all(FLERR,"Atom IDs must be consecutive for fix reax/c bonds");
abo = NULL;
neighid = NULL;
numneigh = NULL;
allocate();
}
/* ---------------------------------------------------------------------- */
FixReaxCBonds::~FixReaxCBonds()
{
MPI_Comm_rank(world,&me);
destroy();
if (me == 0) fclose(fp);
}
/* ---------------------------------------------------------------------- */
int FixReaxCBonds::setmask()
{
int mask = 0;
mask |= END_OF_STEP;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::setup(int vflag)
{
end_of_step();
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::init()
{
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
if (reaxc == NULL) error->all(FLERR,"Cannot use fix reax/c/bonds without "
"pair_style reax/c");
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::end_of_step()
{
Output_ReaxC_Bonds(update->ntimestep,fp);
if (me == 0) fflush(fp);
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int i, j;
int nbuf, nbuf_local;
int nlocal_max, numbonds, numbonds_max;
double *buf;
int nlocal = atom->nlocal;
int nlocal_tot = static_cast<int> (atom->natoms);
if (atom->nmax > nmax) {
destroy();
nmax = atom->nmax;
allocate();
}
for (i = 0; i < nmax; i++) {
numneigh[i] = 0;
for (j = 0; j < MAXREAXBOND; j++) {
neighid[i][j] = 0;
abo[i][j] = 0.0;
}
}
numbonds = 0;
FindBond(lists, numbonds);
// allocate a temporary buffer for the snapshot info
MPI_Allreduce(&numbonds,&numbonds_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nlocal,&nlocal_max,1,MPI_INT,MPI_MAX,world);
nbuf = 1+(numbonds_max*2+10)*nlocal_max;
memory->create(buf,nbuf,"reax/c/bonds:buf");
for (i = 0; i < nbuf; i ++) buf[i] = 0.0;
// Pass information to buffer
PassBuffer(buf, nbuf_local);
// Receive information from buffer for output
RecvBuffer(buf, nbuf, nbuf_local, nlocal_tot, numbonds_max);
memory->destroy(buf);
}
/* ---------------------------------------------------------------------- */
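// Walk pair reax/c's internal bond list for each owned atom and record the
// tags and bond orders of all neighbors whose bond order exceeds the
// control-file cutoff bg_cut; numbonds returns the largest per-atom count,
// which is later used to size the output buffer.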
void FixReaxCBonds::FindBond(struct _reax_list *lists, int &numbonds)
{
int *ilist, i, ii, inum;
int j, pj, nj;
tagint jtag;
double bo_tmp,bo_cut;
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
bond_data *bo_ij;
bo_cut = reaxc->control->bg_cut;
tagint *tag = atom->tag;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
nj = 0;
for( pj = Start_Index(i, reaxc->lists); pj < End_Index(i, reaxc->lists); ++pj ) {
bo_ij = &( reaxc->lists->select.bond_list[pj] );
j = bo_ij->nbr;
jtag = tag[j];
bo_tmp = bo_ij->bo_data.BO;
if (bo_tmp > bo_cut) {
neighid[i][nj] = jtag;
abo[i][nj] = bo_tmp;
nj ++;
}
}
numneigh[i] = nj;
if (nj > numbonds) numbonds = nj;
}
}
/* ---------------------------------------------------------------------- */
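// Pack this rank's snapshot into buf. Layout per atom: tag, type, total
// bond order, number of lone pairs, charge, number of bonds, then the
// neighbor tags, then the molecule ID (0 if molecules are not defined),
// then the bond orders; buf[0] holds the local atom count and nbuf_local
// returns the packed length.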
void FixReaxCBonds::PassBuffer(double *buf, int &nbuf_local)
{
int i, j, k, numbonds;
int nlocal = atom->nlocal;
j = 2;
buf[0] = nlocal;
for (i = 0; i < nlocal; i++) {
buf[j-1] = atom->tag[i];
buf[j+0] = atom->type[i];
buf[j+1] = reaxc->workspace->total_bond_order[i];
buf[j+2] = reaxc->workspace->nlp[i];
buf[j+3] = atom->q[i];
buf[j+4] = numneigh[i];
numbonds = nint(buf[j+4]);
for (k = 5; k < 5+numbonds; k++) {
buf[j+k] = neighid[i][k-5];
}
j += (5+numbonds);
if (atom->molecule == NULL ) buf[j] = 0.0;
else buf[j] = atom->molecule[i];
j ++;
for (k = 0; k < numbonds; k++) {
buf[j+k] = abo[i][k];
}
j += (1+numbonds);
}
nbuf_local = j - 1;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::RecvBuffer(double *buf, int nbuf, int nbuf_local,
int natoms, int maxnum)
{
int i, j, k, itype;
int inode, nlocal_tmp, numbonds;
tagint itag,jtag;
int nlocal = atom->nlocal;
bigint ntimestep = update->ntimestep;
double sbotmp, nlptmp, avqtmp, abotmp;
double cutof3 = reaxc->control->bg_cut;
MPI_Request irequest, irequest2;
if (me == 0 ){
fprintf(fp,"# Timestep " BIGINT_FORMAT " \n",ntimestep);
fprintf(fp,"# \n");
fprintf(fp,"# Number of particles %d \n",natoms);
fprintf(fp,"# \n");
fprintf(fp,"# Max number of bonds per atom %d with "
"coarse bond order cutoff %5.3f \n",maxnum,cutof3);
fprintf(fp,"# Particle connection table and bond orders \n");
fprintf(fp,"# id type nb id_1...id_nb mol bo_1...bo_nb abo nlp q \n");
}
j = 2;
if (me == 0) {
for (inode = 0; inode < nprocs; inode ++) {
if (inode == 0) {
nlocal_tmp = nlocal;
} else {
MPI_Irecv(&buf[0],nbuf,MPI_DOUBLE,inode,0,world,&irequest);
MPI_Wait(&irequest,MPI_STATUS_IGNORE);
nlocal_tmp = nint(buf[0]);
}
j = 2;
for (i = 0; i < nlocal_tmp; i ++) {
itag = static_cast<tagint> (buf[j-1]);
itype = nint(buf[j+0]);
sbotmp = buf[j+1];
nlptmp = buf[j+2];
avqtmp = buf[j+3];
numbonds = nint(buf[j+4]);
fprintf(fp," " TAGINT_FORMAT " %d %d",itag,itype,numbonds);
for (k = 5; k < 5+numbonds; k++) {
jtag = static_cast<tagint> (buf[j+k]);
fprintf(fp," " TAGINT_FORMAT,jtag);
}
j += (5+numbonds);
fprintf(fp," " TAGINT_FORMAT,static_cast<tagint> (buf[j]));
j ++;
for (k = 0; k < numbonds; k++) {
abotmp = buf[j+k];
fprintf(fp,"%14.3f",abotmp);
}
j += (1+numbonds);
fprintf(fp,"%14.3f%14.3f%14.3f\n",sbotmp,nlptmp,avqtmp);
}
}
} else {
MPI_Isend(&buf[0],nbuf_local,MPI_DOUBLE,0,0,world,&irequest2);
MPI_Wait(&irequest2,MPI_STATUS_IGNORE);
}
if(me ==0) fprintf(fp,"# \n");
}
/* ---------------------------------------------------------------------- */
int FixReaxCBonds::nint(const double &r)
{
int i = 0;
if (r>0.0) i = static_cast<int>(r+0.5);
else if (r<0.0) i = static_cast<int>(r-0.5);
return i;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::destroy()
{
memory->destroy(abo);
memory->destroy(neighid);
memory->destroy(numneigh);
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::allocate()
{
memory->create(abo,nmax,MAXREAXBOND,"reax/c/bonds:abo");
memory->create(neighid,nmax,MAXREAXBOND,"reax/c/bonds:neighid");
memory->create(numneigh,nmax,"reax/c/bonds:numneigh");
}
/* ---------------------------------------------------------------------- */
double FixReaxCBonds::memory_usage()
{
double bytes;
bytes = 3.0*nmax*sizeof(double);
bytes += nmax*sizeof(int);
bytes += 1.0*nmax*MAXREAXBOND*sizeof(double);
bytes += 1.0*nmax*MAXREAXBOND*sizeof(int);
return bytes;
}
diff --git a/src/USER-REAXC/fix_reaxc_species.cpp b/src/USER-REAXC/fix_reaxc_species.cpp
index ead73f02a..d291903fa 100644
--- a/src/USER-REAXC/fix_reaxc_species.cpp
+++ b/src/USER-REAXC/fix_reaxc_species.cpp
@@ -1,985 +1,985 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Ray Shan (Sandia, tnshan@sandia.gov)
Oleg Sergeev (VNIIA, sergeev@vniia.ru)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <math.h>
#include "atom.h"
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_species.h"
#include "domain.h"
#include "update.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCSpecies::FixReaxCSpecies(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (narg < 7) error->all(FLERR,"Illegal fix reax/c/species command");
force_reneighbor = 0;
vector_flag = 1;
size_vector = 2;
extvector = 0;
peratom_flag = 1;
size_peratom_cols = 0;
peratom_freq = 1;
nvalid = -1;
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
ntypes = atom->ntypes;
nevery = atoi(arg[3]);
nrepeat = atoi(arg[4]);
global_freq = nfreq = atoi(arg[5]);
comm_forward = 5;
if (nevery <= 0 || nrepeat <= 0 || nfreq <= 0)
error->all(FLERR,"Illegal fix reax/c/species command");
if (nfreq % nevery || nrepeat*nevery > nfreq)
error->all(FLERR,"Illegal fix reax/c/species command");
// Neighbor lists must stay unchanged during averaging of bonds,
// but may be updated when no averaging is performed.
int rene_flag = 0;
if (nevery * nrepeat != 1 && (nfreq % neighbor->every != 0 || neighbor->every < nevery * nrepeat)) {
int newneighborevery = nevery * nrepeat;
while (nfreq % newneighborevery != 0 && newneighborevery <= nfreq / 2)
newneighborevery++;
if (nfreq % newneighborevery != 0)
newneighborevery = nfreq;
neighbor->every = newneighborevery;
rene_flag = 1;
}
if (nevery * nrepeat != 1 && (neighbor->delay != 0 || neighbor->dist_check != 0)) {
neighbor->delay = 0;
neighbor->dist_check = 0;
rene_flag = 1;
}
if (me == 0 && rene_flag) {
char str[128];
sprintf(str,"Resetting reneighboring criteria for fix reax/c/species");
error->warning(FLERR,str);
}
tmparg = NULL;
memory->create(tmparg,4,4,"reax/c/species:tmparg");
strcpy(tmparg[0],arg[3]);
strcpy(tmparg[1],arg[4]);
strcpy(tmparg[2],arg[5]);
if (me == 0) {
fp = fopen(arg[6],"w");
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open fix reax/c/species file %s",arg[6]);
error->one(FLERR,str);
}
}
x0 = NULL;
PBCconnected = NULL;
clusterID = NULL;
int ntmp = 1;
memory->create(x0,ntmp,"reax/c/species:x0");
memory->create(PBCconnected,ntmp,"reax/c/species:PBCconnected");
memory->create(clusterID,ntmp,"reax/c/species:clusterID");
vector_atom = clusterID;
BOCut = NULL;
Name = NULL;
MolName = NULL;
MolType = NULL;
NMol = NULL;
nd = NULL;
molmap = NULL;
nmax = 0;
setupflag = 0;
// set default bond order cutoff
int n, i, j, itype, jtype;
double bo_cut;
bg_cut = 0.30;
n = ntypes+1;
memory->create(BOCut,n,n,"reax/c/species:BOCut");
for (i = 1; i < n; i ++)
for (j = 1; j < n; j ++)
BOCut[i][j] = bg_cut;
// optional args
eletype = NULL;
ele = filepos = NULL;
eleflag = posflag = padflag = 0;
singlepos_opened = multipos_opened = 0;
multipos = 0;
posfreq = 0;
int iarg = 7;
while (iarg < narg) {
// set BO cutoff
if (strcmp(arg[iarg],"cutoff") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
itype = atoi(arg[iarg+1]);
jtype = atoi(arg[iarg+2]);
bo_cut = atof(arg[iarg+3]);
if (itype > ntypes || jtype > ntypes)
error->all(FLERR,"Illegal fix reax/c/species command");
if (itype <= 0 || jtype <= 0)
error->all(FLERR,"Illegal fix reax/c/species command");
if (bo_cut > 1.0 || bo_cut < 0.0)
error->all(FLERR,"Illegal fix reax/c/species command");
BOCut[itype][jtype] = bo_cut;
BOCut[jtype][itype] = bo_cut;
iarg += 4;
// modify element type names
} else if (strcmp(arg[iarg],"element") == 0) {
if (iarg+ntypes+1 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
eletype = (char**) malloc(ntypes*sizeof(char*));
for (int i = 0; i < ntypes; i ++) {
eletype[i] = (char*) malloc(2*sizeof(char));
strcpy(eletype[i],arg[iarg+1+i]);
}
eleflag = 1;
iarg += ntypes + 1;
// position of molecules
} else if (strcmp(arg[iarg],"position") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
posflag = 1;
posfreq = atoi(arg[iarg+1]);
if (posfreq < nfreq || (posfreq%nfreq != 0))
error->all(FLERR,"Illegal fix reax/c/species command");
filepos = new char[255];
strcpy(filepos,arg[iarg+2]);
if (strchr(filepos,'*')) {
multipos = 1;
} else {
if (me == 0) {
pos = fopen(filepos, "w");
if (pos == NULL) error->one(FLERR,"Cannot open fix reax/c/species position file");
}
singlepos_opened = 1;
multipos = 0;
}
iarg += 3;
} else error->all(FLERR,"Illegal fix reax/c/species command");
}
if (!eleflag) {
memory->create(ele,ntypes+1,"reax/c/species:ele");
ele[0]='C';
if (ntypes > 1)
ele[1]='H';
if (ntypes > 2)
ele[2]='O';
if (ntypes > 3)
ele[3]='N';
}
vector_nmole = 0;
vector_nspec = 0;
}
/* ---------------------------------------------------------------------- */
FixReaxCSpecies::~FixReaxCSpecies()
{
memory->destroy(ele);
memory->destroy(BOCut);
memory->destroy(clusterID);
memory->destroy(PBCconnected);
memory->destroy(x0);
memory->destroy(nd);
memory->destroy(Name);
memory->destroy(NMol);
memory->destroy(MolType);
memory->destroy(MolName);
memory->destroy(tmparg);
if (filepos)
delete [] filepos;
if (me == 0) fclose(fp);
if (me == 0 && posflag && multipos_opened) fclose(pos);
modify->delete_compute("SPECATOM");
modify->delete_fix("SPECBOND");
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::setmask()
{
int mask = 0;
mask |= POST_INTEGRATE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::setup(int vflag)
{
ntotal = static_cast<int> (atom->natoms);
if (Name == NULL)
memory->create(Name,ntypes,"reax/c/species:Name");
post_integrate();
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::init()
{
if (atom->tag_enable == 0)
error->all(FLERR,"Cannot use fix reax/c/species unless atoms have IDs");
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
if (reaxc == NULL) error->all(FLERR,"Cannot use fix reax/c/species without "
"pair_style reax/c");
reaxc->fixspecies_flag = 1;
// reset next output timestep if not yet set or timestep has been reset
if (nvalid != update->ntimestep)
nvalid = update->ntimestep+nfreq;
// check if this fix has been called twice
int count = 0;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"reax/c/species") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one fix reax/c/species");
if (!setupflag) {
// create a compute to store properties
create_compute();
// create a fix to point to fix_ave_atom for averaging stored properties
create_fix();
setupflag = 1;
}
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::create_compute()
{
int narg;
char **args;
narg = 34;
args = new char*[narg];
args[0] = (char *) "SPECATOM";
args[1] = (char *) "all";
args[2] = (char *) "SPEC/ATOM";
args[3] = (char *) "q";
args[4] = (char *) "x";
args[5] = (char *) "y";
args[6] = (char *) "z";
args[7] = (char *) "vx";
args[8] = (char *) "vy";
args[9] = (char *) "vz";
args[10] = (char *) "abo01";
args[11] = (char *) "abo02";
args[12] = (char *) "abo03";
args[13] = (char *) "abo04";
args[14] = (char *) "abo05";
args[15] = (char *) "abo06";
args[16] = (char *) "abo07";
args[17] = (char *) "abo08";
args[18] = (char *) "abo09";
args[19] = (char *) "abo10";
args[20] = (char *) "abo11";
args[21] = (char *) "abo12";
args[22] = (char *) "abo13";
args[23] = (char *) "abo14";
args[24] = (char *) "abo15";
args[25] = (char *) "abo16";
args[26] = (char *) "abo17";
args[27] = (char *) "abo18";
args[28] = (char *) "abo19";
args[29] = (char *) "abo20";
args[30] = (char *) "abo21";
args[31] = (char *) "abo22";
args[32] = (char *) "abo23";
args[33] = (char *) "abo24";
modify->add_compute(narg,args);
delete [] args;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::create_fix()
{
int narg;
char **args;
narg = 37;
args = new char*[narg];
args[0] = (char *) "SPECBOND";
args[1] = (char *) "all";
args[2] = (char *) "ave/atom";
args[3] = tmparg[0];
args[4] = tmparg[1];
args[5] = tmparg[2];
args[6] = (char *) "c_SPECATOM[1]"; // q, array_atoms[i][0]
args[7] = (char *) "c_SPECATOM[2]"; // x, 1
args[8] = (char *) "c_SPECATOM[3]"; // y, 2
args[9] = (char *) "c_SPECATOM[4]"; // z, 3
args[10] = (char *) "c_SPECATOM[5]"; // vx, 4
args[11] = (char *) "c_SPECATOM[6]"; // vy, 5
args[12] = (char *) "c_SPECATOM[7]"; // vz, 6
args[13] = (char *) "c_SPECATOM[8]"; // abo01, 7
args[14] = (char *) "c_SPECATOM[9]";
args[15] = (char *) "c_SPECATOM[10]";
args[16] = (char *) "c_SPECATOM[11]";
args[17] = (char *) "c_SPECATOM[12]";
args[18] = (char *) "c_SPECATOM[13]";
args[19] = (char *) "c_SPECATOM[14]";
args[20] = (char *) "c_SPECATOM[15]";
args[21] = (char *) "c_SPECATOM[16]";
args[22] = (char *) "c_SPECATOM[17]";
args[23] = (char *) "c_SPECATOM[18]";
args[24] = (char *) "c_SPECATOM[19]"; // abo12, 18
args[25] = (char *) "c_SPECATOM[20]";
args[26] = (char *) "c_SPECATOM[21]";
args[27] = (char *) "c_SPECATOM[22]";
args[28] = (char *) "c_SPECATOM[23]";
args[29] = (char *) "c_SPECATOM[24]";
args[30] = (char *) "c_SPECATOM[25]";
args[31] = (char *) "c_SPECATOM[26]";
args[32] = (char *) "c_SPECATOM[27]";
args[33] = (char *) "c_SPECATOM[28]";
args[34] = (char *) "c_SPECATOM[29]";
args[35] = (char *) "c_SPECATOM[30]";
args[36] = (char *) "c_SPECATOM[31]";
modify->add_fix(narg,args);
f_SPECBOND = (FixAveAtom *) modify->fix[modify->nfix-1];
delete [] args;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::post_integrate()
{
Output_ReaxC_Bonds(update->ntimestep,fp);
if (me == 0) fflush(fp);
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int Nmole, Nspec;
// point to fix_ave_atom
f_SPECBOND->end_of_step();
if (ntimestep != nvalid) return;
nlocal = atom->nlocal;
if (atom->nmax > nmax) {
nmax = atom->nmax;
memory->destroy(x0);
memory->destroy(PBCconnected);
memory->destroy(clusterID);
memory->create(x0,nmax,"reax/c/species:x0");
memory->create(PBCconnected,nmax,"reax/c/species:PBCconnected");
memory->create(clusterID,nmax,"reax/c/species:clusterID");
vector_atom = clusterID;
}
for (int i = 0; i < nmax; i++) {
PBCconnected[i] = 0;
x0[i].x = x0[i].y = x0[i].z = 0.0;
}
Nmole = Nspec = 0;
FindMolecule();
SortMolecule (Nmole);
FindSpecies(Nmole, Nspec);
vector_nmole = Nmole;
vector_nspec = Nspec;
if (me == 0 && ntimestep >= 0)
WriteFormulas (Nmole, Nspec);
if (posflag && ((ntimestep)%posfreq==0)) {
WritePos(Nmole, Nspec);
if (me == 0) fflush(pos);
}
nvalid += nfreq;
}
/* ---------------------------------------------------------------------- */
AtomCoord FixReaxCSpecies::chAnchor(AtomCoord in1, AtomCoord in2)
{
if (in1.x < in2.x)
return in1;
return in2;
}
/* ---------------------------------------------------------------------- */
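// Parallel molecule (cluster) detection: every atom in the group starts
// with clusterID equal to its own tag, then repeatedly adopts the smallest
// clusterID among neighbors bonded above the per-type BOCut threshold.
// Forward communication keeps ghost copies in sync between sweeps; an
// anchor coordinate and a periodic-connection flag are propagated alongside
// for later unwrapping of molecule positions. The outer loop stops when no
// rank reports a change, or after a safety cap of about 400 sweeps.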
void FixReaxCSpecies::FindMolecule ()
{
int i,j,ii,jj,inum,itype,jtype,loop,looptot;
int change,done,anychange;
int *mask = atom->mask;
int *ilist;
double bo_tmp,bo_cut;
double **spec_atom = f_SPECBOND->array_atom;
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (mask[i] & groupbit) {
clusterID[i] = atom->tag[i];
x0[i].x = spec_atom[i][1];
x0[i].y = spec_atom[i][2];
x0[i].z = spec_atom[i][3];
}
else clusterID[i] = 0.0;
}
loop = 0;
while (1) {
comm->forward_comm_fix(this);
loop ++;
change = 0;
while (1) {
done = 1;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (!(mask[i] & groupbit)) continue;
itype = atom->type[i];
for (jj = 0; jj < MAXSPECBOND; jj++) {
j = reaxc->tmpid[i][jj];
if (j < i) continue;
if (!(mask[j] & groupbit)) continue;
if (clusterID[i] == clusterID[j] && PBCconnected[i] == PBCconnected[j]
&& x0[i].x == x0[j].x && x0[i].y == x0[j].y && x0[i].z == x0[j].z) continue;
jtype = atom->type[j];
bo_cut = BOCut[itype][jtype];
bo_tmp = spec_atom[i][jj+7];
if (bo_tmp > bo_cut) {
clusterID[i] = clusterID[j] = MIN(clusterID[i], clusterID[j]);
PBCconnected[i] = PBCconnected[j] = MAX(PBCconnected[i], PBCconnected[j]);
x0[i] = x0[j] = chAnchor(x0[i], x0[j]);
if ((fabs(spec_atom[i][1] - spec_atom[j][1]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][2] - spec_atom[j][2]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][3] - spec_atom[j][3]) > reaxc->control->bond_cut))
PBCconnected[i] = PBCconnected[j] = 1;
done = 0;
}
}
}
if (!done) change = 1;
if (done) break;
}
MPI_Allreduce(&change,&anychange,1,MPI_INT,MPI_MAX,world);
if (!anychange) break;
MPI_Allreduce(&loop,&looptot,1,MPI_INT,MPI_SUM,world);
if (looptot >= 400*nprocs) break;
}
}
/* ---------------------------------------------------------------------- */
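// Compress the sparse cluster IDs (originally atom tags) into consecutive
// molecule numbers 1..Nmole: build a presence map over the observed ID
// range, reduce it across ranks, and renumber each atom's clusterID
// accordingly; warnings flag atoms with cluster ID 0 and clusters that
// extend outside the fix group.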
void FixReaxCSpecies::SortMolecule(int &Nmole)
{
memory->destroy(molmap);
molmap = NULL;
int n, idlo, idhi;
int *mask =atom->mask;
int lo = ntotal;
int hi = -ntotal;
int flag = 0;
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
if (clusterID[n] == 0.0) flag = 1;
lo = MIN(lo,nint(clusterID[n]));
hi = MAX(hi,nint(clusterID[n]));
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && me == 0)
error->warning(FLERR,"Atom with cluster ID = 0 included in "
"fix reax/c/species group");
MPI_Allreduce(&lo,&idlo,1,MPI_INT,MPI_MIN,world);
MPI_Allreduce(&hi,&idhi,1,MPI_INT,MPI_MAX,world);
if (idlo == ntotal)
if (me == 0)
error->warning(FLERR,"Atom with cluster ID = maxmol "
"included in fix reax/c/species group");
int nlen = idhi - idlo + 1;
memory->create(molmap,nlen,"reax/c/species:molmap");
for (n = 0; n < nlen; n++) molmap[n] = 0;
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
molmap[nint(clusterID[n])-idlo] = 1;
}
int *molmapall;
memory->create(molmapall,nlen,"reax/c/species:molmapall");
MPI_Allreduce(molmap,molmapall,nlen,MPI_INT,MPI_MAX,world);
Nmole = 0;
for (n = 0; n < nlen; n++) {
if (molmapall[n]) molmap[n] = Nmole++;
else molmap[n] = -1;
}
memory->destroy(molmapall);
flag = 0;
for (n = 0; n < nlocal; n++) {
if (mask[n] & groupbit) continue;
if (nint(clusterID[n]) < idlo || nint(clusterID[n]) > idhi) continue;
if (molmap[nint(clusterID[n])-idlo] >= 0) flag = 1;
}
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"One or more cluster has atoms not in group");
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
clusterID[n] = molmap[nint(clusterID[n])-idlo] + 1;
}
memory->destroy(molmap);
molmap = NULL;
}
/* ---------------------------------------------------------------------- */
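// Classify molecules into species by chemical formula alone: for each
// molecule number, count the atoms of each type across all ranks, then
// compare that composition vector against the species found so far; a new
// composition becomes a new species, otherwise the matching species' count
// in NMol is incremented. Connectivity (i.e. isomers) is not distinguished.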
void FixReaxCSpecies::FindSpecies(int Nmole, int &Nspec)
{
int k, l, m, n, itype, cid;
int flag_identity, flag_mol, flag_spec;
int flag_tmp;
int *mask =atom->mask;
int *Nameall, *NMolall;
memory->destroy(MolName);
MolName = NULL;
memory->create(MolName,Nmole*(ntypes+1),"reax/c/species:MolName");
memory->destroy(NMol);
NMol = NULL;
memory->create(NMol,Nmole,"reax/c/species:NMol");
for (m = 0; m < Nmole; m ++)
NMol[m] = 1;
memory->create(Nameall,ntypes,"reax/c/species:Nameall");
memory->create(NMolall,Nmole,"reax/c/species:NMolall");
for (m = 1, Nspec = 0; m <= Nmole; m ++) {
for (n = 0; n < ntypes; n ++) Name[n] = 0;
for (n = 0, flag_mol = 0; n < nlocal; n ++) {
if (!(mask[n] & groupbit)) continue;
cid = nint(clusterID[n]);
if (cid == m) {
itype = atom->type[n]-1;
Name[itype] ++;
flag_mol = 1;
}
}
MPI_Allreduce(&flag_mol,&flag_tmp,1,MPI_INT,MPI_MAX,world);
flag_mol = flag_tmp;
MPI_Allreduce(Name,Nameall,ntypes,MPI_INT,MPI_SUM,world);
for (n = 0; n < ntypes; n++) Name[n] = Nameall[n];
if (flag_mol == 1) {
flag_identity = 1;
for (k = 0; k < Nspec; k ++) {
flag_spec=0;
for (l = 0; l < ntypes; l ++)
if (MolName[ntypes*k+l] != Name[l]) flag_spec = 1;
if (flag_spec == 0) NMol[k] ++;
flag_identity *= flag_spec;
}
if (Nspec == 0 || flag_identity == 1) {
for (l = 0; l < ntypes; l ++)
MolName[ntypes*Nspec+l] = Name[l];
Nspec ++;
}
}
}
memory->destroy(NMolall);
memory->destroy(Nameall);
memory->destroy(nd);
nd = NULL;
memory->create(nd,Nspec,"reax/c/species:nd");
memory->destroy(MolType);
MolType = NULL;
memory->create(MolType,Nspec*(ntypes+2),"reax/c/species:MolType");
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::CheckExistence(int id, int ntypes)
{
int i, j, molid, flag;
for (i = 0; i < Nmoltype; i ++) {
flag = 0;
for (j = 0; j < ntypes; j ++) {
molid = MolType[ntypes * i + j];
if (molid != MolName[ntypes * id + j]) flag = 1;
}
if (flag == 0) return i;
}
for (i = 0; i < ntypes; i ++)
MolType[ntypes * Nmoltype + i] = MolName[ntypes *id + i];
Nmoltype ++;
return Nmoltype - 1;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::WriteFormulas(int Nmole, int Nspec)
{
int i, j, itemp;
bigint ntimestep = update->ntimestep;
fprintf(fp,"# Timestep No_Moles No_Specs ");
Nmoltype = 0;
for (i = 0; i < Nspec; i ++)
nd[i] = CheckExistence(i, ntypes);
for (i = 0; i < Nmoltype; i ++) {
for (j = 0;j < ntypes; j ++) {
itemp = MolType[ntypes * i + j];
if (itemp != 0) {
if (eletype) fprintf(fp,"%s",eletype[j]);
else fprintf(fp,"%c",ele[j]);
if (itemp != 1) fprintf(fp,"%d",itemp);
}
}
fprintf(fp,"\t");
}
fprintf(fp,"\n");
fprintf(fp,BIGINT_FORMAT,ntimestep);
fprintf(fp,"%11d%11d\t",Nmole,Nspec);
for (i = 0; i < Nmoltype; i ++)
fprintf(fp," %d\t",NMol[i]);
fprintf(fp,"\n");
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::OpenPos()
{
char *filecurrent;
bigint ntimestep = update->ntimestep;
filecurrent = (char*) malloc((strlen(filepos)+16)*sizeof(char));
char *ptr = strchr(filepos,'*');
*ptr = '\0';
if (padflag == 0)
sprintf(filecurrent,"%s" BIGINT_FORMAT "%s",
filepos,ntimestep,ptr+1);
else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(filecurrent,pad,filepos,ntimestep,ptr+1);
}
*ptr = '*';
if (me == 0) {
pos = fopen(filecurrent, "w");
if (pos == NULL) error->one(FLERR,"Cannot open fix reax/c/species position file");
} else pos = NULL;
multipos_opened = 1;
free(filecurrent);
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::WritePos(int Nmole, int Nspec)
{
int i, itype, cid;
int count, count_tmp, m, n, k;
int *Nameall;
int *mask =atom->mask;
double avq, avq_tmp, avx[3], avx_tmp, box[3], halfbox[3];
double **spec_atom = f_SPECBOND->array_atom;
if (multipos) OpenPos();
box[0] = domain->boxhi[0] - domain->boxlo[0];
box[1] = domain->boxhi[1] - domain->boxlo[1];
box[2] = domain->boxhi[2] - domain->boxlo[2];
for (int j = 0; j < 3; j++)
halfbox[j] = box[j] / 2;
if (me == 0) {
fprintf(pos,"Timestep " BIGINT_FORMAT " NMole %d NSpec %d xlo %f "
"xhi %f ylo %f yhi %f zlo %f zhi %f\n",
update->ntimestep,Nmole, Nspec,
domain->boxlo[0],domain->boxhi[0],
domain->boxlo[1],domain->boxhi[1],
domain->boxlo[2],domain->boxhi[2]);
fprintf(pos,"ID\tAtom_Count\tType\tAve_q\t\tCoM_x\t\tCoM_y\t\tCoM_z\n");
}
Nameall = NULL;
memory->create(Nameall,ntypes,"reax/c/species:Nameall");
for (m = 1; m <= Nmole; m ++) {
count = 0;
avq = 0.0;
for (n = 0; n < 3; n++)
avx[n] = 0.0;
for (n = 0; n < ntypes; n ++)
Name[n] = 0;
for (i = 0; i < nlocal; i ++) {
if (!(mask[i] & groupbit)) continue;
cid = nint(clusterID[i]);
if (cid == m) {
itype = atom->type[i]-1;
Name[itype] ++;
count ++;
avq += spec_atom[i][0];
if (PBCconnected[i]) {
if ((x0[i].x - spec_atom[i][1]) > halfbox[0])
spec_atom[i][1] += box[0];
if ((spec_atom[i][1] - x0[i].x) > halfbox[0])
spec_atom[i][1] -= box[0];
if ((x0[i].y - spec_atom[i][2]) > halfbox[1])
spec_atom[i][2] += box[1];
if ((spec_atom[i][2] - x0[i].y) > halfbox[1])
spec_atom[i][2] -= box[1];
if ((x0[i].z - spec_atom[i][3]) > halfbox[2])
spec_atom[i][3] += box[2];
if ((spec_atom[i][3] - x0[i].z) > halfbox[2])
spec_atom[i][3] -= box[2];
}
for (n = 0; n < 3; n++)
avx[n] += spec_atom[i][n+1];
}
}
avq_tmp = 0.0;
MPI_Allreduce(&avq,&avq_tmp,1,MPI_DOUBLE,MPI_SUM,world);
avq = avq_tmp;
for (n = 0; n < 3; n++) {
avx_tmp = 0.0;
MPI_Reduce(&avx[n],&avx_tmp,1,MPI_DOUBLE,MPI_SUM,0,world);
avx[n] = avx_tmp;
}
MPI_Reduce(&count,&count_tmp,1,MPI_INT,MPI_SUM,0,world);
count = count_tmp;
MPI_Reduce(Name,Nameall,ntypes,MPI_INT,MPI_SUM,0,world);
for (n = 0; n < ntypes; n++) Name[n] = Nameall[n];
if (me == 0) {
fprintf(pos,"%d\t%d\t",m,count);
for (n = 0; n < ntypes; n++) {
if (Name[n] != 0) {
if (eletype) fprintf(pos,"%s",eletype[n]);
else fprintf(pos,"%c",ele[n]);
if (Name[n] != 1) fprintf(pos,"%d",Name[n]);
}
}
if (count > 0) {
avq /= count;
for (k = 0; k < 3; k++) {
avx[k] /= count;
if (avx[k] >= domain->boxhi[k])
avx[k] -= box[k];
if (avx[k] < domain->boxlo[k])
avx[k] += box[k];
avx[k] -= domain->boxlo[k];
avx[k] /= box[k];
}
fprintf(pos,"\t%.8f \t%.8f \t%.8f \t%.8f",
avq,avx[0],avx[1],avx[2]);
}
fprintf(pos,"\n");
}
}
if (me == 0 && !multipos) fprintf(pos,"#\n");
memory->destroy(Nameall);
}
/* ---------------------------------------------------------------------- */
double FixReaxCSpecies::compute_vector(int n)
{
if (n == 0)
return vector_nmole;
if (n == 1)
return vector_nspec;
return 0.0;
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::nint(const double &r)
{
int i = 0;
if (r>0.0) i = static_cast<int>(r+0.5);
else if (r<0.0) i = static_cast<int>(r-0.5);
return i;
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m] = clusterID[j];
buf[m+1] = (double)PBCconnected[j];
buf[m+2] = x0[j].x;
buf[m+3] = x0[j].y;
buf[m+4] = x0[j].z;
m += 5;
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
clusterID[i] = buf[m];
PBCconnected[i] = (int)buf[m+1];
x0[i].x = buf[m+2];
x0[i].y = buf[m+3];
x0[i].z = buf[m+4];
m += 5;
}
}
/* ---------------------------------------------------------------------- */
double FixReaxCSpecies::memory_usage()
{
double bytes;
bytes = 5*nmax*sizeof(double); // clusterID + PBCconnected + x0
return bytes;
}
/* ---------------------------------------------------------------------- */
diff --git a/src/USER-REAXC/fix_reaxc_species.h b/src/USER-REAXC/fix_reaxc_species.h
index 872ea2528..563a10f39 100644
--- a/src/USER-REAXC/fix_reaxc_species.h
+++ b/src/USER-REAXC/fix_reaxc_species.h
@@ -1,94 +1,94 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(reax/c/species,FixReaxCSpecies)
#else
#ifndef LMP_FIX_REAXC_SPECIES_H
#define LMP_FIX_REAXC_SPECIES_H
#include "fix.h"
#include "pointers.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
#define BUFLEN 1000
namespace LAMMPS_NS {
typedef struct {
double x, y, z;
} AtomCoord;
class FixReaxCSpecies : public Fix {
public:
FixReaxCSpecies(class LAMMPS *, int, char **);
virtual ~FixReaxCSpecies();
int setmask();
virtual void init();
void init_list(int, class NeighList *);
void setup(int);
void post_integrate();
double compute_vector(int);
protected:
int me, nprocs, nmax, nlocal, ntypes, ntotal;
int nrepeat, nfreq, posfreq;
int Nmoltype, vector_nmole, vector_nspec;
int *Name, *MolName, *NMol, *nd, *MolType, *molmap;
double *clusterID;
int *PBCconnected;
AtomCoord *x0;
double bg_cut;
double **BOCut;
char **tmparg;
FILE *fp, *pos;
int eleflag, posflag, multipos, padflag, setupflag;
int singlepos_opened, multipos_opened;
char *ele, **eletype, *filepos;
void Output_ReaxC_Bonds(bigint, FILE *);
void create_compute();
void create_fix();
AtomCoord chAnchor(AtomCoord, AtomCoord);
virtual void FindMolecule();
void SortMolecule(int &);
void FindSpecies(int, int &);
void WriteFormulas(int, int);
int CheckExistence(int, int);
int nint(const double &);
int pack_forward_comm(int, int *, double *, int, int *);
void unpack_forward_comm(int, int, double *);
void OpenPos();
void WritePos(int, int);
double memory_usage();
bigint nvalid;
class NeighList *list;
class FixAveAtom *f_SPECBOND;
class PairReaxC *reaxc;
};
}
#endif
#endif
diff --git a/src/USER-REAXC/pair_reax_c.cpp b/src/USER-REAXC/pair_reaxc.cpp
similarity index 97%
rename from src/USER-REAXC/pair_reax_c.cpp
rename to src/USER-REAXC/pair_reaxc.cpp
index 4933c90f0..d51b0fc2f 100644
--- a/src/USER-REAXC/pair_reax_c.cpp
+++ b/src/USER-REAXC/pair_reaxc.cpp
@@ -1,826 +1,833 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Per-atom energy/virial added by Ray Shan (Sandia)
Fix reax/c/bonds and fix reax/c/species for pair_style reax/c added by
Ray Shan (Sandia)
Hybrid and hybrid/overlay compatibility added by Ray Shan (Sandia)
------------------------------------------------------------------------- */
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "atom.h"
#include "update.h"
#include "force.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "modify.h"
#include "fix.h"
-#include "fix_reax_c.h"
+#include "fix_reaxc.h"
#include "citeme.h"
#include "memory.h"
#include "error.h"
#include "reaxc_types.h"
#include "reaxc_allocate.h"
#include "reaxc_control.h"
#include "reaxc_ffield.h"
#include "reaxc_forces.h"
#include "reaxc_init_md.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_reset_tools.h"
#include "reaxc_traj.h"
#include "reaxc_vector.h"
#include "fix_reaxc_bonds.h"
using namespace LAMMPS_NS;
static const char cite_pair_reax_c[] =
"pair reax/c command:\n\n"
"@Article{Aktulga12,\n"
" author = {H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama},\n"
" title = {Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques},\n"
" journal = {Parallel Computing},\n"
" year = 2012,\n"
" volume = 38,\n"
" pages = {245--259}\n"
"}\n\n";
/* ---------------------------------------------------------------------- */
PairReaxC::PairReaxC(LAMMPS *lmp) : Pair(lmp)
{
if (lmp->citeme) lmp->citeme->add(cite_pair_reax_c);
single_enable = 0;
restartinfo = 0;
one_coeff = 1;
manybody_flag = 1;
ghostneigh = 1;
system = (reax_system *)
memory->smalloc(sizeof(reax_system),"reax:system");
control = (control_params *)
memory->smalloc(sizeof(control_params),"reax:control");
data = (simulation_data *)
memory->smalloc(sizeof(simulation_data),"reax:data");
workspace = (storage *)
memory->smalloc(sizeof(storage),"reax:storage");
lists = (reax_list *)
memory->smalloc(LIST_N * sizeof(reax_list),"reax:lists");
out_control = (output_controls *)
memory->smalloc(sizeof(output_controls),"reax:out_control");
mpi_data = (mpi_datatypes *)
memory->smalloc(sizeof(mpi_datatypes),"reax:mpi");
MPI_Comm_rank(world,&system->my_rank);
system->my_coords[0] = 0;
system->my_coords[1] = 0;
system->my_coords[2] = 0;
system->num_nbrs = 0;
system->n = 0; // my atoms
system->N = 0; // mine + ghosts
system->bigN = 0; // all atoms in the system
system->local_cap = 0;
system->total_cap = 0;
system->gcell_cap = 0;
system->bndry_cuts.ghost_nonb = 0;
system->bndry_cuts.ghost_hbond = 0;
system->bndry_cuts.ghost_bond = 0;
system->bndry_cuts.ghost_cutoff = 0;
system->my_atoms = NULL;
system->pair_ptr = this;
fix_reax = NULL;
tmpid = NULL;
tmpbo = NULL;
nextra = 14;
pvector = new double[nextra];
setup_flag = 0;
fixspecies_flag = 0;
nmax = 0;
}
/* ---------------------------------------------------------------------- */
PairReaxC::~PairReaxC()
{
if (copymode) return;
if (fix_reax) modify->delete_fix("REAXC");
if (setup_flag) {
Close_Output_Files( system, control, out_control, mpi_data );
// deallocate reax data-structures
if( control->tabulate ) Deallocate_Lookup_Tables( system );
if( control->hbond_cut > 0 ) Delete_List( lists+HBONDS, world );
Delete_List( lists+BONDS, world );
Delete_List( lists+THREE_BODIES, world );
Delete_List( lists+FAR_NBRS, world );
DeAllocate_Workspace( control, workspace );
DeAllocate_System( system );
}
memory->destroy( system );
memory->destroy( control );
memory->destroy( data );
memory->destroy( workspace );
memory->destroy( lists );
memory->destroy( out_control );
memory->destroy( mpi_data );
// deallocate interface storage
if( allocated ) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
delete [] map;
delete [] chi;
delete [] eta;
delete [] gamma;
}
memory->destroy(tmpid);
memory->destroy(tmpbo);
delete [] pvector;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::allocate( )
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cutghost,n+1,n+1,"pair:cutghost");
map = new int[n+1];
chi = new double[n+1];
eta = new double[n+1];
gamma = new double[n+1];
}
/* ---------------------------------------------------------------------- */
void PairReaxC::settings(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal pair_style command");
// read name of control file or use default controls
if (strcmp(arg[0],"NULL") == 0) {
strcpy( control->sim_name, "simulate" );
control->ensemble = 0;
out_control->energy_update_freq = 0;
control->tabulate = 0;
control->reneighbor = 1;
control->vlist_cut = control->nonb_cut;
control->bond_cut = 5.;
control->hbond_cut = 7.50;
control->thb_cut = 0.001;
control->thb_cutsq = 0.00001;
control->bg_cut = 0.3;
out_control->write_steps = 0;
out_control->traj_method = 0;
strcpy( out_control->traj_title, "default_title" );
out_control->atom_info = 0;
out_control->bond_info = 0;
out_control->angle_info = 0;
} else Read_Control_File(arg[0], control, out_control);
// default values
qeqflag = 1;
control->lgflag = 0;
+ control->enobondsflag = 1;
system->mincap = MIN_CAP;
system->safezone = SAFE_ZONE;
system->saferzone = SAFER_ZONE;
-
+
// process optional keywords
int iarg = 1;
while (iarg < narg) {
if (strcmp(arg[iarg],"checkqeq") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
if (strcmp(arg[iarg+1],"yes") == 0) qeqflag = 1;
else if (strcmp(arg[iarg+1],"no") == 0) qeqflag = 0;
else error->all(FLERR,"Illegal pair_style reax/c command");
iarg += 2;
- } else if (strcmp(arg[iarg],"lgvdw") == 0) {
+ } else if (strcmp(arg[iarg],"enobonds") == 0) {
+ if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
+ if (strcmp(arg[iarg+1],"yes") == 0) control->enobondsflag = 1;
+ else if (strcmp(arg[iarg+1],"no") == 0) control->enobondsflag = 0;
+ else error->all(FLERR,"Illegal pair_style reax/c command");
+ iarg += 2;
+ } else if (strcmp(arg[iarg],"lgvdw") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
if (strcmp(arg[iarg+1],"yes") == 0) control->lgflag = 1;
else if (strcmp(arg[iarg+1],"no") == 0) control->lgflag = 0;
else error->all(FLERR,"Illegal pair_style reax/c command");
iarg += 2;
} else if (strcmp(arg[iarg],"safezone") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
system->safezone = force->numeric(FLERR,arg[iarg+1]);
if (system->safezone < 0.0)
error->all(FLERR,"Illegal pair_style reax/c safezone command");
system->saferzone = system->safezone*1.2;
iarg += 2;
} else if (strcmp(arg[iarg],"mincap") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
system->mincap = force->inumeric(FLERR,arg[iarg+1]);
if (system->mincap < 0)
error->all(FLERR,"Illegal pair_style reax/c mincap command");
iarg += 2;
} else error->all(FLERR,"Illegal pair_style reax/c command");
}
// LAMMPS is responsible for generating nbrs
control->reneighbor = 1;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::coeff( int nargs, char **args )
{
if (!allocated) allocate();
if (nargs != 3 + atom->ntypes)
error->all(FLERR,"Incorrect args for pair coefficients");
// ensure I,J args are * *
if (strcmp(args[0],"*") != 0 || strcmp(args[1],"*") != 0)
error->all(FLERR,"Incorrect args for pair coefficients");
// read ffield file
char *file = args[2];
FILE *fp;
fp = force->open_potential(file);
if (fp != NULL)
Read_Force_Field(fp, &(system->reax_param), control);
else {
char str[128];
sprintf(str,"Cannot open ReaxFF potential file %s",file);
error->all(FLERR,str);
}
// read args that map atom types to elements in potential file
// map[i] = which element the Ith atom type is, -1 if NULL
int itmp = 0;
int nreax_types = system->reax_param.num_atom_types;
for (int i = 3; i < nargs; i++) {
if (strcmp(args[i],"NULL") == 0) {
map[i-2] = -1;
itmp ++;
continue;
}
}
int n = atom->ntypes;
// pair_coeff element map
for (int i = 3; i < nargs; i++)
for (int j = 0; j < nreax_types; j++)
if (strcasecmp(args[i],system->reax_param.sbp[j].name) == 0) {
map[i-2] = j;
itmp ++;
}
// error check
if (itmp != n)
error->all(FLERR,"Non-existent ReaxFF type");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
// set setflag i,j for type pairs where both are mapped to elements
int count = 0;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
if (map[i] >= 0 && map[j] >= 0) {
setflag[i][j] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ---------------------------------------------------------------------- */
void PairReaxC::init_style( )
{
if (!atom->q_flag)
error->all(FLERR,"Pair style reax/c requires atom attribute q");
// firstwarn = 1;
int iqeq;
for (iqeq = 0; iqeq < modify->nfix; iqeq++)
if (strstr(modify->fix[iqeq]->style,"qeq/reax")) break;
if (iqeq == modify->nfix && qeqflag == 1)
error->all(FLERR,"Pair reax/c requires use of fix qeq/reax");
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
system->wsize = comm->nprocs;
system->big_box.V = 0;
system->big_box.box_norms[0] = 0;
system->big_box.box_norms[1] = 0;
system->big_box.box_norms[2] = 0;
if (atom->tag_enable == 0)
error->all(FLERR,"Pair style reax/c requires atom IDs");
if (force->newton_pair == 0)
error->all(FLERR,"Pair style reax/c requires newton pair on");
// need a half neighbor list w/ Newton off and ghost neighbors
// built whenever re-neighboring occurs
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->newton = 2;
neighbor->requests[irequest]->ghost = 1;
cutmax = MAX3(control->nonb_cut, control->hbond_cut, 2*control->bond_cut);
for( int i = 0; i < LIST_N; ++i )
lists[i].allocated = 0;
if (fix_reax == NULL) {
char **fixarg = new char*[3];
fixarg[0] = (char *) "REAXC";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "REAXC";
modify->add_fix(3,fixarg);
delete [] fixarg;
fix_reax = (FixReaxC *) modify->fix[modify->nfix-1];
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::setup( )
{
int oldN;
int mincap = system->mincap;
double safezone = system->safezone;
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
oldN = system->N;
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
if (setup_flag == 0) {
setup_flag = 1;
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
control->vlist_cut = neighbor->cutneighmax;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap );
system->total_cap = MAX( (int)(system->N * safezone), mincap );
// initialize my data structures
PreAllocate_Space( system, control, workspace, world );
write_reax_atoms();
int num_nbrs = estimate_reax_lists();
if(!Make_List(system->total_cap, num_nbrs, TYP_FAR_NEIGHBOR,
lists+FAR_NBRS, world))
error->all(FLERR,"Pair reax/c problem in far neighbor list");
write_reax_lists();
Initialize( system, control, data, workspace, &lists, out_control,
mpi_data, world );
for( int k = 0; k < system->N; ++k ) {
num_bonds[k] = system->my_atoms[k].num_bonds;
num_hbonds[k] = system->my_atoms[k].num_hbonds;
}
} else {
// fill in reax datastructures
write_reax_atoms();
// reset the bond list info for new atoms
for(int k = oldN; k < system->N; ++k)
Set_End_Index( k, Start_Index( k, lists+BONDS ), lists+BONDS );
// check if I need to shrink/extend my data-structs
ReAllocate( system, control, data, workspace, &lists, mpi_data );
}
}
/* ---------------------------------------------------------------------- */
double PairReaxC::init_one(int i, int j)
{
if (setflag[i][j] == 0) error->all(FLERR,"All pair coeffs are not set");
cutghost[i][j] = cutghost[j][i] = cutmax;
return cutmax;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::compute(int eflag, int vflag)
{
double evdwl,ecoul;
double t_start, t_end;
// communicate num_bonds once every reneighboring
// 2 num arrays stored by fix, grab ptr to them
if (neighbor->ago == 0) comm->forward_comm_fix(fix_reax);
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else ev_unset();
if (vflag_global) control->virial = 1;
else control->virial = 0;
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
system->big_box.V = 0;
system->big_box.box_norms[0] = 0;
system->big_box.box_norms[1] = 0;
system->big_box.box_norms[2] = 0;
if( comm->me == 0 ) t_start = MPI_Wtime();
// setup data structures
setup();
Reset( system, control, data, workspace, &lists, world );
workspace->realloc.num_far = write_reax_lists();
// timing for filling in the reax lists
if( comm->me == 0 ) {
t_end = MPI_Wtime();
data->timing.nbrs = t_end - t_start;
}
// forces
Compute_Forces(system,control,data,workspace,&lists,out_control,mpi_data);
read_reax_forces(vflag);
for(int k = 0; k < system->N; ++k) {
num_bonds[k] = system->my_atoms[k].num_bonds;
num_hbonds[k] = system->my_atoms[k].num_hbonds;
}
// energies and pressure
if (eflag_global) {
evdwl += data->my_en.e_bond;
evdwl += data->my_en.e_ov;
evdwl += data->my_en.e_un;
evdwl += data->my_en.e_lp;
evdwl += data->my_en.e_ang;
evdwl += data->my_en.e_pen;
evdwl += data->my_en.e_coa;
evdwl += data->my_en.e_hb;
evdwl += data->my_en.e_tor;
evdwl += data->my_en.e_con;
evdwl += data->my_en.e_vdW;
ecoul += data->my_en.e_ele;
ecoul += data->my_en.e_pol;
// eng_vdwl += evdwl;
// eng_coul += ecoul;
// Store the different parts of the energy
// in a list for output by compute pair command
pvector[0] = data->my_en.e_bond;
pvector[1] = data->my_en.e_ov + data->my_en.e_un;
pvector[2] = data->my_en.e_lp;
pvector[3] = 0.0;
pvector[4] = data->my_en.e_ang;
pvector[5] = data->my_en.e_pen;
pvector[6] = data->my_en.e_coa;
pvector[7] = data->my_en.e_hb;
pvector[8] = data->my_en.e_tor;
pvector[9] = data->my_en.e_con;
pvector[10] = data->my_en.e_vdW;
pvector[11] = data->my_en.e_ele;
pvector[12] = 0.0;
pvector[13] = data->my_en.e_pol;
}
if (vflag_fdotr) virial_fdotr_compute();
// Set internal timestep counter to that of LAMMPS
data->step = update->ntimestep;
Output_Results( system, control, data, &lists, out_control, mpi_data );
// populate tmpid and tmpbo arrays for fix reax/c/species
int i, j;
if(fixspecies_flag) {
if (system->N > nmax) {
memory->destroy(tmpid);
memory->destroy(tmpbo);
nmax = system->N;
memory->create(tmpid,nmax,MAXSPECBOND,"pair:tmpid");
memory->create(tmpbo,nmax,MAXSPECBOND,"pair:tmpbo");
}
for (i = 0; i < system->N; i ++)
for (j = 0; j < MAXSPECBOND; j ++) {
tmpbo[i][j] = 0.0;
tmpid[i][j] = 0;
}
FindBond();
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::write_reax_atoms()
{
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
if (system->N > system->total_cap)
error->all(FLERR,"Too many ghost atoms");
for( int i = 0; i < system->N; ++i ){
system->my_atoms[i].orig_id = atom->tag[i];
system->my_atoms[i].type = map[atom->type[i]];
system->my_atoms[i].x[0] = atom->x[i][0];
system->my_atoms[i].x[1] = atom->x[i][1];
system->my_atoms[i].x[2] = atom->x[i][2];
system->my_atoms[i].q = atom->q[i];
system->my_atoms[i].num_bonds = num_bonds[i];
system->my_atoms[i].num_hbonds = num_hbonds[i];
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::get_distance( rvec xj, rvec xi, double *d_sqr, rvec *dvec )
{
(*dvec)[0] = xj[0] - xi[0];
(*dvec)[1] = xj[1] - xi[1];
(*dvec)[2] = xj[2] - xi[2];
*d_sqr = SQR((*dvec)[0]) + SQR((*dvec)[1]) + SQR((*dvec)[2]);
}
/* ---------------------------------------------------------------------- */
void PairReaxC::set_far_nbr( far_neighbor_data *fdest,
int j, double d, rvec dvec )
{
fdest->nbr = j;
fdest->d = d;
rvec_Copy( fdest->dvec, dvec );
ivec_MakeZero( fdest->rel_box );
}
/* ---------------------------------------------------------------------- */
int PairReaxC::estimate_reax_lists()
{
int itr_i, itr_j, i, j;
int num_nbrs, num_marked;
int *ilist, *jlist, *numneigh, **firstneigh, *marked;
double d_sqr;
rvec dvec;
double **x;
int mincap = system->mincap;
double safezone = system->safezone;
x = atom->x;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
num_nbrs = 0;
num_marked = 0;
marked = (int*) calloc( system->N, sizeof(int) );
int numall = list->inum + list->gnum;
for( itr_i = 0; itr_i < numall; ++itr_i ){
i = ilist[itr_i];
marked[i] = 1;
++num_marked;
jlist = firstneigh[i];
for( itr_j = 0; itr_j < numneigh[i]; ++itr_j ){
j = jlist[itr_j];
j &= NEIGHMASK;
get_distance( x[j], x[i], &d_sqr, &dvec );
if( d_sqr <= SQR(control->nonb_cut) )
++num_nbrs;
}
}
free( marked );
return static_cast<int> (MAX( num_nbrs*safezone, mincap*MIN_NBRS ));
}
/* ---------------------------------------------------------------------- */
int PairReaxC::write_reax_lists()
{
int itr_i, itr_j, i, j;
int num_nbrs;
int *ilist, *jlist, *numneigh, **firstneigh;
double d_sqr;
rvec dvec;
double *dist, **x;
reax_list *far_nbrs;
far_neighbor_data *far_list;
x = atom->x;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
far_nbrs = lists + FAR_NBRS;
far_list = far_nbrs->select.far_nbr_list;
num_nbrs = 0;
dist = (double*) calloc( system->N, sizeof(double) );
int numall = list->inum + list->gnum;
for( itr_i = 0; itr_i < numall; ++itr_i ){
i = ilist[itr_i];
jlist = firstneigh[i];
Set_Start_Index( i, num_nbrs, far_nbrs );
for( itr_j = 0; itr_j < numneigh[i]; ++itr_j ){
j = jlist[itr_j];
j &= NEIGHMASK;
get_distance( x[j], x[i], &d_sqr, &dvec );
if( d_sqr <= (control->nonb_cut*control->nonb_cut) ){
dist[j] = sqrt( d_sqr );
set_far_nbr( &far_list[num_nbrs], j, dist[j], dvec );
++num_nbrs;
}
}
Set_End_Index( i, num_nbrs, far_nbrs );
}
free( dist );
return num_nbrs;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::read_reax_forces(int vflag)
{
for( int i = 0; i < system->N; ++i ) {
system->my_atoms[i].f[0] = workspace->f[i][0];
system->my_atoms[i].f[1] = workspace->f[i][1];
system->my_atoms[i].f[2] = workspace->f[i][2];
atom->f[i][0] += -workspace->f[i][0];
atom->f[i][1] += -workspace->f[i][1];
atom->f[i][2] += -workspace->f[i][2];
}
}
/* ---------------------------------------------------------------------- */
void *PairReaxC::extract(const char *str, int &dim)
{
dim = 1;
if (strcmp(str,"chi") == 0 && chi) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) chi[i] = system->reax_param.sbp[map[i]].chi;
else chi[i] = 0.0;
return (void *) chi;
}
if (strcmp(str,"eta") == 0 && eta) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) eta[i] = system->reax_param.sbp[map[i]].eta;
else eta[i] = 0.0;
return (void *) eta;
}
if (strcmp(str,"gamma") == 0 && gamma) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) gamma[i] = system->reax_param.sbp[map[i]].gamma;
else gamma[i] = 0.0;
return (void *) gamma;
}
return NULL;
}
/* ---------------------------------------------------------------------- */
double PairReaxC::memory_usage()
{
double bytes = 0.0;
// From pair_reax_c
bytes += 1.0 * system->N * sizeof(int);
bytes += 1.0 * system->N * sizeof(double);
// From reaxc_allocate: BO
bytes += 1.0 * system->total_cap * sizeof(reax_atom);
bytes += 19.0 * system->total_cap * sizeof(double);
bytes += 3.0 * system->total_cap * sizeof(int);
// From reaxc_lists
bytes += 2.0 * lists->n * sizeof(int);
bytes += lists->num_intrs * sizeof(three_body_interaction_data);
bytes += lists->num_intrs * sizeof(bond_data);
bytes += lists->num_intrs * sizeof(dbond_data);
bytes += lists->num_intrs * sizeof(dDelta_data);
bytes += lists->num_intrs * sizeof(far_neighbor_data);
bytes += lists->num_intrs * sizeof(hbond_data);
if(fixspecies_flag)
bytes += 2 * nmax * MAXSPECBOND * sizeof(double);
return bytes;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::FindBond()
{
int i, j, pj, nj;
double bo_tmp, bo_cut;
bond_data *bo_ij;
bo_cut = 0.10;
for (i = 0; i < system->n; i++) {
nj = 0;
for( pj = Start_Index(i, lists); pj < End_Index(i, lists); ++pj ) {
bo_ij = &( lists->select.bond_list[pj] );
j = bo_ij->nbr;
if (j < i) continue;
bo_tmp = bo_ij->bo_data.BO;
if (bo_tmp >= bo_cut ) {
tmpid[i][nj] = j;
tmpbo[i][nj] = bo_tmp;
nj ++;
if (nj > MAXSPECBOND) error->all(FLERR,"Increase MAXSPECBOND in reaxc_defs.h");
}
}
}
}
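
For reference, the new enobonds keyword added to PairReaxC::settings() above is toggled from the pair_style command line, alongside the existing checkqeq, lgvdw, safezone and mincap keywords. A minimal input sketch (the data file and force-field file names below are placeholders, not part of this patch):

  units           real
  atom_style      charge
  read_data       data.CHO                        # placeholder data file
  pair_style      reax/c NULL enobonds yes checkqeq yes
  pair_coeff      * * ffield.reax.cho C H O       # placeholder force field
  fix             1 all qeq/reax 1 0.0 10.0 1e-6 reax/c

The default is enobonds yes, matching the control->enobondsflag = 1 default set in settings() above.
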
diff --git a/src/USER-REAXC/pair_reax_c.h b/src/USER-REAXC/pair_reaxc.h
similarity index 100%
rename from src/USER-REAXC/pair_reax_c.h
rename to src/USER-REAXC/pair_reaxc.h
diff --git a/src/USER-REAXC/reaxc_allocate.cpp b/src/USER-REAXC/reaxc_allocate.cpp
index dc8545e00..969912e08 100644
--- a/src/USER-REAXC/reaxc_allocate.cpp
+++ b/src/USER-REAXC/reaxc_allocate.cpp
@@ -1,461 +1,461 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_allocate.h"
#include "reaxc_list.h"
#include "reaxc_reset_tools.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
/* allocate space for my_atoms
important: we cannot know the exact number of atoms that will fall into a
process's box throughout the whole simulation. therefore
we need to make upper bound estimates for various data structures */
int PreAllocate_Space( reax_system *system, control_params *control,
storage *workspace, MPI_Comm comm )
{
int mincap = system->mincap;
double safezone = system->safezone;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap );
system->total_cap = MAX( (int)(system->N * safezone), mincap );
system->my_atoms = (reax_atom*)
scalloc( system->total_cap, sizeof(reax_atom), "my_atoms", comm );
return SUCCESS;
}
/************* system *************/
int Allocate_System( reax_system *system, int local_cap, int total_cap,
char *msg )
{
system->my_atoms = (reax_atom*)
realloc( system->my_atoms, total_cap*sizeof(reax_atom) );
return SUCCESS;
}
void DeAllocate_System( reax_system *system )
{
int i, j, k;
int ntypes;
reax_interaction *ff_params;
// deallocate the atom list
sfree( system->my_atoms, "system->my_atoms" );
// deallocate the ffield parameters storage
ff_params = &(system->reax_param);
ntypes = ff_params->num_atom_types;
sfree( ff_params->gp.l, "ff:globals" );
for( i = 0; i < ntypes; ++i ) {
for( j = 0; j < ntypes; ++j ) {
for( k = 0; k < ntypes; ++k ) {
sfree( ff_params->fbp[i][j][k], "ff:fbp[i,j,k]" );
}
sfree( ff_params->fbp[i][j], "ff:fbp[i,j]" );
sfree( ff_params->thbp[i][j], "ff:thbp[i,j]" );
sfree( ff_params->hbp[i][j], "ff:hbp[i,j]" );
}
sfree( ff_params->fbp[i], "ff:fbp[i]" );
sfree( ff_params->thbp[i], "ff:thbp[i]" );
sfree( ff_params->hbp[i], "ff:hbp[i]" );
sfree( ff_params->tbp[i], "ff:tbp[i]" );
}
sfree( ff_params->fbp, "ff:fbp" );
sfree( ff_params->thbp, "ff:thbp" );
sfree( ff_params->hbp, "ff:hbp" );
sfree( ff_params->tbp, "ff:tbp" );
sfree( ff_params->sbp, "ff:sbp" );
}
/************* workspace *************/
void DeAllocate_Workspace( control_params *control, storage *workspace )
{
int i;
if( !workspace->allocated )
return;
workspace->allocated = 0;
/* communication storage */
for( i = 0; i < MAX_NBRS; ++i ) {
sfree( workspace->tmp_dbl[i], "tmp_dbl[i]" );
sfree( workspace->tmp_rvec[i], "tmp_rvec[i]" );
sfree( workspace->tmp_rvec2[i], "tmp_rvec2[i]" );
}
/* bond order storage */
sfree( workspace->within_bond_box, "skin" );
sfree( workspace->total_bond_order, "total_bo" );
sfree( workspace->Deltap, "Deltap" );
sfree( workspace->Deltap_boc, "Deltap_boc" );
sfree( workspace->dDeltap_self, "dDeltap_self" );
sfree( workspace->Delta, "Delta" );
sfree( workspace->Delta_lp, "Delta_lp" );
sfree( workspace->Delta_lp_temp, "Delta_lp_temp" );
sfree( workspace->dDelta_lp, "dDelta_lp" );
sfree( workspace->dDelta_lp_temp, "dDelta_lp_temp" );
sfree( workspace->Delta_e, "Delta_e" );
sfree( workspace->Delta_boc, "Delta_boc" );
sfree( workspace->Delta_val, "Delta_val" );
sfree( workspace->nlp, "nlp" );
sfree( workspace->nlp_temp, "nlp_temp" );
sfree( workspace->Clp, "Clp" );
sfree( workspace->vlpex, "vlpex" );
sfree( workspace->bond_mark, "bond_mark" );
sfree( workspace->done_after, "done_after" );
/* QEq storage */
sfree( workspace->Hdia_inv, "Hdia_inv" );
sfree( workspace->b_s, "b_s" );
sfree( workspace->b_t, "b_t" );
sfree( workspace->b_prc, "b_prc" );
sfree( workspace->b_prm, "b_prm" );
sfree( workspace->s, "s" );
sfree( workspace->t, "t" );
sfree( workspace->droptol, "droptol" );
sfree( workspace->b, "b" );
sfree( workspace->x, "x" );
/* GMRES storage */
for( i = 0; i < RESTART+1; ++i ) {
sfree( workspace->h[i], "h[i]" );
sfree( workspace->v[i], "v[i]" );
}
sfree( workspace->h, "h" );
sfree( workspace->v, "v" );
sfree( workspace->y, "y" );
sfree( workspace->z, "z" );
sfree( workspace->g, "g" );
sfree( workspace->hs, "hs" );
sfree( workspace->hc, "hc" );
/* CG storage */
sfree( workspace->r, "r" );
sfree( workspace->d, "d" );
sfree( workspace->q, "q" );
sfree( workspace->p, "p" );
sfree( workspace->r2, "r2" );
sfree( workspace->d2, "d2" );
sfree( workspace->q2, "q2" );
sfree( workspace->p2, "p2" );
/* integrator */
sfree( workspace->v_const, "v_const" );
/* force related storage */
sfree( workspace->f, "f" );
sfree( workspace->CdDelta, "CdDelta" );
}
int Allocate_Workspace( reax_system *system, control_params *control,
storage *workspace, int local_cap, int total_cap,
MPI_Comm comm, char *msg )
{
int i, total_real, total_rvec, local_rvec;
workspace->allocated = 1;
total_real = total_cap * sizeof(double);
total_rvec = total_cap * sizeof(rvec);
local_rvec = local_cap * sizeof(rvec);
/* communication storage */
for( i = 0; i < MAX_NBRS; ++i ) {
workspace->tmp_dbl[i] = (double*)
scalloc( total_cap, sizeof(double), "tmp_dbl", comm );
workspace->tmp_rvec[i] = (rvec*)
scalloc( total_cap, sizeof(rvec), "tmp_rvec", comm );
workspace->tmp_rvec2[i] = (rvec2*)
scalloc( total_cap, sizeof(rvec2), "tmp_rvec2", comm );
}
/* bond order related storage */
workspace->within_bond_box = (int*)
scalloc( total_cap, sizeof(int), "skin", comm );
workspace->total_bond_order = (double*) smalloc( total_real, "total_bo", comm );
workspace->Deltap = (double*) smalloc( total_real, "Deltap", comm );
workspace->Deltap_boc = (double*) smalloc( total_real, "Deltap_boc", comm );
workspace->dDeltap_self = (rvec*) smalloc( total_rvec, "dDeltap_self", comm );
workspace->Delta = (double*) smalloc( total_real, "Delta", comm );
workspace->Delta_lp = (double*) smalloc( total_real, "Delta_lp", comm );
workspace->Delta_lp_temp = (double*)
smalloc( total_real, "Delta_lp_temp", comm );
workspace->dDelta_lp = (double*) smalloc( total_real, "dDelta_lp", comm );
workspace->dDelta_lp_temp = (double*)
smalloc( total_real, "dDelta_lp_temp", comm );
workspace->Delta_e = (double*) smalloc( total_real, "Delta_e", comm );
workspace->Delta_boc = (double*) smalloc( total_real, "Delta_boc", comm );
workspace->Delta_val = (double*) smalloc( total_real, "Delta_val", comm );
workspace->nlp = (double*) smalloc( total_real, "nlp", comm );
workspace->nlp_temp = (double*) smalloc( total_real, "nlp_temp", comm );
workspace->Clp = (double*) smalloc( total_real, "Clp", comm );
workspace->vlpex = (double*) smalloc( total_real, "vlpex", comm );
workspace->bond_mark = (int*)
scalloc( total_cap, sizeof(int), "bond_mark", comm );
workspace->done_after = (int*)
scalloc( total_cap, sizeof(int), "done_after", comm );
/* QEq storage */
workspace->Hdia_inv = (double*)
scalloc( total_cap, sizeof(double), "Hdia_inv", comm );
workspace->b_s = (double*) scalloc( total_cap, sizeof(double), "b_s", comm );
workspace->b_t = (double*) scalloc( total_cap, sizeof(double), "b_t", comm );
workspace->b_prc = (double*) scalloc( total_cap, sizeof(double), "b_prc", comm );
workspace->b_prm = (double*) scalloc( total_cap, sizeof(double), "b_prm", comm );
workspace->s = (double*) scalloc( total_cap, sizeof(double), "s", comm );
workspace->t = (double*) scalloc( total_cap, sizeof(double), "t", comm );
workspace->droptol = (double*)
scalloc( total_cap, sizeof(double), "droptol", comm );
workspace->b = (rvec2*) scalloc( total_cap, sizeof(rvec2), "b", comm );
workspace->x = (rvec2*) scalloc( total_cap, sizeof(rvec2), "x", comm );
/* GMRES storage */
workspace->y = (double*) scalloc( RESTART+1, sizeof(double), "y", comm );
workspace->z = (double*) scalloc( RESTART+1, sizeof(double), "z", comm );
workspace->g = (double*) scalloc( RESTART+1, sizeof(double), "g", comm );
workspace->h = (double**) scalloc( RESTART+1, sizeof(double*), "h", comm );
workspace->hs = (double*) scalloc( RESTART+1, sizeof(double), "hs", comm );
workspace->hc = (double*) scalloc( RESTART+1, sizeof(double), "hc", comm );
workspace->v = (double**) scalloc( RESTART+1, sizeof(double*), "v", comm );
for( i = 0; i < RESTART+1; ++i ) {
workspace->h[i] = (double*) scalloc( RESTART+1, sizeof(double), "h[i]", comm );
workspace->v[i] = (double*) scalloc( total_cap, sizeof(double), "v[i]", comm );
}
/* CG storage */
workspace->r = (double*) scalloc( total_cap, sizeof(double), "r", comm );
workspace->d = (double*) scalloc( total_cap, sizeof(double), "d", comm );
workspace->q = (double*) scalloc( total_cap, sizeof(double), "q", comm );
workspace->p = (double*) scalloc( total_cap, sizeof(double), "p", comm );
workspace->r2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "r2", comm );
workspace->d2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "d2", comm );
workspace->q2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "q2", comm );
workspace->p2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "p2", comm );
/* integrator storage */
workspace->v_const = (rvec*) smalloc( local_rvec, "v_const", comm );
// /* force related storage */
workspace->f = (rvec*) scalloc( total_cap, sizeof(rvec), "f", comm );
workspace->CdDelta = (double*)
scalloc( total_cap, sizeof(double), "CdDelta", comm );
return SUCCESS;
}
static void Reallocate_Neighbor_List( reax_list *far_nbrs, int n,
int num_intrs, MPI_Comm comm )
{
Delete_List( far_nbrs, comm );
if(!Make_List( n, num_intrs, TYP_FAR_NEIGHBOR, far_nbrs, comm )){
fprintf(stderr, "Problem in initializing far nbrs list. Terminating!\n");
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
static int Reallocate_HBonds_List( reax_system *system, reax_list *hbonds,
MPI_Comm comm )
{
int i, id, total_hbonds;
int mincap = system->mincap;
double saferzone = system->saferzone;
total_hbonds = 0;
for( i = 0; i < system->n; ++i )
if( (id = system->my_atoms[i].Hindex) >= 0 ) {
total_hbonds += system->my_atoms[i].num_hbonds;
}
total_hbonds = (int)(MAX( total_hbonds*saferzone, mincap*MIN_HBONDS ));
Delete_List( hbonds, comm );
if( !Make_List( system->Hcap, total_hbonds, TYP_HBOND, hbonds, comm ) ) {
fprintf( stderr, "not enough space for hbonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return total_hbonds;
}
static int Reallocate_Bonds_List( reax_system *system, reax_list *bonds,
int *total_bonds, int *est_3body,
MPI_Comm comm )
{
int i;
int mincap = system->mincap;
double safezone = system->safezone;
*total_bonds = 0;
*est_3body = 0;
for( i = 0; i < system->N; ++i ){
*est_3body += SQR(system->my_atoms[i].num_bonds);
*total_bonds += system->my_atoms[i].num_bonds;
}
*total_bonds = (int)(MAX( *total_bonds * safezone, mincap*MIN_BONDS ));
Delete_List( bonds, comm );
if(!Make_List(system->total_cap, *total_bonds, TYP_BOND, bonds, comm)) {
fprintf( stderr, "not enough space for bonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return SUCCESS;
}
void ReAllocate( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
mpi_datatypes *mpi_data )
{
int num_bonds, est_3body, Hflag, ret;
int renbr, newsize;
reallocate_data *realloc;
reax_list *far_nbrs;
MPI_Comm comm;
char msg[200];
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
realloc = &(workspace->realloc);
comm = mpi_data->world;
if( system->n >= DANGER_ZONE * system->local_cap ||
(0 && system->n <= LOOSE_ZONE * system->local_cap) ) {
system->local_cap = MAX( (int)(system->n * safezone), mincap );
}
int Nflag = 0;
if( system->N >= DANGER_ZONE * system->total_cap ||
(0 && system->N <= LOOSE_ZONE * system->total_cap) ) {
Nflag = 1;
system->total_cap = MAX( (int)(system->N * safezone), mincap );
}
if( Nflag ) {
/* system */
ret = Allocate_System( system, system->local_cap, system->total_cap, msg );
if( ret != SUCCESS ) {
fprintf( stderr, "not enough space for atom_list: total_cap=%d",
system->total_cap );
fprintf( stderr, "terminating...\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
/* workspace */
DeAllocate_Workspace( control, workspace );
ret = Allocate_Workspace( system, control, workspace, system->local_cap,
system->total_cap, comm, msg );
if( ret != SUCCESS ) {
fprintf( stderr, "no space for workspace: local_cap=%d total_cap=%d",
system->local_cap, system->total_cap );
fprintf( stderr, "terminating...\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
renbr = (data->step - data->prev_steps) % control->reneighbor == 0;
/* far neighbors */
if( renbr ) {
far_nbrs = *lists + FAR_NBRS;
if( Nflag || realloc->num_far >= far_nbrs->num_intrs * DANGER_ZONE ) {
if( realloc->num_far > far_nbrs->num_intrs ) {
fprintf( stderr, "step%d-ran out of space on far_nbrs: top=%d, max=%d",
data->step, realloc->num_far, far_nbrs->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
newsize = static_cast<int>
(MAX( realloc->num_far*safezone, mincap*MIN_NBRS ));
Reallocate_Neighbor_List( far_nbrs, system->total_cap, newsize, comm );
realloc->num_far = 0;
}
}
/* hydrogen bonds list */
if( control->hbond_cut > 0 ) {
Hflag = 0;
if( system->numH >= DANGER_ZONE * system->Hcap ||
(0 && system->numH <= LOOSE_ZONE * system->Hcap) ) {
Hflag = 1;
system->Hcap = int(MAX( system->numH * saferzone, mincap ));
}
if( Hflag || realloc->hbonds ) {
ret = Reallocate_HBonds_List( system, (*lists)+HBONDS, comm );
realloc->hbonds = 0;
}
}
/* bonds list */
num_bonds = est_3body = -1;
if( Nflag || realloc->bonds ){
Reallocate_Bonds_List( system, (*lists)+BONDS, &num_bonds,
&est_3body, comm );
realloc->bonds = 0;
realloc->num_3body = MAX( realloc->num_3body, est_3body );
}
/* 3-body list */
if( realloc->num_3body > 0 ) {
Delete_List( (*lists)+THREE_BODIES, comm );
if( num_bonds == -1 )
num_bonds = ((*lists)+BONDS)->num_intrs;
realloc->num_3body = (int)(MAX(realloc->num_3body*safezone, MIN_3BODIES));
if( !Make_List( num_bonds, realloc->num_3body, TYP_THREE_BODY,
(*lists)+THREE_BODIES, comm ) ) {
fprintf( stderr, "Problem in initializing angles list. Terminating!\n" );
MPI_Abort( comm, CANNOT_INITIALIZE );
}
realloc->num_3body = -1;
}
}
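
As the comment above PreAllocate_Space() explains, the number of atoms that will land in a given process's box cannot be known in advance, so the local and total capacities are padded by safezone and floored at mincap. A minimal sketch of that arithmetic with illustrative numbers (the real SAFE_ZONE and MIN_CAP constants live in reaxc_defs.h and may differ):

  #include <algorithm>

  // capacity estimate used for my_atoms and the workspace arrays:
  // pad the current count by the safety factor, never drop below mincap
  int estimate_capacity(int natoms, double safezone, int mincap)
  {
    return std::max(static_cast<int>(natoms * safezone), mincap);
  }

  // e.g. estimate_capacity(1000, 1.2, 50) == 1200   (padding dominates)
  //      estimate_capacity(10,   1.2, 50) == 50     (floor dominates)
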
diff --git a/src/USER-REAXC/reaxc_bond_orders.cpp b/src/USER-REAXC/reaxc_bond_orders.cpp
index 0b4ca21ad..04cedf18a 100644
--- a/src/USER-REAXC/reaxc_bond_orders.cpp
+++ b/src/USER-REAXC/reaxc_bond_orders.cpp
@@ -1,592 +1,592 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void Add_dBond_to_Forces_NPT( int i, int pj, simulation_data *data,
storage *workspace, reax_list **lists )
{
reax_list *bonds = (*lists) + BONDS;
bond_data *nbr_j, *nbr_k;
bond_order_data *bo_ij, *bo_ji;
dbond_coefficients coef;
rvec temp, ext_press;
ivec rel_box;
int pk, k, j;
/* Initializations */
nbr_j = &(bonds->select.bond_list[pj]);
j = nbr_j->nbr;
bo_ij = &(nbr_j->bo_data);
bo_ji = &(bonds->select.bond_list[ nbr_j->sym_index ].bo_data);
coef.C1dbo = bo_ij->C1dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C2dbo = bo_ij->C2dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C3dbo = bo_ij->C3dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C1dbopi = bo_ij->C1dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C2dbopi = bo_ij->C2dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C3dbopi = bo_ij->C3dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C4dbopi = bo_ij->C4dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C1dbopi2 = bo_ij->C1dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C2dbopi2 = bo_ij->C2dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C3dbopi2 = bo_ij->C3dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C4dbopi2 = bo_ij->C4dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C1dDelta = bo_ij->C1dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C2dDelta = bo_ij->C2dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C3dDelta = bo_ij->C3dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
for( pk = Start_Index(i, bonds); pk < End_Index(i, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale(temp, -coef.C2dbo, nbr_k->bo_data.dBOp); /*2nd, dBO*/
rvec_ScaledAdd(temp, -coef.C2dDelta, nbr_k->bo_data.dBOp);/*dDelta*/
rvec_ScaledAdd(temp, -coef.C3dbopi, nbr_k->bo_data.dBOp); /*3rd, dBOpi*/
rvec_ScaledAdd(temp, -coef.C3dbopi2, nbr_k->bo_data.dBOp);/*3rd, dBOpi2*/
/* force */
rvec_Add( workspace->f[k], temp );
/* pressure */
rvec_iMultiply( ext_press, nbr_k->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
/* then atom i itself */
rvec_Scale( temp, coef.C1dbo, bo_ij->dBOp ); /*1st,dBO*/
rvec_ScaledAdd( temp, coef.C2dbo, workspace->dDeltap_self[i] ); /*2nd,dBO*/
rvec_ScaledAdd( temp, coef.C1dDelta, bo_ij->dBOp ); /*1st,dBO*/
rvec_ScaledAdd( temp, coef.C2dDelta, workspace->dDeltap_self[i] );/*2nd,dBO*/
rvec_ScaledAdd( temp, coef.C1dbopi, bo_ij->dln_BOp_pi ); /*1st,dBOpi*/
rvec_ScaledAdd( temp, coef.C2dbopi, bo_ij->dBOp ); /*2nd,dBOpi*/
rvec_ScaledAdd( temp, coef.C3dbopi, workspace->dDeltap_self[i]);/*3rd,dBOpi*/
rvec_ScaledAdd( temp, coef.C1dbopi2, bo_ij->dln_BOp_pi2 ); /*1st,dBO_pi2*/
rvec_ScaledAdd( temp, coef.C2dbopi2, bo_ij->dBOp ); /*2nd,dBO_pi2*/
rvec_ScaledAdd( temp, coef.C3dbopi2, workspace->dDeltap_self[i] );/*3rd*/
/* force */
rvec_Add( workspace->f[i], temp );
for( pk = Start_Index(j, bonds); pk < End_Index(j, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C3dbo, nbr_k->bo_data.dBOp ); /*3rd,dBO*/
rvec_ScaledAdd( temp, -coef.C3dDelta, nbr_k->bo_data.dBOp);/*dDelta*/
rvec_ScaledAdd( temp, -coef.C4dbopi, nbr_k->bo_data.dBOp); /*4th,dBOpi*/
rvec_ScaledAdd( temp, -coef.C4dbopi2, nbr_k->bo_data.dBOp);/*4th,dBOpi2*/
/* force */
rvec_Add( workspace->f[k], temp );
/* pressure */
if( k != i ) {
ivec_Sum( rel_box, nbr_k->rel_box, nbr_j->rel_box ); //rel_box(k, i)
rvec_iMultiply( ext_press, rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
/* then atom j itself */
rvec_Scale( temp, -coef.C1dbo, bo_ij->dBOp ); /*1st, dBO*/
rvec_ScaledAdd( temp, coef.C3dbo, workspace->dDeltap_self[j] ); /*2nd, dBO*/
rvec_ScaledAdd( temp, -coef.C1dDelta, bo_ij->dBOp ); /*1st, dBO*/
rvec_ScaledAdd( temp, coef.C3dDelta, workspace->dDeltap_self[j]);/*2nd, dBO*/
rvec_ScaledAdd( temp, -coef.C1dbopi, bo_ij->dln_BOp_pi ); /*1st,dBOpi*/
rvec_ScaledAdd( temp, -coef.C2dbopi, bo_ij->dBOp ); /*2nd,dBOpi*/
rvec_ScaledAdd( temp, coef.C4dbopi, workspace->dDeltap_self[j]);/*3rd,dBOpi*/
rvec_ScaledAdd( temp, -coef.C1dbopi2, bo_ij->dln_BOp_pi2 ); /*1st,dBOpi2*/
rvec_ScaledAdd( temp, -coef.C2dbopi2, bo_ij->dBOp ); /*2nd,dBOpi2*/
rvec_ScaledAdd( temp,coef.C4dbopi2,workspace->dDeltap_self[j]);/*3rd,dBOpi2*/
/* force */
rvec_Add( workspace->f[j], temp );
/* pressure */
rvec_iMultiply( ext_press, nbr_j->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
void Add_dBond_to_Forces( reax_system *system, int i, int pj,
storage *workspace, reax_list **lists )
{
reax_list *bonds = (*lists) + BONDS;
bond_data *nbr_j, *nbr_k;
bond_order_data *bo_ij, *bo_ji;
dbond_coefficients coef;
int pk, k, j;
/* Virial Tallying variables */
rvec fi_tmp, fj_tmp, fk_tmp, delij, delji, delki, delkj, temp;
/* Initializations */
nbr_j = &(bonds->select.bond_list[pj]);
j = nbr_j->nbr;
bo_ij = &(nbr_j->bo_data);
bo_ji = &(bonds->select.bond_list[ nbr_j->sym_index ].bo_data);
coef.C1dbo = bo_ij->C1dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C2dbo = bo_ij->C2dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C3dbo = bo_ij->C3dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C1dbopi = bo_ij->C1dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C2dbopi = bo_ij->C2dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C3dbopi = bo_ij->C3dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C4dbopi = bo_ij->C4dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C1dbopi2 = bo_ij->C1dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C2dbopi2 = bo_ij->C2dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C3dbopi2 = bo_ij->C3dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C4dbopi2 = bo_ij->C4dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C1dDelta = bo_ij->C1dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C2dDelta = bo_ij->C2dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C3dDelta = bo_ij->C3dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
// forces on i
rvec_Scale( temp, coef.C1dbo, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C2dbo, workspace->dDeltap_self[i] );
rvec_ScaledAdd( temp, coef.C1dDelta, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C2dDelta, workspace->dDeltap_self[i] );
rvec_ScaledAdd( temp, coef.C1dbopi, bo_ij->dln_BOp_pi );
rvec_ScaledAdd( temp, coef.C2dbopi, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbopi, workspace->dDeltap_self[i]);
rvec_ScaledAdd( temp, coef.C1dbopi2, bo_ij->dln_BOp_pi2 );
rvec_ScaledAdd( temp, coef.C2dbopi2, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbopi2, workspace->dDeltap_self[i] );
rvec_Add( workspace->f[i], temp );
if( system->pair_ptr->vflag_atom) {
rvec_Scale(fi_tmp, -1.0, temp);
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,-1., system->my_atoms[j].x );
system->pair_ptr->v_tally(i,fi_tmp,delij);
}
// forces on j
rvec_Scale( temp, -coef.C1dbo, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbo, workspace->dDeltap_self[j] );
rvec_ScaledAdd( temp, -coef.C1dDelta, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dDelta, workspace->dDeltap_self[j]);
rvec_ScaledAdd( temp, -coef.C1dbopi, bo_ij->dln_BOp_pi );
rvec_ScaledAdd( temp, -coef.C2dbopi, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C4dbopi, workspace->dDeltap_self[j]);
rvec_ScaledAdd( temp, -coef.C1dbopi2, bo_ij->dln_BOp_pi2 );
rvec_ScaledAdd( temp, -coef.C2dbopi2, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C4dbopi2, workspace->dDeltap_self[j]);
rvec_Add( workspace->f[j], temp );
if( system->pair_ptr->vflag_atom) {
rvec_Scale(fj_tmp, -1.0, temp);
rvec_ScaledSum( delji, 1., system->my_atoms[j].x,-1., system->my_atoms[i].x );
system->pair_ptr->v_tally(j,fj_tmp,delji);
}
// forces on k: i neighbor
for( pk = Start_Index(i, bonds); pk < End_Index(i, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C2dbo, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C2dDelta, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C3dbopi, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C3dbopi2, nbr_k->bo_data.dBOp);
rvec_Add( workspace->f[k], temp );
if( system->pair_ptr->vflag_atom ) {
rvec_Scale(fk_tmp, -1.0, temp);
rvec_ScaledSum(delki,1.,system->my_atoms[k].x,-1.,system->my_atoms[i].x);
system->pair_ptr->v_tally(k,fk_tmp,delki);
rvec_ScaledSum(delkj,1.,system->my_atoms[k].x,-1.,system->my_atoms[j].x);
system->pair_ptr->v_tally(k,fk_tmp,delkj);
}
}
// forces on k: j neighbor
for( pk = Start_Index(j, bonds); pk < End_Index(j, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C3dbo, nbr_k->bo_data.dBOp );
rvec_ScaledAdd( temp, -coef.C3dDelta, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C4dbopi, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C4dbopi2, nbr_k->bo_data.dBOp);
rvec_Add( workspace->f[k], temp );
if( system->pair_ptr->vflag_atom ) {
rvec_Scale(fk_tmp, -1.0, temp);
rvec_ScaledSum(delki,1.,system->my_atoms[k].x,-1.,system->my_atoms[i].x);
system->pair_ptr->v_tally(k,fk_tmp,delki);
rvec_ScaledSum(delkj,1.,system->my_atoms[k].x,-1.,system->my_atoms[j].x);
system->pair_ptr->v_tally(k,fk_tmp,delkj);
}
}
}
int BOp( storage *workspace, reax_list *bonds, double bo_cut,
int i, int btop_i, far_neighbor_data *nbr_pj,
single_body_parameters *sbp_i, single_body_parameters *sbp_j,
two_body_parameters *twbp ) {
int j, btop_j;
double r2, C12, C34, C56;
double Cln_BOp_s, Cln_BOp_pi, Cln_BOp_pi2;
double BO, BO_s, BO_pi, BO_pi2;
bond_data *ibond, *jbond;
bond_order_data *bo_ij, *bo_ji;
j = nbr_pj->nbr;
r2 = SQR(nbr_pj->d);
if( sbp_i->r_s > 0.0 && sbp_j->r_s > 0.0 ) {
C12 = twbp->p_bo1 * pow( nbr_pj->d / twbp->r_s, twbp->p_bo2 );
BO_s = (1.0 + bo_cut) * exp( C12 );
}
else BO_s = C12 = 0.0;
if( sbp_i->r_pi > 0.0 && sbp_j->r_pi > 0.0 ) {
C34 = twbp->p_bo3 * pow( nbr_pj->d / twbp->r_p, twbp->p_bo4 );
BO_pi = exp( C34 );
}
else BO_pi = C34 = 0.0;
if( sbp_i->r_pi_pi > 0.0 && sbp_j->r_pi_pi > 0.0 ) {
C56 = twbp->p_bo5 * pow( nbr_pj->d / twbp->r_pp, twbp->p_bo6 );
BO_pi2= exp( C56 );
}
else BO_pi2 = C56 = 0.0;
/* Initially BO values are the uncorrected ones, page 1 */
BO = BO_s + BO_pi + BO_pi2;
if( BO >= bo_cut ) {
/****** bonds i-j and j-i ******/
ibond = &( bonds->select.bond_list[btop_i] );
btop_j = End_Index( j, bonds );
jbond = &(bonds->select.bond_list[btop_j]);
ibond->nbr = j;
jbond->nbr = i;
ibond->d = nbr_pj->d;
jbond->d = nbr_pj->d;
rvec_Copy( ibond->dvec, nbr_pj->dvec );
rvec_Scale( jbond->dvec, -1, nbr_pj->dvec );
ivec_Copy( ibond->rel_box, nbr_pj->rel_box );
ivec_Scale( jbond->rel_box, -1, nbr_pj->rel_box );
ibond->dbond_index = btop_i;
jbond->dbond_index = btop_i;
ibond->sym_index = btop_j;
jbond->sym_index = btop_i;
Set_End_Index( j, btop_j+1, bonds );
bo_ij = &( ibond->bo_data );
bo_ji = &( jbond->bo_data );
bo_ji->BO = bo_ij->BO = BO;
bo_ji->BO_s = bo_ij->BO_s = BO_s;
bo_ji->BO_pi = bo_ij->BO_pi = BO_pi;
bo_ji->BO_pi2 = bo_ij->BO_pi2 = BO_pi2;
/* Bond Order page2-3, derivative of total bond order prime */
Cln_BOp_s = twbp->p_bo2 * C12 / r2;
Cln_BOp_pi = twbp->p_bo4 * C34 / r2;
Cln_BOp_pi2 = twbp->p_bo6 * C56 / r2;
/* Only dln_BOp_xx wrt. dr_i is stored here, note that
dln_BOp_xx/dr_i = -dln_BOp_xx/dr_j and all others are 0 */
rvec_Scale(bo_ij->dln_BOp_s,-bo_ij->BO_s*Cln_BOp_s,ibond->dvec);
rvec_Scale(bo_ij->dln_BOp_pi,-bo_ij->BO_pi*Cln_BOp_pi,ibond->dvec);
rvec_Scale(bo_ij->dln_BOp_pi2,
-bo_ij->BO_pi2*Cln_BOp_pi2,ibond->dvec);
rvec_Scale(bo_ji->dln_BOp_s, -1., bo_ij->dln_BOp_s);
rvec_Scale(bo_ji->dln_BOp_pi, -1., bo_ij->dln_BOp_pi );
rvec_Scale(bo_ji->dln_BOp_pi2, -1., bo_ij->dln_BOp_pi2 );
rvec_Scale( bo_ij->dBOp,
-(bo_ij->BO_s * Cln_BOp_s +
bo_ij->BO_pi * Cln_BOp_pi +
bo_ij->BO_pi2 * Cln_BOp_pi2), ibond->dvec );
rvec_Scale( bo_ji->dBOp, -1., bo_ij->dBOp );
rvec_Add( workspace->dDeltap_self[i], bo_ij->dBOp );
rvec_Add( workspace->dDeltap_self[j], bo_ji->dBOp );
bo_ij->BO_s -= bo_cut;
bo_ij->BO -= bo_cut;
bo_ji->BO_s -= bo_cut;
bo_ji->BO -= bo_cut;
workspace->total_bond_order[i] += bo_ij->BO; //currently total_BOp
workspace->total_bond_order[j] += bo_ji->BO; //currently total_BOp
bo_ij->Cdbo = bo_ij->Cdbopi = bo_ij->Cdbopi2 = 0.0;
bo_ji->Cdbo = bo_ji->Cdbopi = bo_ji->Cdbopi2 = 0.0;
return 1;
}
return 0;
}
void BO( reax_system *system, control_params *control, simulation_data *data,
storage *workspace, reax_list **lists, output_controls *out_control )
{
int i, j, pj, type_i, type_j;
int start_i, end_i, sym_index;
double val_i, Deltap_i, Deltap_boc_i;
double val_j, Deltap_j, Deltap_boc_j;
double f1, f2, f3, f4, f5, f4f5, exp_f4, exp_f5;
double exp_p1i, exp_p2i, exp_p1j, exp_p2j;
double temp, u1_ij, u1_ji, Cf1A_ij, Cf1B_ij, Cf1_ij, Cf1_ji;
double Cf45_ij, Cf45_ji, p_lp1; //u_ij, u_ji
double A0_ij, A1_ij, A2_ij, A2_ji, A3_ij, A3_ji;
double explp1, p_boc1, p_boc2;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
bond_order_data *bo_ij, *bo_ji;
reax_list *bonds = (*lists) + BONDS;
p_boc1 = system->reax_param.gp.l[0];
p_boc2 = system->reax_param.gp.l[1];
/* Calculate Deltaprime, Deltaprime_boc values */
for( i = 0; i < system->N; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[type_i]);
workspace->Deltap[i] = workspace->total_bond_order[i] - sbp_i->valency;
workspace->Deltap_boc[i] =
workspace->total_bond_order[i] - sbp_i->valency_val;
workspace->total_bond_order[i] = 0;
}
/* Corrected Bond Order calculations */
for( i = 0; i < system->N; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[type_i]);
val_i = sbp_i->valency;
Deltap_i = workspace->Deltap[i];
Deltap_boc_i = workspace->Deltap_boc[i];
start_i = Start_Index(i, bonds);
end_i = End_Index(i, bonds);
for( pj = start_i; pj < end_i; ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
bo_ij = &( bonds->select.bond_list[pj].bo_data );
// fprintf( stderr, "\tj:%d - ubo: %8.3f\n", j+1, bo_ij->BO );
if( i < j || workspace->bond_mark[j] > 3 ) {
twbp = &( system->reax_param.tbp[type_i][type_j] );
if( twbp->ovc < 0.001 && twbp->v13cor < 0.001 ) {
bo_ij->C1dbo = 1.000000;
bo_ij->C2dbo = 0.000000;
bo_ij->C3dbo = 0.000000;
bo_ij->C1dbopi = bo_ij->BO_pi;
bo_ij->C2dbopi = 0.000000;
bo_ij->C3dbopi = 0.000000;
bo_ij->C4dbopi = 0.000000;
bo_ij->C1dbopi2 = bo_ij->BO_pi2;
bo_ij->C2dbopi2 = 0.000000;
bo_ij->C3dbopi2 = 0.000000;
bo_ij->C4dbopi2 = 0.000000;
}
else {
val_j = system->reax_param.sbp[type_j].valency;
Deltap_j = workspace->Deltap[j];
Deltap_boc_j = workspace->Deltap_boc[j];
/* on page 1 */
if( twbp->ovc >= 0.001 ) {
/* Correction for overcoordination */
exp_p1i = exp( -p_boc1 * Deltap_i );
exp_p2i = exp( -p_boc2 * Deltap_i );
exp_p1j = exp( -p_boc1 * Deltap_j );
exp_p2j = exp( -p_boc2 * Deltap_j );
f2 = exp_p1i + exp_p1j;
f3 = -1.0 / p_boc2 * log( 0.5 * ( exp_p2i + exp_p2j ) );
f1 = 0.5 * ( ( val_i + f2 )/( val_i + f2 + f3 ) +
( val_j + f2 )/( val_j + f2 + f3 ) );
temp = f2 + f3;
u1_ij = val_i + temp;
u1_ji = val_j + temp;
Cf1A_ij = 0.5 * f3 * (1.0 / SQR( u1_ij ) +
1.0 / SQR( u1_ji ));
Cf1B_ij = -0.5 * (( u1_ij - f3 ) / SQR( u1_ij ) +
( u1_ji - f3 ) / SQR( u1_ji ));
Cf1_ij = 0.50 * ( -p_boc1 * exp_p1i / u1_ij -
((val_i+f2) / SQR(u1_ij)) *
( -p_boc1 * exp_p1i +
exp_p2i / ( exp_p2i + exp_p2j ) ) +
-p_boc1 * exp_p1i / u1_ji -
((val_j+f2) / SQR(u1_ji)) *
( -p_boc1 * exp_p1i +
exp_p2i / ( exp_p2i + exp_p2j ) ));
Cf1_ji = -Cf1A_ij * p_boc1 * exp_p1j +
Cf1B_ij * exp_p2j / ( exp_p2i + exp_p2j );
}
else {
/* No overcoordination correction! */
f1 = 1.0;
Cf1_ij = Cf1_ji = 0.0;
}
if( twbp->v13cor >= 0.001 ) {
/* Correction for 1-3 bond orders */
exp_f4 =exp(-(twbp->p_boc4 * SQR( bo_ij->BO ) -
Deltap_boc_i) * twbp->p_boc3 + twbp->p_boc5);
exp_f5 =exp(-(twbp->p_boc4 * SQR( bo_ij->BO ) -
Deltap_boc_j) * twbp->p_boc3 + twbp->p_boc5);
f4 = 1. / (1. + exp_f4);
f5 = 1. / (1. + exp_f5);
f4f5 = f4 * f5;
/* Bond Order pages 8-9, derivative of f4 and f5 */
Cf45_ij = -f4 * exp_f4;
Cf45_ji = -f5 * exp_f5;
}
else {
f4 = f5 = f4f5 = 1.0;
Cf45_ij = Cf45_ji = 0.0;
}
/* Bond Order page 10, derivative of total bond order */
A0_ij = f1 * f4f5;
A1_ij = -2 * twbp->p_boc3 * twbp->p_boc4 * bo_ij->BO *
(Cf45_ij + Cf45_ji);
A2_ij = Cf1_ij / f1 + twbp->p_boc3 * Cf45_ij;
A2_ji = Cf1_ji / f1 + twbp->p_boc3 * Cf45_ji;
A3_ij = A2_ij + Cf1_ij / f1;
A3_ji = A2_ji + Cf1_ji / f1;
/* find corrected bond orders and their derivative coef */
bo_ij->BO = bo_ij->BO * A0_ij;
bo_ij->BO_pi = bo_ij->BO_pi * A0_ij *f1;
bo_ij->BO_pi2= bo_ij->BO_pi2* A0_ij *f1;
bo_ij->BO_s = bo_ij->BO - ( bo_ij->BO_pi + bo_ij->BO_pi2 );
bo_ij->C1dbo = A0_ij + bo_ij->BO * A1_ij;
bo_ij->C2dbo = bo_ij->BO * A2_ij;
bo_ij->C3dbo = bo_ij->BO * A2_ji;
bo_ij->C1dbopi = f1*f1*f4*f5;
bo_ij->C2dbopi = bo_ij->BO_pi * A1_ij;
bo_ij->C3dbopi = bo_ij->BO_pi * A3_ij;
bo_ij->C4dbopi = bo_ij->BO_pi * A3_ji;
bo_ij->C1dbopi2 = f1*f1*f4*f5;
bo_ij->C2dbopi2 = bo_ij->BO_pi2 * A1_ij;
bo_ij->C3dbopi2 = bo_ij->BO_pi2 * A3_ij;
bo_ij->C4dbopi2 = bo_ij->BO_pi2 * A3_ji;
}
/* neglect bonds that are < 1e-10 */
if( bo_ij->BO < 1e-10 )
bo_ij->BO = 0.0;
if( bo_ij->BO_s < 1e-10 )
bo_ij->BO_s = 0.0;
if( bo_ij->BO_pi < 1e-10 )
bo_ij->BO_pi = 0.0;
if( bo_ij->BO_pi2 < 1e-10 )
bo_ij->BO_pi2 = 0.0;
workspace->total_bond_order[i] += bo_ij->BO; //now keeps total_BO
}
else {
/* We only need to update bond orders from bo_ji;
everything else is set in uncorrected_bo calculations */
sym_index = bonds->select.bond_list[pj].sym_index;
bo_ji = &(bonds->select.bond_list[ sym_index ].bo_data);
bo_ij->BO = bo_ji->BO;
bo_ij->BO_s = bo_ji->BO_s;
bo_ij->BO_pi = bo_ji->BO_pi;
bo_ij->BO_pi2 = bo_ji->BO_pi2;
workspace->total_bond_order[i] += bo_ij->BO;// now keeps total_BO
}
}
}
p_lp1 = system->reax_param.gp.l[15];
for( j = 0; j < system->N; ++j ){
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
sbp_j = &(system->reax_param.sbp[ type_j ]);
workspace->Delta[j] = workspace->total_bond_order[j] - sbp_j->valency;
workspace->Delta_e[j] = workspace->total_bond_order[j] - sbp_j->valency_e;
workspace->Delta_boc[j] = workspace->total_bond_order[j] -
sbp_j->valency_boc;
workspace->Delta_val[j] = workspace->total_bond_order[j] -
sbp_j->valency_val;
workspace->vlpex[j] = workspace->Delta_e[j] -
2.0 * (int)(workspace->Delta_e[j]/2.0);
explp1 = exp(-p_lp1 * SQR(2.0 + workspace->vlpex[j]));
workspace->nlp[j] = explp1 - (int)(workspace->Delta_e[j] / 2.0);
workspace->Delta_lp[j] = sbp_j->nlp_opt - workspace->nlp[j];
workspace->Clp[j] = 2.0 * p_lp1 * explp1 * (2.0 + workspace->vlpex[j]);
workspace->dDelta_lp[j] = workspace->Clp[j];
if( sbp_j->mass > 21.0 ) {
workspace->nlp_temp[j] = 0.5 * (sbp_j->valency_e - sbp_j->valency);
workspace->Delta_lp_temp[j] = sbp_j->nlp_opt - workspace->nlp_temp[j];
workspace->dDelta_lp_temp[j] = 0.;
}
else {
workspace->nlp_temp[j] = workspace->nlp[j];
workspace->Delta_lp_temp[j] = sbp_j->nlp_opt - workspace->nlp_temp[j];
workspace->dDelta_lp_temp[j] = workspace->Clp[j];
}
}
}
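
BOp() above evaluates the uncorrected sigma, pi, and double-pi bond-order contributions from the pair distance before the overcoordination and 1-3 corrections are applied in BO(). A condensed sketch of that first step, using the same parameter names (an illustration only; the full routine also fills the bond list and the derivative terms):

  #include <cmath>

  // Uncorrected bond-order terms for one pair at distance d.
  // p_bo1..p_bo6 and the radii come from the two-body parameters (twbp),
  // bo_cut is the bond-order cutoff; radii <= 0 switch a contribution off.
  void uncorrected_bo(double d, double bo_cut,
                      double p_bo1, double p_bo2, double r_s,
                      double p_bo3, double p_bo4, double r_p,
                      double p_bo5, double p_bo6, double r_pp,
                      double &BO_s, double &BO_pi, double &BO_pi2)
  {
    BO_s   = (r_s  > 0.0) ? (1.0 + bo_cut) * exp(p_bo1 * pow(d / r_s,  p_bo2)) : 0.0;
    BO_pi  = (r_p  > 0.0) ? exp(p_bo3 * pow(d / r_p,  p_bo4)) : 0.0;
    BO_pi2 = (r_pp > 0.0) ? exp(p_bo5 * pow(d / r_pp, p_bo6)) : 0.0;
    // a bond is stored only if BO_s + BO_pi + BO_pi2 >= bo_cut
  }
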
diff --git a/src/USER-REAXC/reaxc_bonds.cpp b/src/USER-REAXC/reaxc_bonds.cpp
index e0ef38ba0..a8a129816 100644
--- a/src/USER-REAXC/reaxc_bonds.cpp
+++ b/src/USER-REAXC/reaxc_bonds.cpp
@@ -1,137 +1,137 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_bonds.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Bonds( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
output_controls *out_control )
{
int i, j, pj, natoms;
int start_i, end_i;
int type_i, type_j;
double ebond, pow_BOs_be2, exp_be12, CEbo;
double gp3, gp4, gp7, gp10, gp37;
double exphu, exphua1, exphub1, exphuov, hulpov, estriph;
double decobdbo, decobdboua, decobdboub;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
bond_order_data *bo_ij;
reax_list *bonds;
bonds = (*lists) + BONDS;
gp3 = system->reax_param.gp.l[3];
gp4 = system->reax_param.gp.l[4];
gp7 = system->reax_param.gp.l[7];
gp10 = system->reax_param.gp.l[10];
gp37 = (int) system->reax_param.gp.l[37];
natoms = system->n;
for( i = 0; i < natoms; ++i ) {
start_i = Start_Index(i, bonds);
end_i = End_Index(i, bonds);
for( pj = start_i; pj < end_i; ++pj ) {
j = bonds->select.bond_list[pj].nbr;
if( system->my_atoms[i].orig_id > system->my_atoms[j].orig_id )
continue;
if( system->my_atoms[i].orig_id == system->my_atoms[j].orig_id ) {
if (system->my_atoms[j].x[2] < system->my_atoms[i].x[2]) continue;
if (system->my_atoms[j].x[2] == system->my_atoms[i].x[2] &&
system->my_atoms[j].x[1] < system->my_atoms[i].x[1]) continue;
if (system->my_atoms[j].x[2] == system->my_atoms[i].x[2] &&
system->my_atoms[j].x[1] == system->my_atoms[i].x[1] &&
system->my_atoms[j].x[0] < system->my_atoms[i].x[0]) continue;
}
/* set the pointers */
type_i = system->my_atoms[i].type;
type_j = system->my_atoms[j].type;
sbp_i = &( system->reax_param.sbp[type_i] );
sbp_j = &( system->reax_param.sbp[type_j] );
twbp = &( system->reax_param.tbp[type_i][type_j] );
bo_ij = &( bonds->select.bond_list[pj].bo_data );
/* calculate the constants */
pow_BOs_be2 = pow( bo_ij->BO_s, twbp->p_be2 );
exp_be12 = exp( twbp->p_be1 * ( 1.0 - pow_BOs_be2 ) );
CEbo = -twbp->De_s * exp_be12 *
( 1.0 - twbp->p_be1 * twbp->p_be2 * pow_BOs_be2 );
/* calculate the Bond Energy */
data->my_en.e_bond += ebond =
-twbp->De_s * bo_ij->BO_s * exp_be12
-twbp->De_p * bo_ij->BO_pi
-twbp->De_pp * bo_ij->BO_pi2;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,natoms,1,ebond,0.0,0.0,0.0,0.0,0.0);
/* calculate derivatives of Bond Orders */
bo_ij->Cdbo += CEbo;
bo_ij->Cdbopi -= (CEbo + twbp->De_p);
bo_ij->Cdbopi2 -= (CEbo + twbp->De_pp);
      /* stabilization of terminal triple bonds */
if( bo_ij->BO >= 1.00 ) {
if( gp37 == 2 ||
(sbp_i->mass == 12.0000 && sbp_j->mass == 15.9990) ||
(sbp_j->mass == 12.0000 && sbp_i->mass == 15.9990) ) {
exphu = exp( -gp7 * SQR(bo_ij->BO - 2.50) );
exphua1 = exp(-gp3 * (workspace->total_bond_order[i]-bo_ij->BO));
exphub1 = exp(-gp3 * (workspace->total_bond_order[j]-bo_ij->BO));
exphuov = exp(gp4 * (workspace->Delta[i] + workspace->Delta[j]));
hulpov = 1.0 / (1.0 + 25.0 * exphuov);
estriph = gp10 * exphu * hulpov * (exphua1 + exphub1);
data->my_en.e_bond += estriph;
decobdbo = gp10 * exphu * hulpov * (exphua1 + exphub1) *
( gp3 - 2.0 * gp7 * (bo_ij->BO-2.50) );
decobdboua = -gp10 * exphu * hulpov *
(gp3*exphua1 + 25.0*gp4*exphuov*hulpov*(exphua1+exphub1));
decobdboub = -gp10 * exphu * hulpov *
(gp3*exphub1 + 25.0*gp4*exphuov*hulpov*(exphua1+exphub1));
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,natoms,1,estriph,0.0,0.0,0.0,0.0,0.0);
bo_ij->Cdbo += decobdbo;
workspace->CdDelta[i] += decobdboua;
workspace->CdDelta[j] += decobdboub;
}
}
}
}
}
diff --git a/src/USER-REAXC/reaxc_control.cpp b/src/USER-REAXC/reaxc_control.cpp
index 3753360c6..4def41bc8 100644
--- a/src/USER-REAXC/reaxc_control.cpp
+++ b/src/USER-REAXC/reaxc_control.cpp
@@ -1,385 +1,385 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_control.h"
#include "reaxc_tool_box.h"
char Read_Control_File( char *control_file, control_params* control,
output_controls *out_control )
{
FILE *fp;
char *s, **tmp;
int i,ival;
double val;
/* open control file */
if ( (fp = fopen( control_file, "r" ) ) == NULL ) {
fprintf( stderr, "error opening the control file! terminating...\n" );
MPI_Abort( MPI_COMM_WORLD, FILE_NOT_FOUND );
}
/* assign default values */
strcpy( control->sim_name, "simulate" );
control->ensemble = NVE;
control->nsteps = 0;
control->dt = 0.25;
control->nprocs = 1;
control->procs_by_dim[0] = 1;
control->procs_by_dim[1] = 1;
control->procs_by_dim[2] = 1;
control->geo_format = 1;
control->restart = 0;
out_control->restart_format = WRITE_BINARY;
out_control->restart_freq = 0;
control->reposition_atoms = 0;
control->restrict_bonds = 0;
control->remove_CoM_vel = 25;
out_control->debug_level = 0;
out_control->energy_update_freq = 0;
control->reneighbor = 1;
control->vlist_cut = control->nonb_cut;
control->bond_cut = 5.0;
control->bg_cut = 0.3;
control->thb_cut = 0.001;
control->thb_cutsq = 0.00001;
control->hbond_cut = 7.5;
control->tabulate = 0;
control->qeq_freq = 1;
control->q_err = 1e-6;
control->refactor = 100;
  control->droptol = 1e-2;
control->T_init = 0.;
control->T_final = 300.;
control->Tau_T = 500.0;
control->T_mode = 0;
control->T_rate = 1.;
control->T_freq = 1.;
control->P[0] = control->P[1] = control->P[2] = 0.000101325;
control->Tau_P[0] = control->Tau_P[1] = control->Tau_P[2] = 500.0;
control->Tau_PT[0] = control->Tau_PT[1] = control->Tau_PT[2] = 500.0;
control->compressibility = 1.0;
control->press_mode = 0;
control->virial = 0;
out_control->write_steps = 0;
out_control->traj_compress = 0;
out_control->traj_method = REG_TRAJ;
strcpy( out_control->traj_title, "default_title" );
out_control->atom_info = 0;
out_control->bond_info = 0;
out_control->angle_info = 0;
control->molecular_analysis = 0;
control->dipole_anal = 0;
control->freq_dipole_anal = 0;
control->diffusion_coef = 0;
control->freq_diffusion_coef = 0;
control->restrict_type = 0;
/* memory allocations */
s = (char*) malloc(sizeof(char)*MAX_LINE);
tmp = (char**) malloc(sizeof(char*)*MAX_TOKENS);
for (i=0; i < MAX_TOKENS; i++)
tmp[i] = (char*) malloc(sizeof(char)*MAX_LINE);
/* read control parameters file */
while (!feof(fp)) {
fgets( s, MAX_LINE, fp );
Tokenize( s, &tmp );
if( strcmp(tmp[0], "simulation_name") == 0 ) {
strcpy( control->sim_name, tmp[1] );
}
else if( strcmp(tmp[0], "ensemble_type") == 0 ) {
ival = atoi(tmp[1]);
control->ensemble = ival;
if( ival == iNPT || ival ==sNPT || ival == NPT )
control->virial = 1;
}
else if( strcmp(tmp[0], "nsteps") == 0 ) {
ival = atoi(tmp[1]);
control->nsteps = ival;
}
else if( strcmp(tmp[0], "dt") == 0) {
val = atof(tmp[1]);
control->dt = val * 1.e-3; // convert dt from fs to ps!
}
else if( strcmp(tmp[0], "proc_by_dim") == 0 ) {
ival = atoi(tmp[1]);
control->procs_by_dim[0] = ival;
ival = atoi(tmp[2]);
control->procs_by_dim[1] = ival;
ival = atoi(tmp[3]);
control->procs_by_dim[2] = ival;
control->nprocs = control->procs_by_dim[0]*control->procs_by_dim[1]*
control->procs_by_dim[2];
}
else if( strcmp(tmp[0], "random_vel") == 0 ) {
ival = atoi(tmp[1]);
control->random_vel = ival;
}
else if( strcmp(tmp[0], "restart_format") == 0 ) {
ival = atoi(tmp[1]);
out_control->restart_format = ival;
}
else if( strcmp(tmp[0], "restart_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->restart_freq = ival;
}
else if( strcmp(tmp[0], "reposition_atoms") == 0 ) {
ival = atoi(tmp[1]);
control->reposition_atoms = ival;
}
else if( strcmp(tmp[0], "restrict_bonds") == 0 ) {
ival = atoi( tmp[1] );
control->restrict_bonds = ival;
}
else if( strcmp(tmp[0], "remove_CoM_vel") == 0 ) {
ival = atoi(tmp[1]);
control->remove_CoM_vel = ival;
}
else if( strcmp(tmp[0], "debug_level") == 0 ) {
ival = atoi(tmp[1]);
out_control->debug_level = ival;
}
else if( strcmp(tmp[0], "energy_update_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->energy_update_freq = ival;
}
else if( strcmp(tmp[0], "reneighbor") == 0 ) {
ival = atoi( tmp[1] );
control->reneighbor = ival;
}
else if( strcmp(tmp[0], "vlist_buffer") == 0 ) {
val = atof(tmp[1]);
control->vlist_cut= val + control->nonb_cut;
}
else if( strcmp(tmp[0], "nbrhood_cutoff") == 0 ) {
val = atof(tmp[1]);
control->bond_cut = val;
}
else if( strcmp(tmp[0], "bond_graph_cutoff") == 0 ) {
val = atof(tmp[1]);
control->bg_cut = val;
}
else if( strcmp(tmp[0], "thb_cutoff") == 0 ) {
val = atof(tmp[1]);
control->thb_cut = val;
}
else if( strcmp(tmp[0], "thb_cutoff_sq") == 0 ) {
val = atof(tmp[1]);
control->thb_cutsq = val;
}
else if( strcmp(tmp[0], "hbond_cutoff") == 0 ) {
val = atof( tmp[1] );
control->hbond_cut = val;
}
else if( strcmp(tmp[0], "ghost_cutoff") == 0 ) {
val = atof(tmp[1]);
control->user_ghost_cut = val;
}
else if( strcmp(tmp[0], "tabulate_long_range") == 0 ) {
ival = atoi( tmp[1] );
control->tabulate = ival;
}
else if( strcmp(tmp[0], "qeq_freq") == 0 ) {
ival = atoi( tmp[1] );
control->qeq_freq = ival;
}
else if( strcmp(tmp[0], "q_err") == 0 ) {
val = atof( tmp[1] );
control->q_err = val;
}
else if( strcmp(tmp[0], "ilu_refactor") == 0 ) {
ival = atoi( tmp[1] );
control->refactor = ival;
}
else if( strcmp(tmp[0], "ilu_droptol") == 0 ) {
val = atof( tmp[1] );
control->droptol = val;
}
else if( strcmp(tmp[0], "temp_init") == 0 ) {
val = atof(tmp[1]);
control->T_init = val;
if( control->T_init < 0.1 )
control->T_init = 0.1;
}
else if( strcmp(tmp[0], "temp_final") == 0 ) {
val = atof(tmp[1]);
control->T_final = val;
if( control->T_final < 0.1 )
control->T_final = 0.1;
}
else if( strcmp(tmp[0], "t_mass") == 0 ) {
val = atof(tmp[1]);
control->Tau_T = val * 1.e-3; // convert t_mass from fs to ps
}
else if( strcmp(tmp[0], "t_mode") == 0 ) {
ival = atoi(tmp[1]);
control->T_mode = ival;
}
else if( strcmp(tmp[0], "t_rate") == 0 ) {
val = atof(tmp[1]);
control->T_rate = val;
}
else if( strcmp(tmp[0], "t_freq") == 0 ) {
val = atof(tmp[1]);
control->T_freq = val;
}
else if( strcmp(tmp[0], "pressure") == 0 ) {
if( control->ensemble == iNPT ) {
control->P[0] = control->P[1] = control->P[2] = atof(tmp[1]);
}
else if( control->ensemble == sNPT ) {
control->P[0] = atof(tmp[1]);
control->P[1] = atof(tmp[2]);
control->P[2] = atof(tmp[3]);
}
}
else if( strcmp(tmp[0], "p_mass") == 0 ) {
// convert p_mass from fs to ps
if( control->ensemble == iNPT ) {
control->Tau_P[0] = control->Tau_P[1] = control->Tau_P[2] =
atof(tmp[1]) * 1.e-3;
}
else if( control->ensemble == sNPT ) {
control->Tau_P[0] = atof(tmp[1]) * 1.e-3;
control->Tau_P[1] = atof(tmp[2]) * 1.e-3;
control->Tau_P[2] = atof(tmp[3]) * 1.e-3;
}
}
else if( strcmp(tmp[0], "pt_mass") == 0 ) {
val = atof(tmp[1]);
control->Tau_PT[0] = control->Tau_PT[1] = control->Tau_PT[2] =
val * 1.e-3; // convert pt_mass from fs to ps
}
else if( strcmp(tmp[0], "compress") == 0 ) {
val = atof(tmp[1]);
control->compressibility = val;
}
else if( strcmp(tmp[0], "press_mode") == 0 ) {
ival = atoi(tmp[1]);
control->press_mode = ival;
}
else if( strcmp(tmp[0], "geo_format") == 0 ) {
ival = atoi( tmp[1] );
control->geo_format = ival;
}
else if( strcmp(tmp[0], "write_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->write_steps = ival;
}
else if( strcmp(tmp[0], "traj_compress") == 0 ) {
ival = atoi(tmp[1]);
out_control->traj_compress = ival;
}
else if( strcmp(tmp[0], "traj_method") == 0 ) {
ival = atoi(tmp[1]);
out_control->traj_method = ival;
}
else if( strcmp(tmp[0], "traj_title") == 0 ) {
strcpy( out_control->traj_title, tmp[1] );
}
else if( strcmp(tmp[0], "atom_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 4;
}
else if( strcmp(tmp[0], "atom_velocities") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 2;
}
else if( strcmp(tmp[0], "atom_forces") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 1;
}
else if( strcmp(tmp[0], "bond_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->bond_info = ival;
}
else if( strcmp(tmp[0], "angle_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->angle_info = ival;
}
else if( strcmp(tmp[0], "molecular_analysis") == 0 ) {
ival = atoi(tmp[1]);
control->molecular_analysis = ival;
}
else if( strcmp(tmp[0], "ignore") == 0 ) {
control->num_ignored = atoi(tmp[1]);
for( i = 0; i < control->num_ignored; ++i )
control->ignore[atoi(tmp[i+2])] = 1;
}
else if( strcmp(tmp[0], "dipole_anal") == 0 ) {
ival = atoi(tmp[1]);
control->dipole_anal = ival;
}
else if( strcmp(tmp[0], "freq_dipole_anal") == 0 ) {
ival = atoi(tmp[1]);
control->freq_dipole_anal = ival;
}
else if( strcmp(tmp[0], "diffusion_coef") == 0 ) {
ival = atoi(tmp[1]);
control->diffusion_coef = ival;
}
else if( strcmp(tmp[0], "freq_diffusion_coef") == 0 ) {
ival = atoi(tmp[1]);
control->freq_diffusion_coef = ival;
}
else if( strcmp(tmp[0], "restrict_type") == 0 ) {
ival = atoi(tmp[1]);
control->restrict_type = ival;
}
else {
fprintf( stderr, "WARNING: unknown parameter %s\n", tmp[0] );
MPI_Abort( MPI_COMM_WORLD, 15 );
}
}
/* determine target T */
if( control->T_mode == 0 )
control->T = control->T_final;
else control->T = control->T_init;
/* free memory allocations at the top */
for( i = 0; i < MAX_TOKENS; i++ )
free( tmp[i] );
free( tmp );
free( s );
fclose(fp);
return SUCCESS;
}
diff --git a/src/USER-REAXC/reaxc_defs.h b/src/USER-REAXC/reaxc_defs.h
index d0a75d431..101b554fb 100644
--- a/src/USER-REAXC/reaxc_defs.h
+++ b/src/USER-REAXC/reaxc_defs.h
@@ -1,159 +1,159 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
#ifndef REAX_DEFS_H
#define REAX_DEFS_H
#if defined(__IBMC__)
#define inline __inline__
#endif /*IBMC*/
#ifndef SUCCESS
#define SUCCESS 1
#endif
#ifndef FAILURE
#define FAILURE 0
#endif
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
#define SQR(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))
#define DEG2RAD(a) ((a)*constPI/180.0)
#define RAD2DEG(a) ((a)*180.0/constPI)
// #define MAX(x,y) (((x) > (y)) ? (x) : (y))
// #define MIN(x,y) (((x) < (y)) ? (x) : (y))
#define MAX3(x,y,z) MAX( MAX(x,y), z)
#define constPI 3.14159265
#define C_ele 332.06371
//#define K_B 503.398008 // kcal/mol/K
#define K_B 0.831687 // amu A^2 / ps^2 / K
#define F_CONV 1e6 / 48.88821291 / 48.88821291 // --> amu A / ps^2
#define E_CONV 0.002391 // amu A^2 / ps^2 --> kcal/mol
#define EV_to_KCALpMOL 14.400000 // ElectronVolt --> KCAL per MOLe
#define KCALpMOL_to_EV 23.02 // 23.060549 //KCAL per MOLe --> ElectronVolt
#define ECxA_to_DEBYE 4.803204 // elem. charge * Ang -> debye
#define CAL_to_JOULES 4.184000 // CALories --> JOULES
#define JOULES_to_CAL 1/4.184000 // JOULES --> CALories
#define AMU_to_GRAM 1.6605e-24
#define ANG_to_CM 1e-8
#define AVOGNR 6.0221367e23
#define P_CONV 1e-24 * AVOGNR * JOULES_to_CAL
#define MAX_STR 1024
#define MAX_LINE 1024
#define MAX_TOKENS 1024
#define MAX_TOKEN_LEN 1024
#define MAX_ATOM_ID 100000
#define MAX_RESTRICT 15
#define MAX_MOLECULE_SIZE 20
#define MAX_ATOM_TYPES 25
#define NUM_INTRS 10
#define ALMOST_ZERO 1e-10
#define NEG_INF -1e10
#define NO_BOND 1e-3 // 0.001
#define HB_THRESHOLD 1e-2 // 0.01
#define MIN_CAP 50
#define MIN_NBRS 100
#define MIN_HENTRIES 100
#define MAX_BONDS 30
#define MIN_BONDS 25
#define MIN_HBONDS 25
#define MIN_3BODIES 1000
#define MIN_GCELL_POPL 50
#define MIN_SEND 100
#define SAFE_ZONE 1.2
#define SAFER_ZONE 1.4
#define DANGER_ZONE 0.90
#define LOOSE_ZONE 0.75
#define MAX_3BODY_PARAM 5
#define MAX_4BODY_PARAM 5
#define MAX_dV 1.01
#define MIN_dV 0.99
#define MAX_dT 4.00
#define MIN_dT 0.00
#define MASTER_NODE 0
#define MAX_NBRS 6 //27
#define MYSELF 13 // encoding of relative coordinate (0,0,0)
#define MAX_ITR 10
#define RESTART 30
#define MAX_BOND 20
-#define MAXREAXBOND 24 /* used in fix_reaxc_bonds.cpp and pair_reax_c.cpp */
-#define MAXSPECBOND 24 /* used in fix_reaxc_species.cpp and pair_reax_c.cpp */
+#define MAXREAXBOND 24 /* used in fix_reaxc_bonds.cpp and pair_reaxc.cpp */
+#define MAXSPECBOND 24 /* used in fix_reaxc_species.cpp and pair_reaxc.cpp */
/******************* ENUMERATIONS *************************/
enum geo_formats { CUSTOM, PDB, ASCII_RESTART, BINARY_RESTART, GF_N };
enum restart_formats { WRITE_ASCII, WRITE_BINARY, RF_N };
enum ensembles { NVE, bNVT, nhNVT, sNPT, iNPT, NPT, ens_N };
enum lists { BONDS, OLD_BONDS, THREE_BODIES,
HBONDS, FAR_NBRS, DBOS, DDELTAS, LIST_N };
enum interactions { TYP_VOID, TYP_BOND, TYP_THREE_BODY,
TYP_HBOND, TYP_FAR_NEIGHBOR, TYP_DBO, TYP_DDELTA, TYP_N };
enum message_tags { INIT, UPDATE, BNDRY, UPDATE_BNDRY,
EXC_VEC1, EXC_VEC2, DIST_RVEC2, COLL_RVEC2,
DIST_RVECS, COLL_RVECS, INIT_DESCS, ATOM_LINES,
BOND_LINES, ANGLE_LINES, RESTART_ATOMS, TAGS_N };
enum errors { FILE_NOT_FOUND = -10, UNKNOWN_ATOM_TYPE = -11,
CANNOT_OPEN_FILE = -12, CANNOT_INITIALIZE = -13,
INSUFFICIENT_MEMORY = -14, UNKNOWN_OPTION = -15,
INVALID_INPUT = -16, INVALID_GEO = -17 };
enum exchanges { NONE, NEAR_EXCH, FULL_EXCH };
enum gcell_types { NO_NBRS=0, NEAR_ONLY=1, HBOND_ONLY=2, FAR_ONLY=4,
NEAR_HBOND=3, NEAR_FAR=5, HBOND_FAR=6, FULL_NBRS=7,
NATIVE=8 };
enum atoms { C_ATOM = 0, H_ATOM = 1, O_ATOM = 2, N_ATOM = 3,
S_ATOM = 4, SI_ATOM = 5, GE_ATOM = 6, X_ATOM = 7 };
enum traj_methods { REG_TRAJ, MPI_TRAJ, TF_N };
enum molecules { UNKNOWN, WATER };
#endif
diff --git a/src/USER-REAXC/reaxc_ffield.cpp b/src/USER-REAXC/reaxc_ffield.cpp
index fda284140..58a347ebf 100644
--- a/src/USER-REAXC/reaxc_ffield.cpp
+++ b/src/USER-REAXC/reaxc_ffield.cpp
@@ -1,699 +1,699 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "error.h"
#include "reaxc_ffield.h"
#include "reaxc_tool_box.h"
char Read_Force_Field( FILE *fp, reax_interaction *reax,
control_params *control )
{
char *s;
char **tmp;
char ****tor_flag;
int c, i, j, k, l, m, n, o, p, cnt;
int lgflag = control->lgflag;
int errorflag = 1;
double val;
MPI_Comm comm;
comm = MPI_COMM_WORLD;
s = (char*) malloc(sizeof(char)*MAX_LINE);
tmp = (char**) malloc(sizeof(char*)*MAX_TOKENS);
for (i=0; i < MAX_TOKENS; i++)
tmp[i] = (char*) malloc(sizeof(char)*MAX_TOKEN_LEN);
/* reading first header comment */
fgets( s, MAX_LINE, fp );
/* line 2 is number of global parameters */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* reading the number of global parameters */
n = atoi(tmp[0]);
if (n < 1) {
fprintf( stderr, "WARNING: number of globals in ffield file is 0!\n" );
fclose(fp);
free(s);
free(tmp);
return 1;
}
reax->gp.n_global = n;
reax->gp.l = (double*) malloc(sizeof(double)*n);
/* see reax_types.h for mapping between l[i] and the lambdas used in ff */
for (i=0; i < n; i++) {
fgets(s,MAX_LINE,fp);
c = Tokenize(s,&tmp);
val = (double) atof(tmp[0]);
reax->gp.l[i] = val;
}
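  /* derived cutoffs from the global parameters: the bond-order cutoff is
     scaled by 0.01 as read from the force field file; nonb_low and nonb_cut
     are the lower and upper non-bonded interaction cutoffs */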
control->bo_cut = 0.01 * reax->gp.l[29];
control->nonb_low = reax->gp.l[11];
control->nonb_cut = reax->gp.l[12];
/* next line is number of atom types and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
reax->num_atom_types = atoi(tmp[0]);
/* 3 lines of comments */
fgets(s,MAX_LINE,fp);
fgets(s,MAX_LINE,fp);
fgets(s,MAX_LINE,fp);
/* Allocating structures in reax_interaction */
reax->sbp = (single_body_parameters*)
scalloc( reax->num_atom_types, sizeof(single_body_parameters), "sbp",
comm );
reax->tbp = (two_body_parameters**)
scalloc( reax->num_atom_types, sizeof(two_body_parameters*), "tbp", comm );
reax->thbp= (three_body_header***)
scalloc( reax->num_atom_types, sizeof(three_body_header**), "thbp", comm );
reax->hbp = (hbond_parameters***)
scalloc( reax->num_atom_types, sizeof(hbond_parameters**), "hbp", comm );
reax->fbp = (four_body_header****)
scalloc( reax->num_atom_types, sizeof(four_body_header***), "fbp", comm );
tor_flag = (char****)
scalloc( reax->num_atom_types, sizeof(char***), "tor_flag", comm );
for( i = 0; i < reax->num_atom_types; i++ ) {
reax->tbp[i] = (two_body_parameters*)
scalloc( reax->num_atom_types, sizeof(two_body_parameters), "tbp[i]",
comm );
reax->thbp[i]= (three_body_header**)
scalloc( reax->num_atom_types, sizeof(three_body_header*), "thbp[i]",
comm );
reax->hbp[i] = (hbond_parameters**)
scalloc( reax->num_atom_types, sizeof(hbond_parameters*), "hbp[i]",
comm );
reax->fbp[i] = (four_body_header***)
scalloc( reax->num_atom_types, sizeof(four_body_header**), "fbp[i]",
comm );
tor_flag[i] = (char***)
scalloc( reax->num_atom_types, sizeof(char**), "tor_flag[i]", comm );
for( j = 0; j < reax->num_atom_types; j++ ) {
reax->thbp[i][j]= (three_body_header*)
scalloc( reax->num_atom_types, sizeof(three_body_header), "thbp[i,j]",
comm );
reax->hbp[i][j] = (hbond_parameters*)
scalloc( reax->num_atom_types, sizeof(hbond_parameters), "hbp[i,j]",
comm );
reax->fbp[i][j] = (four_body_header**)
scalloc( reax->num_atom_types, sizeof(four_body_header*), "fbp[i,j]",
comm );
tor_flag[i][j] = (char**)
scalloc( reax->num_atom_types, sizeof(char*), "tor_flag[i,j]", comm );
for (k=0; k < reax->num_atom_types; k++) {
reax->fbp[i][j][k] = (four_body_header*)
scalloc( reax->num_atom_types, sizeof(four_body_header), "fbp[i,j,k]",
comm );
tor_flag[i][j][k] = (char*)
scalloc( reax->num_atom_types, sizeof(char), "tor_flag[i,j,k]",
comm );
}
}
}
reax->gp.vdw_type = 0;
for( i = 0; i < reax->num_atom_types; i++ ) {
/* line one */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
for( j = 0; j < (int)(strlen(tmp[0])); ++j )
reax->sbp[i].name[j] = toupper( tmp[0][j] );
val = atof(tmp[1]); reax->sbp[i].r_s = val;
val = atof(tmp[2]); reax->sbp[i].valency = val;
val = atof(tmp[3]); reax->sbp[i].mass = val;
val = atof(tmp[4]); reax->sbp[i].r_vdw = val;
val = atof(tmp[5]); reax->sbp[i].epsilon = val;
val = atof(tmp[6]); reax->sbp[i].gamma = val;
val = atof(tmp[7]); reax->sbp[i].r_pi = val;
val = atof(tmp[8]); reax->sbp[i].valency_e = val;
reax->sbp[i].nlp_opt = 0.5 * (reax->sbp[i].valency_e-reax->sbp[i].valency);
/* line two */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
val = atof(tmp[0]); reax->sbp[i].alpha = val;
val = atof(tmp[1]); reax->sbp[i].gamma_w = val;
val = atof(tmp[2]); reax->sbp[i].valency_boc= val;
val = atof(tmp[3]); reax->sbp[i].p_ovun5 = val;
val = atof(tmp[4]);
val = atof(tmp[5]); reax->sbp[i].chi = val;
val = atof(tmp[6]); reax->sbp[i].eta = 2.0 * val;
val = atof(tmp[7]); reax->sbp[i].p_hbond = (int) val;
/* line 3 */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
val = atof(tmp[0]); reax->sbp[i].r_pi_pi = val;
val = atof(tmp[1]); reax->sbp[i].p_lp2 = val;
val = atof(tmp[2]);
val = atof(tmp[3]); reax->sbp[i].b_o_131 = val;
val = atof(tmp[4]); reax->sbp[i].b_o_132 = val;
val = atof(tmp[5]); reax->sbp[i].b_o_133 = val;
val = atof(tmp[6]);
val = atof(tmp[7]);
/* line 4 */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* Sanity check */
if (c < 3) {
fprintf(stderr, "Inconsistent ffield file (reaxc_ffield.cpp) \n");
MPI_Abort( comm, FILE_NOT_FOUND );
}
val = atof(tmp[0]); reax->sbp[i].p_ovun2 = val;
val = atof(tmp[1]); reax->sbp[i].p_val3 = val;
val = atof(tmp[2]);
val = atof(tmp[3]); reax->sbp[i].valency_val= val;
val = atof(tmp[4]); reax->sbp[i].p_val5 = val;
val = atof(tmp[5]); reax->sbp[i].rcore2 = val;
val = atof(tmp[6]); reax->sbp[i].ecore2 = val;
val = atof(tmp[7]); reax->sbp[i].acore2 = val;
/* line 5, only if lgvdw is yes */
if (lgflag) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* Sanity check */
if (c > 3) {
fprintf(stderr, "Inconsistent ffield file (reaxc_ffield.cpp) \n");
MPI_Abort( comm, FILE_NOT_FOUND );
}
val = atof(tmp[0]); reax->sbp[i].lgcij = val;
val = atof(tmp[1]); reax->sbp[i].lgre = val;
}
if( reax->sbp[i].rcore2>0.01 && reax->sbp[i].acore2>0.01 ){ // Inner-wall
if( reax->sbp[i].gamma_w>0.5 ){ // Shielding vdWaals
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 3 ) {
if (errorflag)
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate inner wall+shielding, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
errorflag = 0;
}
else{
reax->gp.vdw_type = 3;
}
}
else { // No shielding vdWaals parameters present
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 2 )
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate inner wall without shielding, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
else{
reax->gp.vdw_type = 2;
}
}
}
else{ // No Inner wall parameters present
if( reax->sbp[i].gamma_w>0.5 ){ // Shielding vdWaals
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 1 )
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate shielding without inner wall, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
else{
reax->gp.vdw_type = 1;
}
}
else{
fprintf( stderr, "Error: inconsistent vdWaals-parameters\n"\
"No shielding or inner-wall set for element %s\n",
reax->sbp[i].name );
MPI_Abort( comm, INVALID_INPUT );
}
}
}
/* Equate vval3 to valf for first-row elements (25/10/2004) */
for( i = 0; i < reax->num_atom_types; i++ )
if( reax->sbp[i].mass < 21 &&
reax->sbp[i].valency_val != reax->sbp[i].valency_boc ){
fprintf( stderr, "Warning: changed valency_val to valency_boc for %s\n",
reax->sbp[i].name );
reax->sbp[i].valency_val = reax->sbp[i].valency_boc;
}
/* next line is number of two body combination and some comments */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
l = atoi(tmp[0]);
/* a line of comments */
fgets(s,MAX_LINE,fp);
for (i=0; i < l; i++) {
/* line 1 */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types) {
val = atof(tmp[2]); reax->tbp[j][k].De_s = val;
reax->tbp[k][j].De_s = val;
val = atof(tmp[3]); reax->tbp[j][k].De_p = val;
reax->tbp[k][j].De_p = val;
val = atof(tmp[4]); reax->tbp[j][k].De_pp = val;
reax->tbp[k][j].De_pp = val;
val = atof(tmp[5]); reax->tbp[j][k].p_be1 = val;
reax->tbp[k][j].p_be1 = val;
val = atof(tmp[6]); reax->tbp[j][k].p_bo5 = val;
reax->tbp[k][j].p_bo5 = val;
val = atof(tmp[7]); reax->tbp[j][k].v13cor = val;
reax->tbp[k][j].v13cor = val;
val = atof(tmp[8]); reax->tbp[j][k].p_bo6 = val;
reax->tbp[k][j].p_bo6 = val;
val = atof(tmp[9]); reax->tbp[j][k].p_ovun1 = val;
reax->tbp[k][j].p_ovun1 = val;
/* line 2 */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
val = atof(tmp[0]); reax->tbp[j][k].p_be2 = val;
reax->tbp[k][j].p_be2 = val;
val = atof(tmp[1]); reax->tbp[j][k].p_bo3 = val;
reax->tbp[k][j].p_bo3 = val;
val = atof(tmp[2]); reax->tbp[j][k].p_bo4 = val;
reax->tbp[k][j].p_bo4 = val;
val = atof(tmp[3]);
val = atof(tmp[4]); reax->tbp[j][k].p_bo1 = val;
reax->tbp[k][j].p_bo1 = val;
val = atof(tmp[5]); reax->tbp[j][k].p_bo2 = val;
reax->tbp[k][j].p_bo2 = val;
val = atof(tmp[6]); reax->tbp[j][k].ovc = val;
reax->tbp[k][j].ovc = val;
val = atof(tmp[7]);
}
}
for (i=0; i < reax->num_atom_types; i++)
for (j=i; j < reax->num_atom_types; j++) {
reax->tbp[i][j].r_s = 0.5 *
(reax->sbp[i].r_s + reax->sbp[j].r_s);
reax->tbp[j][i].r_s = 0.5 *
(reax->sbp[j].r_s + reax->sbp[i].r_s);
reax->tbp[i][j].r_p = 0.5 *
(reax->sbp[i].r_pi + reax->sbp[j].r_pi);
reax->tbp[j][i].r_p = 0.5 *
(reax->sbp[j].r_pi + reax->sbp[i].r_pi);
reax->tbp[i][j].r_pp = 0.5 *
(reax->sbp[i].r_pi_pi + reax->sbp[j].r_pi_pi);
reax->tbp[j][i].r_pp = 0.5 *
(reax->sbp[j].r_pi_pi + reax->sbp[i].r_pi_pi);
reax->tbp[i][j].p_boc3 =
sqrt(reax->sbp[i].b_o_132 *
reax->sbp[j].b_o_132);
reax->tbp[j][i].p_boc3 =
sqrt(reax->sbp[j].b_o_132 *
reax->sbp[i].b_o_132);
reax->tbp[i][j].p_boc4 =
sqrt(reax->sbp[i].b_o_131 *
reax->sbp[j].b_o_131);
reax->tbp[j][i].p_boc4 =
sqrt(reax->sbp[j].b_o_131 *
reax->sbp[i].b_o_131);
reax->tbp[i][j].p_boc5 =
sqrt(reax->sbp[i].b_o_133 *
reax->sbp[j].b_o_133);
reax->tbp[j][i].p_boc5 =
sqrt(reax->sbp[j].b_o_133 *
reax->sbp[i].b_o_133);
reax->tbp[i][j].D =
sqrt(reax->sbp[i].epsilon *
reax->sbp[j].epsilon);
reax->tbp[j][i].D =
sqrt(reax->sbp[j].epsilon *
reax->sbp[i].epsilon);
reax->tbp[i][j].alpha =
sqrt(reax->sbp[i].alpha *
reax->sbp[j].alpha);
reax->tbp[j][i].alpha =
sqrt(reax->sbp[j].alpha *
reax->sbp[i].alpha);
reax->tbp[i][j].r_vdW =
2.0 * sqrt(reax->sbp[i].r_vdw * reax->sbp[j].r_vdw);
reax->tbp[j][i].r_vdW =
2.0 * sqrt(reax->sbp[j].r_vdw * reax->sbp[i].r_vdw);
reax->tbp[i][j].gamma_w =
sqrt(reax->sbp[i].gamma_w *
reax->sbp[j].gamma_w);
reax->tbp[j][i].gamma_w =
sqrt(reax->sbp[j].gamma_w *
reax->sbp[i].gamma_w);
reax->tbp[i][j].gamma =
pow(reax->sbp[i].gamma *
reax->sbp[j].gamma,-1.5);
reax->tbp[j][i].gamma =
pow(reax->sbp[j].gamma *
reax->sbp[i].gamma,-1.5);
// additions for additional vdWaals interaction types - inner core
reax->tbp[i][j].rcore = reax->tbp[j][i].rcore =
sqrt( reax->sbp[i].rcore2 * reax->sbp[j].rcore2 );
reax->tbp[i][j].ecore = reax->tbp[j][i].ecore =
sqrt( reax->sbp[i].ecore2 * reax->sbp[j].ecore2 );
reax->tbp[i][j].acore = reax->tbp[j][i].acore =
sqrt( reax->sbp[i].acore2 * reax->sbp[j].acore2 );
      // additions for additional vdWaals interaction types - lg correction
reax->tbp[i][j].lgcij = reax->tbp[j][i].lgcij =
sqrt( reax->sbp[i].lgcij * reax->sbp[j].lgcij );
reax->tbp[i][j].lgre = reax->tbp[j][i].lgre = 2.0 * reax->gp.l[35] *
sqrt( reax->sbp[i].lgre*reax->sbp[j].lgre );
}
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
l = atoi(tmp[0]);
for (i=0; i < l; i++) {
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types) {
val = atof(tmp[2]);
if (val > 0.0) {
reax->tbp[j][k].D = val;
reax->tbp[k][j].D = val;
}
val = atof(tmp[3]);
if (val > 0.0) {
reax->tbp[j][k].r_vdW = 2 * val;
reax->tbp[k][j].r_vdW = 2 * val;
}
val = atof(tmp[4]);
if (val > 0.0) {
reax->tbp[j][k].alpha = val;
reax->tbp[k][j].alpha = val;
}
val = atof(tmp[5]);
if (val > 0.0) {
reax->tbp[j][k].r_s = val;
reax->tbp[k][j].r_s = val;
}
val = atof(tmp[6]);
if (val > 0.0) {
reax->tbp[j][k].r_p = val;
reax->tbp[k][j].r_p = val;
}
val = atof(tmp[7]);
if (val > 0.0) {
reax->tbp[j][k].r_pp = val;
reax->tbp[k][j].r_pp = val;
}
val = atof(tmp[8]);
if (val >= 0.0) {
reax->tbp[j][k].lgcij = val;
reax->tbp[k][j].lgcij = val;
}
}
}
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
reax->thbp[i][j][k].cnt = 0;
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < l; i++ ) {
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types &&
m < reax->num_atom_types) {
cnt = reax->thbp[j][k][m].cnt;
reax->thbp[j][k][m].cnt++;
reax->thbp[m][k][j].cnt++;
val = atof(tmp[3]);
reax->thbp[j][k][m].prm[cnt].theta_00 = val;
reax->thbp[m][k][j].prm[cnt].theta_00 = val;
val = atof(tmp[4]);
reax->thbp[j][k][m].prm[cnt].p_val1 = val;
reax->thbp[m][k][j].prm[cnt].p_val1 = val;
val = atof(tmp[5]);
reax->thbp[j][k][m].prm[cnt].p_val2 = val;
reax->thbp[m][k][j].prm[cnt].p_val2 = val;
val = atof(tmp[6]);
reax->thbp[j][k][m].prm[cnt].p_coa1 = val;
reax->thbp[m][k][j].prm[cnt].p_coa1 = val;
val = atof(tmp[7]);
reax->thbp[j][k][m].prm[cnt].p_val7 = val;
reax->thbp[m][k][j].prm[cnt].p_val7 = val;
val = atof(tmp[8]);
reax->thbp[j][k][m].prm[cnt].p_pen1 = val;
reax->thbp[m][k][j].prm[cnt].p_pen1 = val;
val = atof(tmp[9]);
reax->thbp[j][k][m].prm[cnt].p_val4 = val;
reax->thbp[m][k][j].prm[cnt].p_val4 = val;
}
}
/* clear all entries first */
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
for( m = 0; m < reax->num_atom_types; ++m ) {
reax->fbp[i][j][k][m].cnt = 0;
tor_flag[i][j][k][m] = 0;
}
/* next line is number of 4-body params and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < l; i++ ) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
n = atoi(tmp[3]) - 1;
if (j >= 0 && n >= 0) { // this means the entry is not in compact form
if (j < reax->num_atom_types && k < reax->num_atom_types &&
m < reax->num_atom_types && n < reax->num_atom_types) {
tor_flag[j][k][m][n] = 1;
tor_flag[n][m][k][j] = 1;
reax->fbp[j][k][m][n].cnt = 1;
reax->fbp[n][m][k][j].cnt = 1;
val = atof(tmp[4]);
reax->fbp[j][k][m][n].prm[0].V1 = val;
reax->fbp[n][m][k][j].prm[0].V1 = val;
val = atof(tmp[5]);
reax->fbp[j][k][m][n].prm[0].V2 = val;
reax->fbp[n][m][k][j].prm[0].V2 = val;
val = atof(tmp[6]);
reax->fbp[j][k][m][n].prm[0].V3 = val;
reax->fbp[n][m][k][j].prm[0].V3 = val;
val = atof(tmp[7]);
reax->fbp[j][k][m][n].prm[0].p_tor1 = val;
reax->fbp[n][m][k][j].prm[0].p_tor1 = val;
val = atof(tmp[8]);
reax->fbp[j][k][m][n].prm[0].p_cot1 = val;
reax->fbp[n][m][k][j].prm[0].p_cot1 = val;
}
}
else { /* This means the entry is of the form 0-X-Y-0 */
if( k < reax->num_atom_types && m < reax->num_atom_types )
for( p = 0; p < reax->num_atom_types; p++ )
for( o = 0; o < reax->num_atom_types; o++ ) {
reax->fbp[p][k][m][o].cnt = 1;
reax->fbp[o][m][k][p].cnt = 1;
if (tor_flag[p][k][m][o] == 0) {
reax->fbp[p][k][m][o].prm[0].V1 = atof(tmp[4]);
reax->fbp[p][k][m][o].prm[0].V2 = atof(tmp[5]);
reax->fbp[p][k][m][o].prm[0].V3 = atof(tmp[6]);
reax->fbp[p][k][m][o].prm[0].p_tor1 = atof(tmp[7]);
reax->fbp[p][k][m][o].prm[0].p_cot1 = atof(tmp[8]);
}
if (tor_flag[o][m][k][p] == 0) {
reax->fbp[o][m][k][p].prm[0].V1 = atof(tmp[4]);
reax->fbp[o][m][k][p].prm[0].V2 = atof(tmp[5]);
reax->fbp[o][m][k][p].prm[0].V3 = atof(tmp[6]);
reax->fbp[o][m][k][p].prm[0].p_tor1 = atof(tmp[7]);
reax->fbp[o][m][k][p].prm[0].p_cot1 = atof(tmp[8]);
}
}
}
}
/* next line is number of hydrogen bond params and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
reax->hbp[i][j][k].r0_hb = -1.0;
for( i = 0; i < l; i++ ) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
if( j < reax->num_atom_types && m < reax->num_atom_types ) {
val = atof(tmp[3]);
reax->hbp[j][k][m].r0_hb = val;
val = atof(tmp[4]);
reax->hbp[j][k][m].p_hb1 = val;
val = atof(tmp[5]);
reax->hbp[j][k][m].p_hb2 = val;
val = atof(tmp[6]);
reax->hbp[j][k][m].p_hb3 = val;
}
}
/* deallocate helper storage */
for( i = 0; i < MAX_TOKENS; i++ )
free( tmp[i] );
free( tmp );
free( s );
/* deallocate tor_flag */
for( i = 0; i < reax->num_atom_types; i++ ) {
for( j = 0; j < reax->num_atom_types; j++ ) {
for( k = 0; k < reax->num_atom_types; k++ ) {
free( tor_flag[i][j][k] );
}
free( tor_flag[i][j] );
}
free( tor_flag[i] );
}
free( tor_flag );
// close file
fclose(fp);
return SUCCESS;
}
diff --git a/src/USER-REAXC/reaxc_forces.cpp b/src/USER-REAXC/reaxc_forces.cpp
index 7f11f5565..215ded6e5 100644
--- a/src/USER-REAXC/reaxc_forces.cpp
+++ b/src/USER-REAXC/reaxc_forces.cpp
@@ -1,459 +1,459 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_forces.h"
#include "reaxc_bond_orders.h"
#include "reaxc_bonds.h"
#include "reaxc_hydrogen_bonds.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_multi_body.h"
#include "reaxc_nonbonded.h"
#include "reaxc_tool_box.h"
#include "reaxc_torsion_angles.h"
#include "reaxc_valence_angles.h"
#include "reaxc_vector.h"
interaction_function Interaction_Functions[NUM_INTRS];
void Dummy_Interaction( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
}
void Init_Force_Functions( control_params *control )
{
Interaction_Functions[0] = BO;
Interaction_Functions[1] = Bonds; //Dummy_Interaction;
Interaction_Functions[2] = Atom_Energy; //Dummy_Interaction;
Interaction_Functions[3] = Valence_Angles; //Dummy_Interaction;
Interaction_Functions[4] = Torsion_Angles; //Dummy_Interaction;
if( control->hbond_cut > 0 )
Interaction_Functions[5] = Hydrogen_Bonds;
else Interaction_Functions[5] = Dummy_Interaction;
Interaction_Functions[6] = Dummy_Interaction; //empty
Interaction_Functions[7] = Dummy_Interaction; //empty
Interaction_Functions[8] = Dummy_Interaction; //empty
Interaction_Functions[9] = Dummy_Interaction; //empty
}
void Compute_Bonded_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm )
{
int i;
/* Implement all force calls as function pointers */
for( i = 0; i < NUM_INTRS; i++ ) {
(Interaction_Functions[i])( system, control, data, workspace,
lists, out_control );
}
}
void Compute_NonBonded_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm )
{
/* van der Waals and Coulomb interactions */
if( control->tabulate == 0 )
vdW_Coulomb_Energy( system, control, data, workspace,
lists, out_control );
else
Tabulated_vdW_Coulomb_Energy( system, control, data, workspace,
lists, out_control );
}
void Compute_Total_Force( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, mpi_datatypes *mpi_data )
{
int i, pj;
reax_list *bonds = (*lists) + BONDS;
for( i = 0; i < system->N; ++i )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
if( i < bonds->select.bond_list[pj].nbr ) {
if( control->virial == 0 )
Add_dBond_to_Forces( system, i, pj, workspace, lists );
else
Add_dBond_to_Forces_NPT( i, pj, data, workspace, lists );
}
}
void Validate_Lists( reax_system *system, storage *workspace, reax_list **lists,
int step, int n, int N, int numH, MPI_Comm comm )
{
int i, comp, Hindex;
reax_list *bonds, *hbonds;
double saferzone = system->saferzone;
/* bond list */
if( N > 0 ) {
bonds = *lists + BONDS;
for( i = 0; i < N; ++i ) {
system->my_atoms[i].num_bonds = MAX(Num_Entries(i,bonds)*2, MIN_BONDS);
if( i < N-1 )
comp = Start_Index(i+1, bonds);
else comp = bonds->num_intrs;
if( End_Index(i, bonds) > comp ) {
fprintf( stderr, "step%d-bondchk failed: i=%d end(i)=%d str(i+1)=%d\n",
step, i, End_Index(i,bonds), comp );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
/* hbonds list */
if( numH > 0 ) {
hbonds = *lists + HBONDS;
for( i = 0; i < N; ++i ) {
Hindex = system->my_atoms[i].Hindex;
if( Hindex > -1 ) {
system->my_atoms[i].num_hbonds =
(int)(MAX( Num_Entries(Hindex, hbonds)*saferzone, MIN_HBONDS ));
//if( Num_Entries(i, hbonds) >=
//(Start_Index(i+1,hbonds)-Start_Index(i,hbonds))*0.90/*DANGER_ZONE*/){
// workspace->realloc.hbonds = 1;
if( Hindex < numH-1 )
comp = Start_Index(Hindex+1, hbonds);
else comp = hbonds->num_intrs;
if( End_Index(Hindex, hbonds) > comp ) {
fprintf(stderr,"step%d-hbondchk failed: H=%d end(H)=%d str(H+1)=%d\n",
step, Hindex, End_Index(Hindex,hbonds), comp );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
}
}
void Init_Forces_noQEq( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm ) {
int i, j, pj;
int start_i, end_i;
int type_i, type_j;
int btop_i, btop_j, num_bonds, num_hbonds;
int ihb, jhb, ihb_top, jhb_top;
int local, flag, renbr;
double cutoff;
reax_list *far_nbrs, *bonds, *hbonds;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_atom *atom_i, *atom_j;
far_nbrs = *lists + FAR_NBRS;
bonds = *lists + BONDS;
hbonds = *lists + HBONDS;
for( i = 0; i < system->n; ++i )
workspace->bond_mark[i] = 0;
for( i = system->n; i < system->N; ++i ) {
    workspace->bond_mark[i] = 1000; // put ghost atoms at an infinite distance
}
num_bonds = 0;
num_hbonds = 0;
btop_i = btop_j = 0;
renbr = (data->step-data->prev_steps) % control->reneighbor == 0;
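  /* renbr is true on re-neighboring steps, where the stored far-neighbor
     distances are already current; otherwise distances are recomputed below */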
for( i = 0; i < system->N; ++i ) {
atom_i = &(system->my_atoms[i]);
type_i = atom_i->type;
if (type_i < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
btop_i = End_Index( i, bonds );
sbp_i = &(system->reax_param.sbp[type_i]);
if( i < system->n ) {
local = 1;
cutoff = MAX( control->hbond_cut, control->bond_cut );
}
else {
local = 0;
cutoff = control->bond_cut;
}
ihb = -1;
ihb_top = -1;
if( local && control->hbond_cut > 0 ) {
ihb = sbp_i->p_hbond;
if( ihb == 1 )
ihb_top = End_Index( atom_i->Hindex, hbonds );
else ihb_top = -1;
}
/* update i-j distance - check if j is within cutoff */
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &( far_nbrs->select.far_nbr_list[pj] );
j = nbr_pj->nbr;
atom_j = &(system->my_atoms[j]);
if( renbr ) {
if( nbr_pj->d <= cutoff )
flag = 1;
else flag = 0;
}
else{
nbr_pj->dvec[0] = atom_j->x[0] - atom_i->x[0];
nbr_pj->dvec[1] = atom_j->x[1] - atom_i->x[1];
nbr_pj->dvec[2] = atom_j->x[2] - atom_i->x[2];
nbr_pj->d = rvec_Norm_Sqr( nbr_pj->dvec );
if( nbr_pj->d <= SQR(cutoff) ) {
nbr_pj->d = sqrt(nbr_pj->d);
flag = 1;
}
else {
flag = 0;
}
}
if( flag ) {
type_j = atom_j->type;
if (type_j < 0) continue;
sbp_j = &(system->reax_param.sbp[type_j]);
twbp = &(system->reax_param.tbp[type_i][type_j]);
if( local ) {
/* hydrogen bond lists */
if( control->hbond_cut > 0 && (ihb==1 || ihb==2) &&
nbr_pj->d <= control->hbond_cut ) {
// fprintf( stderr, "%d %d\n", atom1, atom2 );
jhb = sbp_j->p_hbond;
if( ihb == 1 && jhb == 2 ) {
hbonds->select.hbond_list[ihb_top].nbr = j;
hbonds->select.hbond_list[ihb_top].scl = 1;
hbonds->select.hbond_list[ihb_top].ptr = nbr_pj;
++ihb_top;
++num_hbonds;
}
else if( j < system->n && ihb == 2 && jhb == 1 ) {
jhb_top = End_Index( atom_j->Hindex, hbonds );
hbonds->select.hbond_list[jhb_top].nbr = i;
hbonds->select.hbond_list[jhb_top].scl = -1;
hbonds->select.hbond_list[jhb_top].ptr = nbr_pj;
Set_End_Index( atom_j->Hindex, jhb_top+1, hbonds );
++num_hbonds;
}
}
}
if( //(workspace->bond_mark[i] < 3 || workspace->bond_mark[j] < 3) &&
nbr_pj->d <= control->bond_cut &&
BOp( workspace, bonds, control->bo_cut,
i , btop_i, nbr_pj, sbp_i, sbp_j, twbp ) ) {
num_bonds += 2;
++btop_i;
if( workspace->bond_mark[j] > workspace->bond_mark[i] + 1 )
workspace->bond_mark[j] = workspace->bond_mark[i] + 1;
else if( workspace->bond_mark[i] > workspace->bond_mark[j] + 1 ) {
workspace->bond_mark[i] = workspace->bond_mark[j] + 1;
}
}
}
}
Set_End_Index( i, btop_i, bonds );
if( local && ihb == 1 )
Set_End_Index( atom_i->Hindex, ihb_top, hbonds );
}
workspace->realloc.num_bonds = num_bonds;
workspace->realloc.num_hbonds = num_hbonds;
Validate_Lists( system, workspace, lists, data->step,
system->n, system->N, system->numH, comm );
}
void Estimate_Storages( reax_system *system, control_params *control,
reax_list **lists, int *Htop, int *hb_top,
int *bond_top, int *num_3body, MPI_Comm comm )
{
int i, j, pj;
int start_i, end_i;
int type_i, type_j;
int ihb, jhb;
int local;
double cutoff;
double r_ij;
double C12, C34, C56;
double BO, BO_s, BO_pi, BO_pi2;
reax_list *far_nbrs;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_atom *atom_i, *atom_j;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
far_nbrs = *lists + FAR_NBRS;
*Htop = 0;
memset( hb_top, 0, sizeof(int) * system->local_cap );
memset( bond_top, 0, sizeof(int) * system->total_cap );
*num_3body = 0;
for( i = 0; i < system->N; ++i ) {
atom_i = &(system->my_atoms[i]);
type_i = atom_i->type;
if (type_i < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
sbp_i = &(system->reax_param.sbp[type_i]);
if( i < system->n ) {
local = 1;
cutoff = control->nonb_cut;
++(*Htop);
ihb = sbp_i->p_hbond;
}
else {
local = 0;
cutoff = control->bond_cut;
ihb = -1;
}
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &( far_nbrs->select.far_nbr_list[pj] );
j = nbr_pj->nbr;
atom_j = &(system->my_atoms[j]);
if(nbr_pj->d <= cutoff) {
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
r_ij = nbr_pj->d;
sbp_j = &(system->reax_param.sbp[type_j]);
twbp = &(system->reax_param.tbp[type_i][type_j]);
if( local ) {
if( j < system->n || atom_i->orig_id < atom_j->orig_id ) //tryQEq ||1
++(*Htop);
/* hydrogen bond lists */
if( control->hbond_cut > 0.1 && (ihb==1 || ihb==2) &&
nbr_pj->d <= control->hbond_cut ) {
jhb = sbp_j->p_hbond;
if( ihb == 1 && jhb == 2 )
++hb_top[i];
else if( j < system->n && ihb == 2 && jhb == 1 )
++hb_top[j];
}
}
/* uncorrected bond orders */
if( nbr_pj->d <= control->bond_cut ) {
if( sbp_i->r_s > 0.0 && sbp_j->r_s > 0.0) {
C12 = twbp->p_bo1 * pow( r_ij / twbp->r_s, twbp->p_bo2 );
BO_s = (1.0 + control->bo_cut) * exp( C12 );
}
else BO_s = C12 = 0.0;
if( sbp_i->r_pi > 0.0 && sbp_j->r_pi > 0.0) {
C34 = twbp->p_bo3 * pow( r_ij / twbp->r_p, twbp->p_bo4 );
BO_pi = exp( C34 );
}
else BO_pi = C34 = 0.0;
if( sbp_i->r_pi_pi > 0.0 && sbp_j->r_pi_pi > 0.0) {
C56 = twbp->p_bo5 * pow( r_ij / twbp->r_pp, twbp->p_bo6 );
BO_pi2= exp( C56 );
}
else BO_pi2 = C56 = 0.0;
/* Initially BO values are the uncorrected ones, page 1 */
BO = BO_s + BO_pi + BO_pi2;
if( BO >= control->bo_cut ) {
++bond_top[i];
++bond_top[j];
}
}
}
}
}
*Htop = (int)(MAX( *Htop * safezone, mincap * MIN_HENTRIES ));
for( i = 0; i < system->n; ++i )
hb_top[i] = (int)(MAX( hb_top[i] * saferzone, MIN_HBONDS ));
for( i = 0; i < system->N; ++i ) {
*num_3body += SQR(bond_top[i]);
bond_top[i] = MAX( bond_top[i] * 2, MIN_BONDS );
}
}
void Compute_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
mpi_datatypes *mpi_data )
{
MPI_Comm comm = mpi_data->world;
Init_Forces_noQEq( system, control, data, workspace,
lists, out_control, comm );
/********* bonded interactions ************/
Compute_Bonded_Forces( system, control, data, workspace,
lists, out_control, mpi_data->world );
/********* nonbonded interactions ************/
Compute_NonBonded_Forces( system, control, data, workspace,
lists, out_control, mpi_data->world );
/*********** total force ***************/
Compute_Total_Force( system, control, data, workspace, lists, mpi_data );
}
diff --git a/src/USER-REAXC/reaxc_hydrogen_bonds.cpp b/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
index 8d7b3b381..ff771ad65 100644
--- a/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
+++ b/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
@@ -1,184 +1,184 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_hydrogen_bonds.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_valence_angles.h"
#include "reaxc_vector.h"
void Hydrogen_Bonds( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, k, pi, pk;
int type_i, type_j, type_k;
int start_j, end_j, hb_start_j, hb_end_j;
int hblist[MAX_BONDS];
int itr, top;
int num_hb_intrs = 0;
ivec rel_jk;
double r_jk, theta, cos_theta, sin_xhz4, cos_xhz1, sin_theta2;
double e_hb, exp_hb2, exp_hb3, CEhb1, CEhb2, CEhb3;
rvec dcos_theta_di, dcos_theta_dj, dcos_theta_dk;
rvec dvec_jk, force, ext_press;
hbond_parameters *hbp;
bond_order_data *bo_ij;
bond_data *pbond_ij;
far_neighbor_data *nbr_jk;
reax_list *bonds, *hbonds;
bond_data *bond_list;
hbond_data *hbond_list;
// tally variables
double fi_tmp[3], fk_tmp[3], delij[3], delkj[3];
bonds = (*lists) + BONDS;
bond_list = bonds->select.bond_list;
hbonds = (*lists) + HBONDS;
hbond_list = hbonds->select.hbond_list;
for( j = 0; j < system->n; ++j )
if( system->reax_param.sbp[system->my_atoms[j].type].p_hbond == 1 ) {
type_j = system->my_atoms[j].type;
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
hb_start_j = Start_Index( system->my_atoms[j].Hindex, hbonds );
hb_end_j = End_Index( system->my_atoms[j].Hindex, hbonds );
if (type_j < 0) continue;
top = 0;
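      /* collect covalent bonds from hydrogen atom j to neighbors flagged for
         hydrogen bonding (p_hbond == 2) with bond order above HB_THRESHOLD */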
for( pi = start_j; pi < end_j; ++pi ) {
pbond_ij = &( bond_list[pi] );
i = pbond_ij->nbr;
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
bo_ij = &(pbond_ij->bo_data);
if( system->reax_param.sbp[type_i].p_hbond == 2 &&
bo_ij->BO >= HB_THRESHOLD )
hblist[top++] = pi;
}
for( pk = hb_start_j; pk < hb_end_j; ++pk ) {
        /* set k's variables */
k = hbond_list[pk].nbr;
type_k = system->my_atoms[k].type;
if (type_k < 0) continue;
nbr_jk = hbond_list[pk].ptr;
r_jk = nbr_jk->d;
rvec_Scale( dvec_jk, hbond_list[pk].scl, nbr_jk->dvec );
for( itr = 0; itr < top; ++itr ) {
pi = hblist[itr];
pbond_ij = &( bonds->select.bond_list[pi] );
i = pbond_ij->nbr;
if( system->my_atoms[i].orig_id != system->my_atoms[k].orig_id ) {
bo_ij = &(pbond_ij->bo_data);
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
hbp = &(system->reax_param.hbp[ type_i ][ type_j ][ type_k ]);
if (hbp->r0_hb <= 0.0) continue;
++num_hb_intrs;
Calculate_Theta( pbond_ij->dvec, pbond_ij->d, dvec_jk, r_jk,
&theta, &cos_theta );
/* the derivative of cos(theta) */
Calculate_dCos_Theta( pbond_ij->dvec, pbond_ij->d, dvec_jk, r_jk,
&dcos_theta_di, &dcos_theta_dj,
&dcos_theta_dk );
            /* hydrogen bond energy */
sin_theta2 = sin( theta/2.0 );
sin_xhz4 = SQR(sin_theta2);
sin_xhz4 *= sin_xhz4;
cos_xhz1 = ( 1.0 - cos_theta );
exp_hb2 = exp( -hbp->p_hb2 * bo_ij->BO );
exp_hb3 = exp( -hbp->p_hb3 * ( hbp->r0_hb / r_jk +
r_jk / hbp->r0_hb - 2.0 ) );
data->my_en.e_hb += e_hb =
hbp->p_hb1 * (1.0 - exp_hb2) * exp_hb3 * sin_xhz4;
CEhb1 = hbp->p_hb1 * hbp->p_hb2 * exp_hb2 * exp_hb3 * sin_xhz4;
CEhb2 = -hbp->p_hb1/2.0 * (1.0 - exp_hb2) * exp_hb3 * cos_xhz1;
CEhb3 = -hbp->p_hb3 *
(-hbp->r0_hb / SQR(r_jk) + 1.0 / hbp->r0_hb) * e_hb;
/* hydrogen bond forces */
bo_ij->Cdbo += CEhb1; // dbo term
if( control->virial == 0 ) {
// dcos terms
rvec_ScaledAdd( workspace->f[i], +CEhb2, dcos_theta_di );
rvec_ScaledAdd( workspace->f[j], +CEhb2, dcos_theta_dj );
rvec_ScaledAdd( workspace->f[k], +CEhb2, dcos_theta_dk );
// dr terms
rvec_ScaledAdd( workspace->f[j], -CEhb3/r_jk, dvec_jk );
rvec_ScaledAdd( workspace->f[k], +CEhb3/r_jk, dvec_jk );
}
else {
rvec_Scale( force, +CEhb2, dcos_theta_di ); // dcos terms
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
rvec_ScaledAdd( workspace->f[j], +CEhb2, dcos_theta_dj );
ivec_Scale( rel_jk, hbond_list[pk].scl, nbr_jk->rel_box );
rvec_Scale( force, +CEhb2, dcos_theta_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, rel_jk, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
// dr terms
rvec_ScaledAdd( workspace->f[j], -CEhb3/r_jk, dvec_jk );
rvec_Scale( force, CEhb3/r_jk, dvec_jk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, rel_jk, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
}
/* tally into per-atom virials */
if (system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
rvec_ScaledSum( delij, 1., system->my_atoms[j].x,
-1., system->my_atoms[i].x );
rvec_ScaledSum( delkj, 1., system->my_atoms[j].x,
-1., system->my_atoms[k].x );
rvec_Scale(fi_tmp, CEhb2, dcos_theta_di);
rvec_Scale(fk_tmp, CEhb2, dcos_theta_dk);
rvec_ScaledAdd(fk_tmp, CEhb3/r_jk, dvec_jk);
system->pair_ptr->ev_tally3(i,j,k,e_hb,0.0,fi_tmp,fk_tmp,delij,delkj);
}
}
}
}
}
}
diff --git a/src/USER-REAXC/reaxc_init_md.cpp b/src/USER-REAXC/reaxc_init_md.cpp
index f912c95ea..b11cdd2fb 100644
--- a/src/USER-REAXC/reaxc_init_md.cpp
+++ b/src/USER-REAXC/reaxc_init_md.cpp
@@ -1,279 +1,279 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_init_md.h"
#include "reaxc_allocate.h"
#include "reaxc_forces.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_reset_tools.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
int Init_System( reax_system *system, control_params *control, char *msg )
{
int i;
reax_atom *atom;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap);
system->total_cap = MAX( (int)(system->N * safezone), mincap);
/* estimate numH and Hcap */
system->numH = 0;
if( control->hbond_cut > 0 )
for( i = 0; i < system->n; ++i ) {
atom = &(system->my_atoms[i]);
if (system->reax_param.sbp[ atom->type ].p_hbond == 1 && atom->type >= 0)
atom->Hindex = system->numH++;
else atom->Hindex = -1;
}
system->Hcap = (int)(MAX( system->numH * saferzone, mincap ));
return SUCCESS;
}
int Init_Simulation_Data( reax_system *system, control_params *control,
simulation_data *data, char *msg )
{
Reset_Simulation_Data( data, control->virial );
/* initialize the timer(s) */
if( system->my_rank == MASTER_NODE ) {
data->timing.start = Get_Time( );
}
data->step = data->prev_steps = 0;
return SUCCESS;
}
void Init_Taper( control_params *control, storage *workspace, MPI_Comm comm )
{
double d1, d7;
double swa, swa2, swa3;
double swb, swb2, swb3;
swa = control->nonb_low;
swb = control->nonb_cut;
if( fabs( swa ) > 0.01 )
fprintf( stderr, "Warning: non-zero lower Taper-radius cutoff\n" );
if( swb < 0 ) {
fprintf( stderr, "Negative upper Taper-radius cutoff\n" );
MPI_Abort( comm, INVALID_INPUT );
}
else if( swb < 5 )
fprintf( stderr, "Warning: very low Taper-radius cutoff: %f\n", swb );
d1 = swb - swa;
d7 = pow( d1, 7.0 );
swa2 = SQR( swa );
swa3 = CUBE( swa );
swb2 = SQR( swb );
swb3 = CUBE( swb );
workspace->Tap[7] = 20.0 / d7;
workspace->Tap[6] = -70.0 * (swa + swb) / d7;
workspace->Tap[5] = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
workspace->Tap[4] = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
workspace->Tap[3] = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
workspace->Tap[2] =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
workspace->Tap[1] = 140.0 * swa3 * swb3 / d7;
workspace->Tap[0] = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
}
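/* Illustrative sketch (not part of the LAMMPS sources): the eight Tap[0..7]
coefficients computed above form a 7th-degree taper polynomial that switches
the non-bonded interactions smoothly off between swa and swb. The non-bonded
routines later in this diff evaluate it with Horner's rule; a self-contained
equivalent, assuming only a plain array of the eight coefficients: */
static double taper_eval( const double Tap[8], double r )
{
double t = Tap[7];
for( int k = 6; k >= 0; --k )  /* ((Tap[7]*r + Tap[6])*r + ... )*r + Tap[0] */
t = t * r + Tap[k];
return t;  /* equals 1 at r = swa and 0 at r = swb */
}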
int Init_Workspace( reax_system *system, control_params *control,
storage *workspace, MPI_Comm comm, char *msg )
{
int ret;
ret = Allocate_Workspace( system, control, workspace,
system->local_cap, system->total_cap, comm, msg );
if( ret != SUCCESS )
return ret;
memset( &(workspace->realloc), 0, sizeof(reallocate_data) );
Reset_Workspace( system, workspace );
/* Initialize the Taper function */
Init_Taper( control, workspace, comm );
return SUCCESS;
}
/************** setup communication data structures **************/
int Init_MPI_Datatypes( reax_system *system, storage *workspace,
mpi_datatypes *mpi_data, MPI_Comm comm, char *msg )
{
/* setup the world */
mpi_data->world = comm;
MPI_Comm_size( comm, &(system->wsize) );
return SUCCESS;
}
int Init_Lists( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
mpi_datatypes *mpi_data, char *msg )
{
int i, total_hbonds, total_bonds, bond_cap, num_3body, cap_3body, Htop;
int *hb_top, *bond_top;
MPI_Comm comm;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
comm = mpi_data->world;
bond_top = (int*) calloc( system->total_cap, sizeof(int) );
hb_top = (int*) calloc( system->local_cap, sizeof(int) );
Estimate_Storages( system, control, lists,
&Htop, hb_top, bond_top, &num_3body, comm );
if( control->hbond_cut > 0 ) {
/* init H indexes */
total_hbonds = 0;
for( i = 0; i < system->n; ++i ) {
system->my_atoms[i].num_hbonds = hb_top[i];
total_hbonds += hb_top[i];
}
total_hbonds = (int)(MAX( total_hbonds*saferzone, mincap*MIN_HBONDS ));
if( !Make_List( system->Hcap, total_hbonds, TYP_HBOND,
*lists+HBONDS, comm ) ) {
fprintf( stderr, "not enough space for hbonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
total_bonds = 0;
for( i = 0; i < system->N; ++i ) {
system->my_atoms[i].num_bonds = bond_top[i];
total_bonds += bond_top[i];
}
bond_cap = (int)(MAX( total_bonds*safezone, mincap*MIN_BONDS ));
if( !Make_List( system->total_cap, bond_cap, TYP_BOND,
*lists+BONDS, comm ) ) {
fprintf( stderr, "not enough space for bonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
/* 3bodies list */
cap_3body = (int)(MAX( num_3body*safezone, MIN_3BODIES ));
if( !Make_List( bond_cap, cap_3body, TYP_THREE_BODY,
*lists+THREE_BODIES, comm ) ){
fprintf( stderr, "Problem in initializing angles list. Terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
free( hb_top );
free( bond_top );
return SUCCESS;
}
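/* Note (not part of the LAMMPS sources): all three lists are over-allocated
by the safezone/saferzone factors and floored at mincap*MIN_* entries, so that
modest growth between reneighborings does not immediately force a
reallocation; the DANGER_ZONE checks in reaxc_reset_tools.cpp further down in
this diff set the realloc flags when a list gets close to full. */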
void Initialize( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
mpi_datatypes *mpi_data, MPI_Comm comm )
{
char msg[MAX_STR];
if( Init_MPI_Datatypes(system, workspace, mpi_data, comm, msg) == FAILURE ) {
fprintf( stderr, "p%d: init_mpi_datatypes: could not create datatypes\n",
system->my_rank );
fprintf( stderr, "p%d: mpi_data couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_System(system, control, msg) == FAILURE ){
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: system could not be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Simulation_Data( system, control, data, msg ) == FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: sim_data couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Workspace( system, control, workspace, mpi_data->world, msg ) ==
FAILURE ) {
fprintf( stderr, "p%d:init_workspace: not enough memory\n",
system->my_rank );
fprintf( stderr, "p%d:workspace couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Lists( system, control, data, workspace, lists, mpi_data, msg ) ==
FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: lists could not be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Output_Files(system,control,out_control,mpi_data,msg)== FAILURE) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: could not open output files! terminating...\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( control->tabulate ) {
if( Init_Lookup_Tables( system, control, workspace, mpi_data, msg ) == FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: couldn't create lookup table! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
}
Init_Force_Functions( control );
}
diff --git a/src/USER-REAXC/reaxc_io_tools.cpp b/src/USER-REAXC/reaxc_io_tools.cpp
index 0c14dad5d..4d58f7514 100644
--- a/src/USER-REAXC/reaxc_io_tools.cpp
+++ b/src/USER-REAXC/reaxc_io_tools.cpp
@@ -1,153 +1,153 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "update.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_reset_tools.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_traj.h"
#include "reaxc_vector.h"
int Init_Output_Files( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data,
char *msg )
{
char temp[MAX_STR];
int ret;
if( out_control->write_steps > 0 ){
ret = Init_Traj( system, control, out_control, mpi_data, msg );
if( ret == FAILURE )
return ret;
}
if( system->my_rank == MASTER_NODE ) {
/* These files are written only by the master node */
if( out_control->energy_update_freq > 0 ) {
/* init potentials file */
sprintf( temp, "%s.pot", control->sim_name );
if( (out_control->pot = fopen( temp, "w" )) != NULL ) {
fflush( out_control->pot );
}
else {
strcpy( msg, "init_out_controls: .pot file could not be opened\n" );
return FAILURE;
}
/* init log file */
}
/* init pressure file */
if( control->ensemble == NPT ||
control->ensemble == iNPT ||
control->ensemble == sNPT ) {
sprintf( temp, "%s.prs", control->sim_name );
if( (out_control->prs = fopen( temp, "w" )) != NULL ) {
fprintf(out_control->prs,"%8s%13s%13s%13s%13s%13s%13s%13s\n",
"step", "Pint/norm[x]", "Pint/norm[y]", "Pint/norm[z]",
"Pext/Ptot[x]", "Pext/Ptot[y]", "Pext/Ptot[z]", "Pkin/V" );
fflush( out_control->prs );
}
else {
strcpy(msg,"init_out_controls: .prs file couldn't be opened\n");
return FAILURE;
}
}
}
return SUCCESS;
}
/************************ close output files ************************/
int Close_Output_Files( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
if( out_control->write_steps > 0 )
End_Traj( system->my_rank, out_control );
if( system->my_rank == MASTER_NODE ) {
if( out_control->energy_update_freq > 0 ) {
fclose( out_control->pot );
}
if( control->ensemble == NPT || control->ensemble == iNPT ||
control->ensemble == sNPT )
fclose( out_control->prs );
}
return SUCCESS;
}
void Output_Results( reax_system *system, control_params *control,
simulation_data *data, reax_list **lists,
output_controls *out_control, mpi_datatypes *mpi_data )
{
if((out_control->energy_update_freq > 0 &&
data->step%out_control->energy_update_freq == 0) ||
(out_control->write_steps > 0 &&
data->step%out_control->write_steps == 0)){
/* update system-wide energies */
Compute_System_Energy( system, data, mpi_data->world );
/* output energies */
if( system->my_rank == MASTER_NODE &&
out_control->energy_update_freq > 0 &&
data->step % out_control->energy_update_freq == 0 ) {
if( control->virial ){
fprintf( out_control->prs,
"%8d%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f\n",
data->step,
data->int_press[0], data->int_press[1], data->int_press[2],
data->ext_press[0], data->ext_press[1], data->ext_press[2],
data->kin_press );
fprintf( out_control->prs,
"%8s%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f\n",
"",system->big_box.box_norms[0], system->big_box.box_norms[1],
system->big_box.box_norms[2],
data->tot_press[0], data->tot_press[1], data->tot_press[2],
system->big_box.V );
fflush( out_control->prs);
}
}
/* write current frame */
if( out_control->write_steps > 0 &&
(data->step-data->prev_steps) % out_control->write_steps == 0 ) {
Append_Frame( system, control, data, lists, out_control, mpi_data );
}
}
}
diff --git a/src/USER-REAXC/reaxc_list.cpp b/src/USER-REAXC/reaxc_list.cpp
index d22ac4ca7..2755d5506 100644
--- a/src/USER-REAXC/reaxc_list.cpp
+++ b/src/USER-REAXC/reaxc_list.cpp
@@ -1,126 +1,126 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
/************* allocate list space ******************/
int Make_List(int n, int num_intrs, int type, reax_list *l, MPI_Comm comm)
{
l->allocated = 1;
l->n = n;
l->num_intrs = num_intrs;
l->index = (int*) smalloc( n * sizeof(int), "list:index", comm );
l->end_index = (int*) smalloc( n * sizeof(int), "list:end_index", comm );
l->type = type;
switch(l->type) {
case TYP_VOID:
l->select.v = (void*) smalloc(l->num_intrs * sizeof(void*), "list:v", comm);
break;
case TYP_THREE_BODY:
l->select.three_body_list = (three_body_interaction_data*)
smalloc( l->num_intrs * sizeof(three_body_interaction_data),
"list:three_bodies", comm );
break;
case TYP_BOND:
l->select.bond_list = (bond_data*)
smalloc( l->num_intrs * sizeof(bond_data), "list:bonds", comm );
break;
case TYP_DBO:
l->select.dbo_list = (dbond_data*)
smalloc( l->num_intrs * sizeof(dbond_data), "list:dbonds", comm );
break;
case TYP_DDELTA:
l->select.dDelta_list = (dDelta_data*)
smalloc( l->num_intrs * sizeof(dDelta_data), "list:dDeltas", comm );
break;
case TYP_FAR_NEIGHBOR:
l->select.far_nbr_list = (far_neighbor_data*)
smalloc(l->num_intrs * sizeof(far_neighbor_data), "list:far_nbrs", comm);
break;
case TYP_HBOND:
l->select.hbond_list = (hbond_data*)
smalloc( l->num_intrs * sizeof(hbond_data), "list:hbonds", comm );
break;
default:
fprintf( stderr, "ERROR: unknown list type %d!\n", l->type );
MPI_Abort( comm, INVALID_INPUT );
}
return SUCCESS;
}
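/* Note (not part of the LAMMPS sources): Make_List only allocates the
per-owner index/end_index arrays plus num_intrs entries of the union member
selected by 'type'; the indices themselves are filled in by the neighboring
and force routines. The bonded kernels in this diff then walk a list as, for
example,
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
bond_data *b = &( bonds->select.bond_list[pj] );
...
}
with one contiguous slice of entries per atom i. */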
void Delete_List( reax_list *l, MPI_Comm comm )
{
if( l->allocated == 0 )
return;
l->allocated = 0;
sfree( l->index, "list:index" );
sfree( l->end_index, "list:end_index" );
switch(l->type) {
case TYP_VOID:
sfree( l->select.v, "list:v" );
break;
case TYP_HBOND:
sfree( l->select.hbond_list, "list:hbonds" );
break;
case TYP_FAR_NEIGHBOR:
sfree( l->select.far_nbr_list, "list:far_nbrs" );
break;
case TYP_BOND:
sfree( l->select.bond_list, "list:bonds" );
break;
case TYP_DBO:
sfree( l->select.dbo_list, "list:dbos" );
break;
case TYP_DDELTA:
sfree( l->select.dDelta_list, "list:dDeltas" );
break;
case TYP_THREE_BODY:
sfree( l->select.three_body_list, "list:three_bodies" );
break;
default:
fprintf( stderr, "ERROR: unknown list type %d!\n", l->type );
MPI_Abort( comm, INVALID_INPUT );
}
}
diff --git a/src/USER-REAXC/reaxc_lookup.cpp b/src/USER-REAXC/reaxc_lookup.cpp
index 903e54962..9db8b7b9f 100644
--- a/src/USER-REAXC/reaxc_lookup.cpp
+++ b/src/USER-REAXC/reaxc_lookup.cpp
@@ -1,304 +1,304 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_lookup.h"
#include "reaxc_nonbonded.h"
#include "reaxc_tool_box.h"
LR_lookup_table **LR;
void Tridiagonal_Solve( const double *a, const double *b,
double *c, double *d, double *x, unsigned int n){
int i;
double id;
c[0] /= b[0]; /* Division by zero risk. */
d[0] /= b[0]; /* Division by zero would imply a singular matrix. */
for(i = 1; i < n; i++){
id = (b[i] - c[i-1] * a[i]); /* Division by zero risk. */
c[i] /= id; /* Last value calculated is redundant. */
d[i] = (d[i] - d[i-1] * a[i])/id;
}
x[n - 1] = d[n - 1];
for(i = n - 2; i >= 0; i--)
x[i] = d[i] - c[i] * x[i + 1];
}
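/* Illustrative sketch (not part of the LAMMPS sources): the routine above is
the Thomas algorithm; a is the sub-diagonal, b the main diagonal, c the
super-diagonal and d the right-hand side (c and d are overwritten). A toy
call for a 3x3 system could look like this: */
// double a[3] = { 0.0, 1.0, 1.0 };  /* a[0] is ignored    */
// double b[3] = { 2.0, 2.0, 2.0 };  /* main diagonal      */
// double c[3] = { 1.0, 1.0, 0.0 };  /* c[2] is ignored    */
// double d[3] = { 1.0, 2.0, 3.0 };  /* right-hand side    */
// double x[3];
// Tridiagonal_Solve( a, b, c, d, x, 3 );  /* x now holds the solution */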
void Natural_Cubic_Spline( const double *h, const double *f,
cubic_spline_coef *coef, unsigned int n,
MPI_Comm comm )
{
int i;
double *a, *b, *c, *d, *v;
/* allocate space for the linear system */
a = (double*) smalloc( n * sizeof(double), "cubic_spline:a", comm );
b = (double*) smalloc( n * sizeof(double), "cubic_spline:b", comm );
c = (double*) smalloc( n * sizeof(double), "cubic_spline:c", comm );
d = (double*) smalloc( n * sizeof(double), "cubic_spline:d", comm );
v = (double*) smalloc( n * sizeof(double), "cubic_spline:v", comm );
/* build the linear system */
a[0] = a[1] = a[n-1] = 0;
for( i = 2; i < n-1; ++i )
a[i] = h[i-1];
b[0] = b[n-1] = 0;
for( i = 1; i < n-1; ++i )
b[i] = 2 * (h[i-1] + h[i]);
c[0] = c[n-2] = c[n-1] = 0;
for( i = 1; i < n-2; ++i )
c[i] = h[i];
d[0] = d[n-1] = 0;
for( i = 1; i < n-1; ++i )
d[i] = 6 * ((f[i+1]-f[i])/h[i] - (f[i]-f[i-1])/h[i-1]);
v[0] = 0;
v[n-1] = 0;
Tridiagonal_Solve( &(a[1]), &(b[1]), &(c[1]), &(d[1]), &(v[1]), n-2 );
for( i = 1; i < n; ++i ){
coef[i-1].d = (v[i] - v[i-1]) / (6*h[i-1]);
coef[i-1].c = v[i]/2;
coef[i-1].b = (f[i]-f[i-1])/h[i-1] + h[i-1]*(2*v[i] + v[i-1])/6;
coef[i-1].a = f[i];
}
sfree( a, "cubic_spline:a" );
sfree( b, "cubic_spline:b" );
sfree( c, "cubic_spline:c" );
sfree( d, "cubic_spline:d" );
sfree( v, "cubic_spline:v" );
}
void Complete_Cubic_Spline( const double *h, const double *f, double v0, double vlast,
cubic_spline_coef *coef, unsigned int n,
MPI_Comm comm )
{
int i;
double *a, *b, *c, *d, *v;
/* allocate space for the linear system */
a = (double*) smalloc( n * sizeof(double), "cubic_spline:a", comm );
b = (double*) smalloc( n * sizeof(double), "cubic_spline:b", comm );
c = (double*) smalloc( n * sizeof(double), "cubic_spline:c", comm );
d = (double*) smalloc( n * sizeof(double), "cubic_spline:d", comm );
v = (double*) smalloc( n * sizeof(double), "cubic_spline:v", comm );
/* build the linear system */
a[0] = 0;
for( i = 1; i < n; ++i )
a[i] = h[i-1];
b[0] = 2*h[0];
for( i = 1; i < n; ++i )
b[i] = 2 * (h[i-1] + h[i]);
c[n-1] = 0;
for( i = 0; i < n-1; ++i )
c[i] = h[i];
d[0] = 6 * (f[1]-f[0])/h[0] - 6 * v0;
d[n-1] = 6 * vlast - 6 * (f[n-1]-f[n-2])/h[n-2];
for( i = 1; i < n-1; ++i )
d[i] = 6 * ((f[i+1]-f[i])/h[i] - (f[i]-f[i-1])/h[i-1]);
Tridiagonal_Solve( &(a[0]), &(b[0]), &(c[0]), &(d[0]), &(v[0]), n );
for( i = 1; i < n; ++i ){
coef[i-1].d = (v[i] - v[i-1]) / (6*h[i-1]);
coef[i-1].c = v[i]/2;
coef[i-1].b = (f[i]-f[i-1])/h[i-1] + h[i-1]*(2*v[i] + v[i-1])/6;
coef[i-1].a = f[i];
}
sfree( a, "cubic_spline:a" );
sfree( b, "cubic_spline:b" );
sfree( c, "cubic_spline:c" );
sfree( d, "cubic_spline:d" );
sfree( v, "cubic_spline:v" );
}
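/* Illustrative sketch (not part of the LAMMPS sources): each cubic_spline_coef
(a,b,c,d) built above is later evaluated in Tabulated_vdW_Coulomb_Energy as a
cubic in dif = r_ij - base, i.e. y(dif) = ((d*dif + c)*dif + b)*dif + a.
A self-contained helper with the same layout, assuming only a local struct of
four doubles: */
struct spline_coef_sketch { double a, b, c, d; };
static double spline_eval( const spline_coef_sketch &s, double dif )
{
return ((s.d * dif + s.c) * dif + s.b) * dif + s.a;  /* Horner form of the cubic */
}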
int Init_Lookup_Tables( reax_system *system, control_params *control,
storage *workspace, mpi_datatypes *mpi_data, char *msg )
{
int i, j, r;
int num_atom_types;
int existing_types[MAX_ATOM_TYPES], aggregated[MAX_ATOM_TYPES];
double dr;
double *h, *fh, *fvdw, *fele, *fCEvd, *fCEclmb;
double v0_vdw, v0_ele, vlast_vdw, vlast_ele;
MPI_Comm comm;
/* initializations */
v0_vdw = 0;
v0_ele = 0;
vlast_vdw = 0;
vlast_ele = 0;
comm = mpi_data->world;
num_atom_types = system->reax_param.num_atom_types;
dr = control->nonb_cut / control->tabulate;
h = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:h", comm );
fh = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fh", comm );
fvdw = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fvdw", comm );
fCEvd = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEvd", comm );
fele = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fele", comm );
fCEclmb = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEclmb", comm );
LR = (LR_lookup_table**)
scalloc( num_atom_types, sizeof(LR_lookup_table*), "lookup:LR", comm );
for( i = 0; i < num_atom_types; ++i )
LR[i] = (LR_lookup_table*)
scalloc( num_atom_types, sizeof(LR_lookup_table), "lookup:LR[i]", comm );
for( i = 0; i < MAX_ATOM_TYPES; ++i )
existing_types[i] = 0;
for( i = 0; i < system->n; ++i )
existing_types[ system->my_atoms[i].type ] = 1;
MPI_Allreduce( existing_types, aggregated, MAX_ATOM_TYPES,
MPI_INT, MPI_SUM, mpi_data->world );
for( i = 0; i < num_atom_types; ++i ) {
if( aggregated[i] ) {
for( j = i; j < num_atom_types; ++j ) {
if( aggregated[j] ) {
LR[i][j].xmin = 0;
LR[i][j].xmax = control->nonb_cut;
LR[i][j].n = control->tabulate + 2;
LR[i][j].dx = dr;
LR[i][j].inv_dx = control->tabulate / control->nonb_cut;
LR[i][j].y = (LR_data*)
smalloc( LR[i][j].n * sizeof(LR_data), "lookup:LR[i,j].y", comm );
LR[i][j].H = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].H" ,
comm );
LR[i][j].vdW = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].vdW",
comm);
LR[i][j].CEvd = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].CEvd",
comm);
LR[i][j].ele = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].ele",
comm );
LR[i][j].CEclmb = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),
"lookup:LR[i,j].CEclmb", comm );
for( r = 1; r <= control->tabulate; ++r ) {
LR_vdW_Coulomb( system, workspace, control, i, j, r * dr, &(LR[i][j].y[r]) );
h[r] = LR[i][j].dx;
fh[r] = LR[i][j].y[r].H;
fvdw[r] = LR[i][j].y[r].e_vdW;
fCEvd[r] = LR[i][j].y[r].CEvd;
fele[r] = LR[i][j].y[r].e_ele;
fCEclmb[r] = LR[i][j].y[r].CEclmb;
}
// init the start-end points
h[r] = LR[i][j].dx;
v0_vdw = LR[i][j].y[1].CEvd;
v0_ele = LR[i][j].y[1].CEclmb;
fh[r] = fh[r-1];
fvdw[r] = fvdw[r-1];
fCEvd[r] = fCEvd[r-1];
fele[r] = fele[r-1];
fCEclmb[r] = fCEclmb[r-1];
vlast_vdw = fCEvd[r-1];
vlast_ele = fele[r-1];
Natural_Cubic_Spline( &h[1], &fh[1],
&(LR[i][j].H[1]), control->tabulate+1, comm );
Complete_Cubic_Spline( &h[1], &fvdw[1], v0_vdw, vlast_vdw,
&(LR[i][j].vdW[1]), control->tabulate+1,
comm );
Natural_Cubic_Spline( &h[1], &fCEvd[1],
&(LR[i][j].CEvd[1]), control->tabulate+1,
comm );
Complete_Cubic_Spline( &h[1], &fele[1], v0_ele, vlast_ele,
&(LR[i][j].ele[1]), control->tabulate+1,
comm );
Natural_Cubic_Spline( &h[1], &fCEclmb[1],
&(LR[i][j].CEclmb[1]), control->tabulate+1,
comm );
} else{
LR[i][j].n = 0;
}
}
}
}
free(h);
free(fh);
free(fvdw);
free(fCEvd);
free(fele);
free(fCEclmb);
return 1;
}
void Deallocate_Lookup_Tables( reax_system *system )
{
int i, j;
int ntypes;
ntypes = system->reax_param.num_atom_types;
for( i = 0; i < ntypes; ++i ) {
for( j = i; j < ntypes; ++j )
if( LR[i][j].n ) {
sfree( LR[i][j].y, "LR[i,j].y" );
sfree( LR[i][j].H, "LR[i,j].H" );
sfree( LR[i][j].vdW, "LR[i,j].vdW" );
sfree( LR[i][j].CEvd, "LR[i,j].CEvd" );
sfree( LR[i][j].ele, "LR[i,j].ele" );
sfree( LR[i][j].CEclmb, "LR[i,j].CEclmb" );
}
sfree( LR[i], "LR[i]" );
}
sfree( LR, "LR" );
}
diff --git a/src/USER-REAXC/reaxc_multi_body.cpp b/src/USER-REAXC/reaxc_multi_body.cpp
index 1923668e8..ecfd3ad04 100644
--- a/src/USER-REAXC/reaxc_multi_body.cpp
+++ b/src/USER-REAXC/reaxc_multi_body.cpp
@@ -1,240 +1,243 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_multi_body.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void Atom_Energy( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
output_controls *out_control )
{
int i, j, pj, type_i, type_j;
double Delta_lpcorr, dfvl;
double e_lp, expvd2, inv_expvd2, dElp, CElp, DlpVi;
double e_lph, Di, vov3, deahu2dbo, deahu2dsbo;
double e_ov, CEover1, CEover2, CEover3, CEover4;
double exp_ovun1, exp_ovun2, sum_ovun1, sum_ovun2;
double exp_ovun2n, exp_ovun6, exp_ovun8;
double inv_exp_ovun1, inv_exp_ovun2, inv_exp_ovun2n, inv_exp_ovun8;
double e_un, CEunder1, CEunder2, CEunder3, CEunder4;
double p_lp2, p_lp3;
double p_ovun2, p_ovun3, p_ovun4, p_ovun5, p_ovun6, p_ovun7, p_ovun8;
double eng_tmp;
int numbonds;
single_body_parameters *sbp_i;
two_body_parameters *twbp;
bond_data *pbond;
bond_order_data *bo_ij;
reax_list *bonds = (*lists) + BONDS;
/* Initialize parameters */
p_lp3 = system->reax_param.gp.l[5];
p_ovun3 = system->reax_param.gp.l[32];
p_ovun4 = system->reax_param.gp.l[31];
p_ovun6 = system->reax_param.gp.l[6];
p_ovun7 = system->reax_param.gp.l[8];
p_ovun8 = system->reax_param.gp.l[9];
for( i = 0; i < system->n; ++i ) {
/* set the parameter pointer */
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[ type_i ]);
/* lone-pair Energy */
p_lp2 = sbp_i->p_lp2;
expvd2 = exp( -75 * workspace->Delta_lp[i] );
inv_expvd2 = 1. / (1. + expvd2 );
numbonds = 0;
e_lp = 0.0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
numbonds ++;
/* calculate the energy */
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
data->my_en.e_lp += e_lp =
p_lp2 * workspace->Delta_lp[i] * inv_expvd2;
dElp = p_lp2 * inv_expvd2 +
75 * p_lp2 * workspace->Delta_lp[i] * expvd2 * SQR(inv_expvd2);
CElp = dElp * workspace->dDelta_lp[i];
- if (numbonds > 0) workspace->CdDelta[i] += CElp; // lp - 1st term
+ if (numbonds > 0 || control->enobondsflag)
+ workspace->CdDelta[i] += CElp; // lp - 1st term
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,i,system->n,1,e_lp,0.0,0.0,0.0,0.0,0.0);
/* correction for C2 */
if( p_lp3 > 0.001 && !strcmp(system->reax_param.sbp[type_i].name, "C") )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
if( !strcmp( system->reax_param.sbp[type_j].name, "C" ) ) {
twbp = &( system->reax_param.tbp[type_i][type_j]);
bo_ij = &( bonds->select.bond_list[pj].bo_data );
Di = workspace->Delta[i];
vov3 = bo_ij->BO - Di - 0.040*pow(Di, 4.);
if( vov3 > 3. ) {
data->my_en.e_lp += e_lph = p_lp3 * SQR(vov3-3.0);
deahu2dbo = 2.*p_lp3*(vov3 - 3.);
deahu2dsbo = 2.*p_lp3*(vov3 - 3.)*(-1. - 0.16*pow(Di, 3.));
bo_ij->Cdbo += deahu2dbo;
workspace->CdDelta[i] += deahu2dsbo;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,system->n,1,e_lph,0.0,0.0,0.0,0.0,0.0);
}
}
}
}
for( i = 0; i < system->n; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[ type_i ]);
/* over-coordination energy */
if( sbp_i->mass > 21.0 )
dfvl = 0.0;
else dfvl = 1.0; // only for 1st-row elements
p_ovun2 = sbp_i->p_ovun2;
sum_ovun1 = sum_ovun2 = 0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
bo_ij = &(bonds->select.bond_list[pj].bo_data);
twbp = &(system->reax_param.tbp[ type_i ][ type_j ]);
sum_ovun1 += twbp->p_ovun1 * twbp->De_s * bo_ij->BO;
sum_ovun2 += (workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j])*
( bo_ij->BO_pi + bo_ij->BO_pi2 );
}
exp_ovun1 = p_ovun3 * exp( p_ovun4 * sum_ovun2 );
inv_exp_ovun1 = 1.0 / (1 + exp_ovun1);
Delta_lpcorr = workspace->Delta[i] -
(dfvl * workspace->Delta_lp_temp[i]) * inv_exp_ovun1;
exp_ovun2 = exp( p_ovun2 * Delta_lpcorr );
inv_exp_ovun2 = 1.0 / (1.0 + exp_ovun2);
DlpVi = 1.0 / (Delta_lpcorr + sbp_i->valency + 1e-8);
CEover1 = Delta_lpcorr * DlpVi * inv_exp_ovun2;
data->my_en.e_ov += e_ov = sum_ovun1 * CEover1;
CEover2 = sum_ovun1 * DlpVi * inv_exp_ovun2 *
(1.0 - Delta_lpcorr * ( DlpVi + p_ovun2 * exp_ovun2 * inv_exp_ovun2 ));
CEover3 = CEover2 * (1.0 - dfvl * workspace->dDelta_lp[i] * inv_exp_ovun1 );
CEover4 = CEover2 * (dfvl * workspace->Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1);
/* under-coordination potential */
p_ovun2 = sbp_i->p_ovun2;
p_ovun5 = sbp_i->p_ovun5;
exp_ovun2n = 1.0 / exp_ovun2;
exp_ovun6 = exp( p_ovun6 * Delta_lpcorr );
exp_ovun8 = p_ovun7 * exp(p_ovun8 * sum_ovun2);
inv_exp_ovun2n = 1.0 / (1.0 + exp_ovun2n);
inv_exp_ovun8 = 1.0 / (1.0 + exp_ovun8);
numbonds = 0;
e_un = 0.0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
numbonds ++;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
data->my_en.e_un += e_un =
-p_ovun5 * (1.0 - exp_ovun6) * inv_exp_ovun2n * inv_exp_ovun8;
CEunder1 = inv_exp_ovun2n *
( p_ovun5 * p_ovun6 * exp_ovun6 * inv_exp_ovun8 +
p_ovun2 * e_un * exp_ovun2n );
CEunder2 = -e_un * p_ovun8 * exp_ovun8 * inv_exp_ovun8;
CEunder3 = CEunder1 * (1.0 - dfvl*workspace->dDelta_lp[i]*inv_exp_ovun1);
CEunder4 = CEunder1 * (dfvl*workspace->Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1) + CEunder2;
/* tally into per-atom energy */
if( system->pair_ptr->evflag) {
eng_tmp = e_ov;
- if (numbonds > 0) eng_tmp += e_un;
+ if (numbonds > 0 || control->enobondsflag)
+ eng_tmp += e_un;
system->pair_ptr->ev_tally(i,i,system->n,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
}
/* forces */
workspace->CdDelta[i] += CEover3; // OvCoor - 2nd term
- if (numbonds > 0) workspace->CdDelta[i] += CEunder3; // UnCoor - 1st term
+ if (numbonds > 0 || control->enobondsflag)
+ workspace->CdDelta[i] += CEunder3; // UnCoor - 1st term
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
pbond = &(bonds->select.bond_list[pj]);
j = pbond->nbr;
bo_ij = &(pbond->bo_data);
twbp = &(system->reax_param.tbp[ system->my_atoms[i].type ]
[system->my_atoms[pbond->nbr].type]);
bo_ij->Cdbo += CEover1 * twbp->p_ovun1 * twbp->De_s;// OvCoor-1st
workspace->CdDelta[j] += CEover4 * (1.0 - dfvl*workspace->dDelta_lp[j]) *
(bo_ij->BO_pi + bo_ij->BO_pi2); // OvCoor-3a
bo_ij->Cdbopi += CEover4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // OvCoor-3b
bo_ij->Cdbopi2 += CEover4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // OvCoor-3b
workspace->CdDelta[j] += CEunder4 * (1.0 - dfvl*workspace->dDelta_lp[j]) *
(bo_ij->BO_pi + bo_ij->BO_pi2); // UnCoor - 2a
bo_ij->Cdbopi += CEunder4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // UnCoor-2b
bo_ij->Cdbopi2 += CEunder4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // UnCoor-2b
}
}
}
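/* Note (not part of the LAMMPS sources): the added enobondsflag checks above
make the lone-pair, over-coordination and under-coordination contributions of
an atom with zero bonds be tallied whenever control->enobondsflag is set,
instead of always being skipped when numbonds == 0. */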
diff --git a/src/USER-REAXC/reaxc_nonbonded.cpp b/src/USER-REAXC/reaxc_nonbonded.cpp
index cb24e2dc3..9c223428a 100644
--- a/src/USER-REAXC/reaxc_nonbonded.cpp
+++ b/src/USER-REAXC/reaxc_nonbonded.cpp
@@ -1,432 +1,432 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_nonbonded.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void vdW_Coulomb_Energy( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, pj, natoms;
int start_i, end_i, flag;
rc_tagint orig_i, orig_j;
double p_vdW1, p_vdW1i;
double powr_vdW1, powgi_vdW1;
double tmp, r_ij, fn13, exp1, exp2;
double Tap, dTap, dfn13, CEvd, CEclmb, de_core;
double dr3gamij_1, dr3gamij_3;
double e_ele, e_vdW, e_core, SMALL = 0.0001;
double e_lg, de_lg, r_ij5, r_ij6, re6;
rvec temp, ext_press;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_list *far_nbrs;
// Tallying variables:
double pe_vdw, f_tmp, delij[3];
natoms = system->n;
far_nbrs = (*lists) + FAR_NBRS;
p_vdW1 = system->reax_param.gp.l[28];
p_vdW1i = 1.0 / p_vdW1;
e_core = 0;
e_vdW = 0;
e_lg = de_lg = 0.0;
for( i = 0; i < natoms; ++i ) {
if (system->my_atoms[i].type < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
orig_i = system->my_atoms[i].orig_id;
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &(far_nbrs->select.far_nbr_list[pj]);
j = nbr_pj->nbr;
if (system->my_atoms[j].type < 0) continue;
orig_j = system->my_atoms[j].orig_id;
flag = 0;
if(nbr_pj->d <= control->nonb_cut) {
if (j < natoms) flag = 1;
else if (orig_i < orig_j) flag = 1;
else if (orig_i == orig_j) {
if (nbr_pj->dvec[2] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[2]) < SMALL) {
if (nbr_pj->dvec[1] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[1]) < SMALL && nbr_pj->dvec[0] > SMALL)
flag = 1;
}
}
}
if (flag) {
r_ij = nbr_pj->d;
twbp = &(system->reax_param.tbp[ system->my_atoms[i].type ]
[ system->my_atoms[j].type ]);
Tap = workspace->Tap[7] * r_ij + workspace->Tap[6];
Tap = Tap * r_ij + workspace->Tap[5];
Tap = Tap * r_ij + workspace->Tap[4];
Tap = Tap * r_ij + workspace->Tap[3];
Tap = Tap * r_ij + workspace->Tap[2];
Tap = Tap * r_ij + workspace->Tap[1];
Tap = Tap * r_ij + workspace->Tap[0];
dTap = 7*workspace->Tap[7] * r_ij + 6*workspace->Tap[6];
dTap = dTap * r_ij + 5*workspace->Tap[5];
dTap = dTap * r_ij + 4*workspace->Tap[4];
dTap = dTap * r_ij + 3*workspace->Tap[3];
dTap = dTap * r_ij + 2*workspace->Tap[2];
dTap += workspace->Tap[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
e_vdW = twbp->D * (exp1 - 2.0 * exp2);
data->my_en.e_vdW += Tap * e_vdW;
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i - 1.0) *
pow(r_ij, p_vdW1 - 2.0);
CEvd = dTap * e_vdW -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
e_vdW = twbp->D * (exp1 - 2.0 * exp2);
data->my_en.e_vdW += Tap * e_vdW;
CEvd = dTap * e_vdW -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
data->my_en.e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = pow( r_ij, 5.0 );
r_ij6 = pow( r_ij, 6.0 );
re6 = pow( twbp->lgre, 6.0 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
data->my_en.e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
CEvd += dTap * e_lg + Tap * de_lg / r_ij;
}
}
/*Coulomb Calculations*/
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
data->my_en.e_ele += e_ele =
C_ele * system->my_atoms[i].q * system->my_atoms[j].q * tmp;
CEclmb = C_ele * system->my_atoms[i].q * system->my_atoms[j].q *
( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
/* tally into per-atom energy */
if( system->pair_ptr->evflag || system->pair_ptr->vflag_atom) {
pe_vdw = Tap * (e_vdW + e_core + e_lg);
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
f_tmp = -(CEvd + CEclmb);
system->pair_ptr->ev_tally(i,j,natoms,1,pe_vdw,e_ele,
f_tmp,delij[0],delij[1],delij[2]);
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], -(CEvd + CEclmb), nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[j], +(CEvd + CEclmb), nbr_pj->dvec );
}
else { /* NPT, iNPT or sNPT */
rvec_Scale( temp, CEvd + CEclmb, nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[i], -1., temp );
rvec_Add( workspace->f[j], temp );
rvec_iMultiply( ext_press, nbr_pj->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
}
}
Compute_Polarization_Energy( system, data );
}
void Tabulated_vdW_Coulomb_Energy( reax_system *system,control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists,
output_controls *out_control )
{
int i, j, pj, r, natoms;
int type_i, type_j, tmin, tmax;
int start_i, end_i, flag;
rc_tagint orig_i, orig_j;
double r_ij, base, dif;
double e_vdW, e_ele;
double CEvd, CEclmb, SMALL = 0.0001;
double f_tmp, delij[3];
rvec temp, ext_press;
far_neighbor_data *nbr_pj;
reax_list *far_nbrs;
LR_lookup_table *t;
natoms = system->n;
far_nbrs = (*lists) + FAR_NBRS;
e_ele = e_vdW = 0;
for( i = 0; i < natoms; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
start_i = Start_Index(i,far_nbrs);
end_i = End_Index(i,far_nbrs);
orig_i = system->my_atoms[i].orig_id;
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &(far_nbrs->select.far_nbr_list[pj]);
j = nbr_pj->nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
orig_j = system->my_atoms[j].orig_id;
flag = 0;
if(nbr_pj->d <= control->nonb_cut) {
if (j < natoms) flag = 1;
else if (orig_i < orig_j) flag = 1;
else if (orig_i == orig_j) {
if (nbr_pj->dvec[2] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[2]) < SMALL) {
if (nbr_pj->dvec[1] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[1]) < SMALL && nbr_pj->dvec[0] > SMALL)
flag = 1;
}
}
}
if (flag) {
r_ij = nbr_pj->d;
tmin = MIN( type_i, type_j );
tmax = MAX( type_i, type_j );
t = &( LR[tmin][tmax] );
/* Cubic Spline Interpolation */
r = (int)(r_ij * t->inv_dx);
if( r == 0 ) ++r;
base = (double)(r+1) * t->dx;
dif = r_ij - base;
e_vdW = ((t->vdW[r].d*dif + t->vdW[r].c)*dif + t->vdW[r].b)*dif +
t->vdW[r].a;
e_ele = ((t->ele[r].d*dif + t->ele[r].c)*dif + t->ele[r].b)*dif +
t->ele[r].a;
e_ele *= system->my_atoms[i].q * system->my_atoms[j].q;
data->my_en.e_vdW += e_vdW;
data->my_en.e_ele += e_ele;
CEvd = ((t->CEvd[r].d*dif + t->CEvd[r].c)*dif + t->CEvd[r].b)*dif +
t->CEvd[r].a;
CEclmb = ((t->CEclmb[r].d*dif+t->CEclmb[r].c)*dif+t->CEclmb[r].b)*dif +
t->CEclmb[r].a;
CEclmb *= system->my_atoms[i].q * system->my_atoms[j].q;
/* tally into per-atom energy */
if( system->pair_ptr->evflag || system->pair_ptr->vflag_atom) {
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
f_tmp = -(CEvd + CEclmb);
system->pair_ptr->ev_tally(i,j,natoms,1,e_vdW,e_ele,
f_tmp,delij[0],delij[1],delij[2]);
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], -(CEvd + CEclmb), nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[j], +(CEvd + CEclmb), nbr_pj->dvec );
}
else { // NPT, iNPT or sNPT
rvec_Scale( temp, CEvd + CEclmb, nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[i], -1., temp );
rvec_Add( workspace->f[j], temp );
rvec_iMultiply( ext_press, nbr_pj->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
}
}
Compute_Polarization_Energy( system, data );
}
void Compute_Polarization_Energy( reax_system *system, simulation_data *data )
{
int i, type_i;
double q, en_tmp;
data->my_en.e_pol = 0.0;
for( i = 0; i < system->n; i++ ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
q = system->my_atoms[i].q;
en_tmp = KCALpMOL_to_EV * (system->reax_param.sbp[type_i].chi * q +
(system->reax_param.sbp[type_i].eta / 2.) * SQR(q));
data->my_en.e_pol += en_tmp;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,i,system->n,1,0.0,en_tmp,0.0,0.0,0.0,0.0);
}
}
void LR_vdW_Coulomb( reax_system *system, storage *workspace,
control_params *control, int i, int j, double r_ij, LR_data *lr )
{
double p_vdW1 = system->reax_param.gp.l[28];
double p_vdW1i = 1.0 / p_vdW1;
double powr_vdW1, powgi_vdW1;
double tmp, fn13, exp1, exp2;
double Tap, dTap, dfn13;
double dr3gamij_1, dr3gamij_3;
double e_core, de_core;
double e_lg, de_lg, r_ij5, r_ij6, re6;
two_body_parameters *twbp;
twbp = &(system->reax_param.tbp[i][j]);
e_core = 0;
de_core = 0;
e_lg = de_lg = 0.0;
/* calculate taper and its derivative */
Tap = workspace->Tap[7] * r_ij + workspace->Tap[6];
Tap = Tap * r_ij + workspace->Tap[5];
Tap = Tap * r_ij + workspace->Tap[4];
Tap = Tap * r_ij + workspace->Tap[3];
Tap = Tap * r_ij + workspace->Tap[2];
Tap = Tap * r_ij + workspace->Tap[1];
Tap = Tap * r_ij + workspace->Tap[0];
dTap = 7*workspace->Tap[7] * r_ij + 6*workspace->Tap[6];
dTap = dTap * r_ij + 5*workspace->Tap[5];
dTap = dTap * r_ij + 4*workspace->Tap[4];
dTap = dTap * r_ij + 3*workspace->Tap[3];
dTap = dTap * r_ij + 2*workspace->Tap[2];
dTap += workspace->Tap[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i-1.0) * pow(r_ij, p_vdW1-2.0);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
lr->e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
lr->CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = pow( r_ij, 5.0 );
r_ij6 = pow( r_ij, 6.0 );
re6 = pow( twbp->lgre, 6.0 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
lr->e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
lr->CEvd += dTap * e_lg + Tap * de_lg/r_ij;
}
}
/* Coulomb calculations */
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
lr->H = EV_to_KCALpMOL * tmp;
lr->e_ele = C_ele * tmp;
lr->CEclmb = C_ele * ( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
}
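/* Note (not part of the LAMMPS sources): with shielding enabled (vdw_type 1
or 3) the routines above replace r_ij by the shielded distance
fn13 = ( r_ij^p_vdW1 + (1/gamma_w)^p_vdW1 )^(1/p_vdW1)
inside the Morse-like term D*(exp1 - 2.0*exp2), and the Coulomb part screens
1/r as 1/( r_ij^3 + gamma )^(1/3), which is what dr3gamij_1 and dr3gamij_3
implement. */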
diff --git a/src/USER-REAXC/reaxc_reset_tools.cpp b/src/USER-REAXC/reaxc_reset_tools.cpp
index 1e6aeab47..4ec744e7b 100644
--- a/src/USER-REAXC/reaxc_reset_tools.cpp
+++ b/src/USER-REAXC/reaxc_reset_tools.cpp
@@ -1,192 +1,192 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_reset_tools.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Reset_Atoms( reax_system* system, control_params *control )
{
int i;
reax_atom *atom;
system->numH = 0;
if( control->hbond_cut > 0 )
for( i = 0; i < system->n; ++i ) {
atom = &(system->my_atoms[i]);
if (atom->type < 0) continue;
if( system->reax_param.sbp[ atom->type ].p_hbond == 1 )
atom->Hindex = system->numH++;
else atom->Hindex = -1;
}
}
void Reset_Energies( energy_data *en )
{
en->e_bond = 0;
en->e_ov = 0;
en->e_un = 0;
en->e_lp = 0;
en->e_ang = 0;
en->e_pen = 0;
en->e_coa = 0;
en->e_hb = 0;
en->e_tor = 0;
en->e_con = 0;
en->e_vdW = 0;
en->e_ele = 0;
en->e_pol = 0;
en->e_pot = 0;
en->e_kin = 0;
en->e_tot = 0;
}
void Reset_Temperatures( simulation_data *data )
{
data->therm.T = 0;
}
void Reset_Pressures( simulation_data *data )
{
data->flex_bar.P_scalar = 0;
rtensor_MakeZero( data->flex_bar.P );
data->iso_bar.P = 0;
rvec_MakeZero( data->int_press );
rvec_MakeZero( data->my_ext_press );
rvec_MakeZero( data->ext_press );
}
void Reset_Simulation_Data( simulation_data* data, int virial )
{
Reset_Energies( &data->my_en );
Reset_Energies( &data->sys_en );
Reset_Temperatures( data );
Reset_Pressures( data );
}
void Reset_Timing( reax_timing *rt )
{
rt->total = Get_Time();
rt->comm = 0;
rt->nbrs = 0;
rt->init_forces = 0;
rt->bonded = 0;
rt->nonb = 0;
rt->qEq = 0;
rt->s_matvecs = 0;
rt->t_matvecs = 0;
}
void Reset_Workspace( reax_system *system, storage *workspace )
{
memset( workspace->total_bond_order, 0, system->total_cap * sizeof( double ) );
memset( workspace->dDeltap_self, 0, system->total_cap * sizeof( rvec ) );
memset( workspace->CdDelta, 0, system->total_cap * sizeof( double ) );
memset( workspace->f, 0, system->total_cap * sizeof( rvec ) );
}
void Reset_Neighbor_Lists( reax_system *system, control_params *control,
storage *workspace, reax_list **lists,
MPI_Comm comm )
{
int i, total_bonds, Hindex, total_hbonds;
reax_list *bonds, *hbonds;
/* bonds list */
if( system->N > 0 ){
bonds = (*lists) + BONDS;
total_bonds = 0;
/* reset start-end indexes */
for( i = 0; i < system->N; ++i ) {
Set_Start_Index( i, total_bonds, bonds );
Set_End_Index( i, total_bonds, bonds );
total_bonds += system->my_atoms[i].num_bonds;
}
/* is reallocation needed? */
if( total_bonds >= bonds->num_intrs * DANGER_ZONE ) {
workspace->realloc.bonds = 1;
if( total_bonds >= bonds->num_intrs ) {
fprintf(stderr,
"p%d: not enough space for bonds! total=%d allocated=%d\n",
system->my_rank, total_bonds, bonds->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
if( control->hbond_cut > 0 && system->numH > 0 ) {
hbonds = (*lists) + HBONDS;
total_hbonds = 0;
/* reset start-end indexes */
for( i = 0; i < system->n; ++i ) {
Hindex = system->my_atoms[i].Hindex;
if( Hindex > -1 ) {
Set_Start_Index( Hindex, total_hbonds, hbonds );
Set_End_Index( Hindex, total_hbonds, hbonds );
total_hbonds += system->my_atoms[i].num_hbonds;
}
}
/* is reallocation needed? */
if( total_hbonds >= hbonds->num_intrs * 0.90/*DANGER_ZONE*/ ) {
workspace->realloc.hbonds = 1;
if( total_hbonds >= hbonds->num_intrs ) {
fprintf(stderr,
"p%d: not enough space for hbonds! total=%d allocated=%d\n",
system->my_rank, total_hbonds, hbonds->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
}
void Reset( reax_system *system, control_params *control, simulation_data *data,
storage *workspace, reax_list **lists, MPI_Comm comm )
{
Reset_Atoms( system, control );
Reset_Simulation_Data( data, control->virial );
Reset_Workspace( system, workspace );
Reset_Neighbor_Lists( system, control, workspace, lists, comm );
}
diff --git a/src/USER-REAXC/reaxc_system_props.cpp b/src/USER-REAXC/reaxc_system_props.cpp
index 6b4551a03..54eeb6da1 100644
--- a/src/USER-REAXC/reaxc_system_props.cpp
+++ b/src/USER-REAXC/reaxc_system_props.cpp
@@ -1,88 +1,88 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Compute_System_Energy( reax_system *system, simulation_data *data,
MPI_Comm comm )
{
double my_en[15], sys_en[15];
my_en[0] = data->my_en.e_bond;
my_en[1] = data->my_en.e_ov;
my_en[2] = data->my_en.e_un;
my_en[3] = data->my_en.e_lp;
my_en[4] = data->my_en.e_ang;
my_en[5] = data->my_en.e_pen;
my_en[6] = data->my_en.e_coa;
my_en[7] = data->my_en.e_hb;
my_en[8] = data->my_en.e_tor;
my_en[9] = data->my_en.e_con;
my_en[10] = data->my_en.e_vdW;
my_en[11] = data->my_en.e_ele;
my_en[12] = data->my_en.e_pol;
my_en[13] = data->my_en.e_kin;
MPI_Reduce( my_en, sys_en, 14, MPI_DOUBLE, MPI_SUM, MASTER_NODE, comm );
data->my_en.e_pot = data->my_en.e_bond +
data->my_en.e_ov + data->my_en.e_un + data->my_en.e_lp +
data->my_en.e_ang + data->my_en.e_pen + data->my_en.e_coa +
data->my_en.e_hb +
data->my_en.e_tor + data->my_en.e_con +
data->my_en.e_vdW + data->my_en.e_ele + data->my_en.e_pol;
data->my_en.e_tot = data->my_en.e_pot + E_CONV * data->my_en.e_kin;
if( system->my_rank == MASTER_NODE ) {
data->sys_en.e_bond = sys_en[0];
data->sys_en.e_ov = sys_en[1];
data->sys_en.e_un = sys_en[2];
data->sys_en.e_lp = sys_en[3];
data->sys_en.e_ang = sys_en[4];
data->sys_en.e_pen = sys_en[5];
data->sys_en.e_coa = sys_en[6];
data->sys_en.e_hb = sys_en[7];
data->sys_en.e_tor = sys_en[8];
data->sys_en.e_con = sys_en[9];
data->sys_en.e_vdW = sys_en[10];
data->sys_en.e_ele = sys_en[11];
data->sys_en.e_pol = sys_en[12];
data->sys_en.e_kin = sys_en[13];
data->sys_en.e_pot = data->sys_en.e_bond +
data->sys_en.e_ov + data->sys_en.e_un + data->sys_en.e_lp +
data->sys_en.e_ang + data->sys_en.e_pen + data->sys_en.e_coa +
data->sys_en.e_hb +
data->sys_en.e_tor + data->sys_en.e_con +
data->sys_en.e_vdW + data->sys_en.e_ele + data->sys_en.e_pol;
data->sys_en.e_tot = data->sys_en.e_pot + E_CONV * data->sys_en.e_kin;
}
}
diff --git a/src/USER-REAXC/reaxc_tool_box.cpp b/src/USER-REAXC/reaxc_tool_box.cpp
index 22576e9f3..4fc6796ef 100644
--- a/src/USER-REAXC/reaxc_tool_box.cpp
+++ b/src/USER-REAXC/reaxc_tool_box.cpp
@@ -1,121 +1,121 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_tool_box.h"
struct timeval tim;
double t_end;
double Get_Time( )
{
gettimeofday(&tim, NULL );
return( tim.tv_sec + (tim.tv_usec / 1000000.0) );
}
int Tokenize( char* s, char*** tok )
{
char test[MAX_LINE];
const char *sep = (const char *)"\t \n\r\f!=";
char *word;
int count=0;
strncpy( test, s, MAX_LINE );
for( word = strtok(test, sep); word; word = strtok(NULL, sep) ) {
strncpy( (*tok)[count], word, MAX_LINE );
count++;
}
return count;
}
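/* Illustrative sketch (not part of the LAMMPS sources): Tokenize expects *tok
to point at pre-allocated buffers of MAX_LINE characters each and returns the
number of words separated by whitespace or '!'/'='. A hypothetical caller
(the buffer count of 32 is chosen here purely for illustration): */
// char **tok = (char**) malloc( 32 * sizeof(char*) );
// for( int t = 0; t < 32; ++t )
//   tok[t] = (char*) malloc( MAX_LINE * sizeof(char) );
// int nwords = Tokenize( line, &tok );  /* 'line' is the string to split */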
/* safe malloc */
void *smalloc( rc_bigint n, const char *name, MPI_Comm comm )
{
void *ptr;
if( n <= 0 ) {
fprintf( stderr, "WARNING: trying to allocate %ld bytes for array %s. ",
n, name );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
ptr = malloc( n );
if( ptr == NULL ) {
fprintf( stderr, "ERROR: failed to allocate %ld bytes for array %s",
n, name );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return ptr;
}
/* safe calloc */
void *scalloc( rc_bigint n, rc_bigint size, const char *name, MPI_Comm comm )
{
void *ptr;
if( n <= 0 ) {
fprintf( stderr, "WARNING: trying to allocate %ld elements for array %s. ",
n, name );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
if( size <= 0 ) {
fprintf( stderr, "WARNING: elements size for array %s is %ld. ",
name, size );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
ptr = calloc( n, size );
if( ptr == NULL ) {
fprintf( stderr, "ERROR: failed to allocate %ld bytes for array %s",
n*size, name );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return ptr;
}
/* safe free */
void sfree( void *ptr, const char *name )
{
if( ptr == NULL ) {
fprintf( stderr, "WARNING: trying to free the already NULL pointer %s!\n",
name );
return;
}
free( ptr );
ptr = NULL;
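/* note: this only clears the local copy of the pointer, not the caller's variable */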
}
diff --git a/src/USER-REAXC/reaxc_torsion_angles.cpp b/src/USER-REAXC/reaxc_torsion_angles.cpp
index 2cfe32976..74d5b04f2 100644
--- a/src/USER-REAXC/reaxc_torsion_angles.cpp
+++ b/src/USER-REAXC/reaxc_torsion_angles.cpp
@@ -1,479 +1,479 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_torsion_angles.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
#define MIN_SINE 1e-10
double Calculate_Omega( rvec dvec_ij, double r_ij,
rvec dvec_jk, double r_jk,
rvec dvec_kl, double r_kl,
rvec dvec_li, double r_li,
three_body_interaction_data *p_ijk,
three_body_interaction_data *p_jkl,
rvec dcos_omega_di, rvec dcos_omega_dj,
rvec dcos_omega_dk, rvec dcos_omega_dl,
output_controls *out_control )
{
double unnorm_cos_omega, unnorm_sin_omega, omega;
double sin_ijk, cos_ijk, sin_jkl, cos_jkl;
double htra, htrb, htrc, hthd, hthe, hnra, hnrc, hnhd, hnhe;
double arg, poem, tel;
rvec cross_jk_kl;
sin_ijk = sin( p_ijk->theta );
cos_ijk = cos( p_ijk->theta );
sin_jkl = sin( p_jkl->theta );
cos_jkl = cos( p_jkl->theta );
/* omega */
unnorm_cos_omega = -rvec_Dot(dvec_ij, dvec_jk) * rvec_Dot(dvec_jk, dvec_kl) +
SQR( r_jk ) * rvec_Dot( dvec_ij, dvec_kl );
rvec_Cross( cross_jk_kl, dvec_jk, dvec_kl );
unnorm_sin_omega = -r_jk * rvec_Dot( dvec_ij, cross_jk_kl );
omega = atan2( unnorm_sin_omega, unnorm_cos_omega );
htra = r_ij + cos_ijk * ( r_kl * cos_jkl - r_jk );
htrb = r_jk - r_ij * cos_ijk - r_kl * cos_jkl;
htrc = r_kl + cos_jkl * ( r_ij * cos_ijk - r_jk );
hthd = r_ij * sin_ijk * ( r_jk - r_kl * cos_jkl );
hthe = r_kl * sin_jkl * ( r_jk - r_ij * cos_ijk );
hnra = r_kl * sin_ijk * sin_jkl;
hnrc = r_ij * sin_ijk * sin_jkl;
hnhd = r_ij * r_kl * cos_ijk * sin_jkl;
hnhe = r_ij * r_kl * sin_ijk * cos_jkl;
poem = 2.0 * r_ij * r_kl * sin_ijk * sin_jkl;
if( poem < 1e-20 ) poem = 1e-20;
tel = SQR( r_ij ) + SQR( r_jk ) + SQR( r_kl ) - SQR( r_li ) -
2.0 * ( r_ij * r_jk * cos_ijk - r_ij * r_kl * cos_ijk * cos_jkl +
r_jk * r_kl * cos_jkl );
arg = tel / poem;
if( arg > 1.0 ) arg = 1.0;
if( arg < -1.0 ) arg = -1.0;
if( sin_ijk >= 0 && sin_ijk <= MIN_SINE ) sin_ijk = MIN_SINE;
else if( sin_ijk <= 0 && sin_ijk >= -MIN_SINE ) sin_ijk = -MIN_SINE;
if( sin_jkl >= 0 && sin_jkl <= MIN_SINE ) sin_jkl = MIN_SINE;
else if( sin_jkl <= 0 && sin_jkl >= -MIN_SINE ) sin_jkl = -MIN_SINE;
// dcos_omega_di
rvec_ScaledSum( dcos_omega_di, (htra-arg*hnra)/r_ij, dvec_ij, -1., dvec_li );
rvec_ScaledAdd( dcos_omega_di,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_dk );
rvec_Scale( dcos_omega_di, 2.0 / poem, dcos_omega_di );
// dcos_omega_dj
rvec_ScaledSum( dcos_omega_dj,-(htra-arg*hnra)/r_ij, dvec_ij,
-htrb / r_jk, dvec_jk );
rvec_ScaledAdd( dcos_omega_dj,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_dj );
rvec_ScaledAdd( dcos_omega_dj,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_di );
rvec_Scale( dcos_omega_dj, 2.0 / poem, dcos_omega_dj );
// dcos_omega_dk
rvec_ScaledSum( dcos_omega_dk,-(htrc-arg*hnrc)/r_kl, dvec_kl,
htrb / r_jk, dvec_jk );
rvec_ScaledAdd( dcos_omega_dk,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_di );
rvec_ScaledAdd( dcos_omega_dk,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_dj );
rvec_Scale( dcos_omega_dk, 2.0 / poem, dcos_omega_dk );
// dcos_omega_dl
rvec_ScaledSum( dcos_omega_dl, (htrc-arg*hnrc)/r_kl, dvec_kl, 1., dvec_li );
rvec_ScaledAdd( dcos_omega_dl,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_dk );
rvec_Scale( dcos_omega_dl, 2.0 / poem, dcos_omega_dl );
return omega;
}
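Restated compactly, the dihedral angle returned above is (this is simply the code in formula form, using the unnormalized sine/cosine pair passed to atan2):

\omega_{ijkl} = \operatorname{atan2}\!\Big( -\,r_{jk}\,\vec d_{ij}\cdot(\vec d_{jk}\times\vec d_{kl}),\;
  -(\vec d_{ij}\cdot\vec d_{jk})(\vec d_{jk}\cdot\vec d_{kl}) + r_{jk}^{2}\,(\vec d_{ij}\cdot\vec d_{kl}) \Big)

The clamping of sin(theta_ijk) and sin(theta_jkl) to +/- MIN_SINE only protects the derivative terms from division by zero; it does not change omega itself.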
void Torsion_Angles( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, k, l, pi, pj, pk, pl, pij, plk, natoms;
int type_i, type_j, type_k, type_l;
int start_j, end_j;
int start_pj, end_pj, start_pk, end_pk;
int num_frb_intrs = 0;
double Delta_j, Delta_k;
double r_ij, r_jk, r_kl, r_li;
double BOA_ij, BOA_jk, BOA_kl;
double exp_tor2_ij, exp_tor2_jk, exp_tor2_kl;
double exp_tor1, exp_tor3_DjDk, exp_tor4_DjDk, exp_tor34_inv;
double exp_cot2_jk, exp_cot2_ij, exp_cot2_kl;
double fn10, f11_DjDk, dfn11, fn12;
double theta_ijk, theta_jkl;
double sin_ijk, sin_jkl;
double cos_ijk, cos_jkl;
double tan_ijk_i, tan_jkl_i;
double omega, cos_omega, cos2omega, cos3omega;
rvec dcos_omega_di, dcos_omega_dj, dcos_omega_dk, dcos_omega_dl;
double CV, cmn, CEtors1, CEtors2, CEtors3, CEtors4;
double CEtors5, CEtors6, CEtors7, CEtors8, CEtors9;
double Cconj, CEconj1, CEconj2, CEconj3;
double CEconj4, CEconj5, CEconj6;
double e_tor, e_con;
rvec dvec_li;
rvec force, ext_press;
ivec rel_box_jl;
// rtensor total_rtensor, temp_rtensor;
four_body_header *fbh;
four_body_parameters *fbp;
bond_data *pbond_ij, *pbond_jk, *pbond_kl;
bond_order_data *bo_ij, *bo_jk, *bo_kl;
three_body_interaction_data *p_ijk, *p_jkl;
double p_tor2 = system->reax_param.gp.l[23];
double p_tor3 = system->reax_param.gp.l[24];
double p_tor4 = system->reax_param.gp.l[25];
double p_cot2 = system->reax_param.gp.l[27];
reax_list *bonds = (*lists) + BONDS;
reax_list *thb_intrs = (*lists) + THREE_BODIES;
// Virial tallying variables
double delil[3], deljl[3], delkl[3];
double eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
natoms = system->n;
for( j = 0; j < natoms; ++j ) {
type_j = system->my_atoms[j].type;
Delta_j = workspace->Delta_boc[j];
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
for( pk = start_j; pk < end_j; ++pk ) {
pbond_jk = &( bonds->select.bond_list[pk] );
k = pbond_jk->nbr;
bo_jk = &( pbond_jk->bo_data );
BOA_jk = bo_jk->BO - control->thb_cut;
if( system->my_atoms[j].orig_id > system->my_atoms[k].orig_id )
continue;
if( system->my_atoms[j].orig_id == system->my_atoms[k].orig_id ) {
if (system->my_atoms[k].x[2] < system->my_atoms[j].x[2]) continue;
if (system->my_atoms[k].x[2] == system->my_atoms[j].x[2] &&
system->my_atoms[k].x[1] < system->my_atoms[j].x[1]) continue;
if (system->my_atoms[k].x[2] == system->my_atoms[j].x[2] &&
system->my_atoms[k].x[1] == system->my_atoms[j].x[1] &&
system->my_atoms[k].x[0] < system->my_atoms[j].x[0]) continue;
}
if( bo_jk->BO > control->thb_cut/*0*/ && Num_Entries(pk, thb_intrs) ) {
pj = pbond_jk->sym_index; // pj points to j on k's list
if( Num_Entries(pj, thb_intrs) ) {
type_k = system->my_atoms[k].type;
Delta_k = workspace->Delta_boc[k];
r_jk = pbond_jk->d;
start_pk = Start_Index(pk, thb_intrs );
end_pk = End_Index(pk, thb_intrs );
start_pj = Start_Index(pj, thb_intrs );
end_pj = End_Index(pj, thb_intrs );
exp_tor2_jk = exp( -p_tor2 * BOA_jk );
exp_cot2_jk = exp( -p_cot2 * SQR(BOA_jk - 1.5) );
exp_tor3_DjDk = exp( -p_tor3 * (Delta_j + Delta_k) );
exp_tor4_DjDk = exp( p_tor4 * (Delta_j + Delta_k) );
exp_tor34_inv = 1.0 / (1.0 + exp_tor3_DjDk + exp_tor4_DjDk);
f11_DjDk = (2.0 + exp_tor3_DjDk) * exp_tor34_inv;
for( pi = start_pk; pi < end_pk; ++pi ) {
p_ijk = &( thb_intrs->select.three_body_list[pi] );
pij = p_ijk->pthb; // pij is pointer to i on j's bond_list
pbond_ij = &( bonds->select.bond_list[pij] );
bo_ij = &( pbond_ij->bo_data );
if( bo_ij->BO > control->thb_cut/*0*/ ) {
i = p_ijk->thb;
type_i = system->my_atoms[i].type;
r_ij = pbond_ij->d;
BOA_ij = bo_ij->BO - control->thb_cut;
theta_ijk = p_ijk->theta;
sin_ijk = sin( theta_ijk );
cos_ijk = cos( theta_ijk );
//tan_ijk_i = 1. / tan( theta_ijk );
if( sin_ijk >= 0 && sin_ijk <= MIN_SINE )
tan_ijk_i = cos_ijk / MIN_SINE;
else if( sin_ijk <= 0 && sin_ijk >= -MIN_SINE )
tan_ijk_i = cos_ijk / -MIN_SINE;
else tan_ijk_i = cos_ijk / sin_ijk;
exp_tor2_ij = exp( -p_tor2 * BOA_ij );
exp_cot2_ij = exp( -p_cot2 * SQR(BOA_ij -1.5) );
for( pl = start_pj; pl < end_pj; ++pl ) {
p_jkl = &( thb_intrs->select.three_body_list[pl] );
l = p_jkl->thb;
plk = p_jkl->pthb; //pointer to l on k's bond_list!
pbond_kl = &( bonds->select.bond_list[plk] );
bo_kl = &( pbond_kl->bo_data );
type_l = system->my_atoms[l].type;
fbh = &(system->reax_param.fbp[type_i][type_j]
[type_k][type_l]);
fbp = &(system->reax_param.fbp[type_i][type_j]
[type_k][type_l].prm[0]);
if( i != l && fbh->cnt &&
bo_kl->BO > control->thb_cut/*0*/ &&
bo_ij->BO * bo_jk->BO * bo_kl->BO > control->thb_cut/*0*/ ){
++num_frb_intrs;
r_kl = pbond_kl->d;
BOA_kl = bo_kl->BO - control->thb_cut;
theta_jkl = p_jkl->theta;
sin_jkl = sin( theta_jkl );
cos_jkl = cos( theta_jkl );
//tan_jkl_i = 1. / tan( theta_jkl );
if( sin_jkl >= 0 && sin_jkl <= MIN_SINE )
tan_jkl_i = cos_jkl / MIN_SINE;
else if( sin_jkl <= 0 && sin_jkl >= -MIN_SINE )
tan_jkl_i = cos_jkl / -MIN_SINE;
else tan_jkl_i = cos_jkl /sin_jkl;
rvec_ScaledSum( dvec_li, 1., system->my_atoms[i].x,
-1., system->my_atoms[l].x );
r_li = rvec_Norm( dvec_li );
/* omega and its derivative */
omega = Calculate_Omega( pbond_ij->dvec, r_ij,
pbond_jk->dvec, r_jk,
pbond_kl->dvec, r_kl,
dvec_li, r_li,
p_ijk, p_jkl,
dcos_omega_di, dcos_omega_dj,
dcos_omega_dk, dcos_omega_dl,
out_control );
cos_omega = cos( omega );
cos2omega = cos( 2. * omega );
cos3omega = cos( 3. * omega );
/* end omega calculations */
/* torsion energy */
exp_tor1 = exp( fbp->p_tor1 *
SQR(2.0 - bo_jk->BO_pi - f11_DjDk) );
exp_tor2_kl = exp( -p_tor2 * BOA_kl );
exp_cot2_kl = exp( -p_cot2 * SQR(BOA_kl - 1.5) );
fn10 = (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jk) *
(1.0 - exp_tor2_kl);
CV = 0.5 * ( fbp->V1 * (1.0 + cos_omega) +
fbp->V2 * exp_tor1 * (1.0 - cos2omega) +
fbp->V3 * (1.0 + cos3omega) );
data->my_en.e_tor += e_tor = fn10 * sin_ijk * sin_jkl * CV;
dfn11 = (-p_tor3 * exp_tor3_DjDk +
(p_tor3 * exp_tor3_DjDk - p_tor4 * exp_tor4_DjDk) *
(2.0 + exp_tor3_DjDk) * exp_tor34_inv) *
exp_tor34_inv;
CEtors1 = sin_ijk * sin_jkl * CV;
CEtors2 = -fn10 * 2.0 * fbp->p_tor1 * fbp->V2 * exp_tor1 *
(2.0 - bo_jk->BO_pi - f11_DjDk) * (1.0 - SQR(cos_omega)) *
sin_ijk * sin_jkl;
CEtors3 = CEtors2 * dfn11;
CEtors4 = CEtors1 * p_tor2 * exp_tor2_ij *
(1.0 - exp_tor2_jk) * (1.0 - exp_tor2_kl);
CEtors5 = CEtors1 * p_tor2 *
(1.0 - exp_tor2_ij) * exp_tor2_jk * (1.0 - exp_tor2_kl);
CEtors6 = CEtors1 * p_tor2 *
(1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jk) * exp_tor2_kl;
cmn = -fn10 * CV;
CEtors7 = cmn * sin_jkl * tan_ijk_i;
CEtors8 = cmn * sin_ijk * tan_jkl_i;
CEtors9 = fn10 * sin_ijk * sin_jkl *
(0.5 * fbp->V1 - 2.0 * fbp->V2 * exp_tor1 * cos_omega +
1.5 * fbp->V3 * (cos2omega + 2.0 * SQR(cos_omega)));
/* end of torsion energy */
/* 4-body conjugation energy */
fn12 = exp_cot2_ij * exp_cot2_jk * exp_cot2_kl;
data->my_en.e_con += e_con =
fbp->p_cot1 * fn12 *
(1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jkl);
Cconj = -2.0 * fn12 * fbp->p_cot1 * p_cot2 *
(1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jkl);
CEconj1 = Cconj * (BOA_ij - 1.5e0);
CEconj2 = Cconj * (BOA_jk - 1.5e0);
CEconj3 = Cconj * (BOA_kl - 1.5e0);
CEconj4 = -fbp->p_cot1 * fn12 *
(SQR(cos_omega) - 1.0) * sin_jkl * tan_ijk_i;
CEconj5 = -fbp->p_cot1 * fn12 *
(SQR(cos_omega) - 1.0) * sin_ijk * tan_jkl_i;
CEconj6 = 2.0 * fbp->p_cot1 * fn12 *
cos_omega * sin_ijk * sin_jkl;
/* end 4-body conjugation energy */
/* forces */
bo_jk->Cdbopi += CEtors2;
workspace->CdDelta[j] += CEtors3;
workspace->CdDelta[k] += CEtors3;
bo_ij->Cdbo += (CEtors4 + CEconj1);
bo_jk->Cdbo += (CEtors5 + CEconj2);
bo_kl->Cdbo += (CEtors6 + CEconj3);
if( control->virial == 0 ) {
/* dcos_theta_ijk */
rvec_ScaledAdd( workspace->f[i],
CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_ScaledAdd( workspace->f[j],
CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_ScaledAdd( workspace->f[k],
CEtors7 + CEconj4, p_ijk->dcos_di );
/* dcos_theta_jkl */
rvec_ScaledAdd( workspace->f[j],
CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_ScaledAdd( workspace->f[k],
CEtors8 + CEconj5, p_jkl->dcos_dj );
rvec_ScaledAdd( workspace->f[l],
CEtors8 + CEconj5, p_jkl->dcos_dk );
/* dcos_omega */
rvec_ScaledAdd( workspace->f[i],
CEtors9 + CEconj6, dcos_omega_di );
rvec_ScaledAdd( workspace->f[j],
CEtors9 + CEconj6, dcos_omega_dj );
rvec_ScaledAdd( workspace->f[k],
CEtors9 + CEconj6, dcos_omega_dk );
rvec_ScaledAdd( workspace->f[l],
CEtors9 + CEconj6, dcos_omega_dl );
}
else {
ivec_Sum(rel_box_jl, pbond_jk->rel_box, pbond_kl->rel_box);
/* dcos_theta_ijk */
rvec_Scale( force, CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j],
CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_Scale( force, CEtors7 + CEconj4, p_ijk->dcos_di );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
/* dcos_theta_jkl */
rvec_ScaledAdd( workspace->f[j],
CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_Scale( force, CEtors8 + CEconj5, p_jkl->dcos_dj );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_Scale( force, CEtors8 + CEconj5, p_jkl->dcos_dk );
rvec_Add( workspace->f[l], force );
rvec_iMultiply( ext_press, rel_box_jl, force );
rvec_Add( data->my_ext_press, ext_press );
/* dcos_omega */
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_di );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j],
CEtors9 + CEconj6, dcos_omega_dj );
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_dl );
rvec_Add( workspace->f[l], force );
rvec_iMultiply( ext_press, rel_box_jl, force );
rvec_Add( data->my_ext_press, ext_press );
}
/* tally into per-atom virials */
if( system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
// acquire vectors
rvec_ScaledSum( delil, 1., system->my_atoms[l].x,
-1., system->my_atoms[i].x );
rvec_ScaledSum( deljl, 1., system->my_atoms[l].x,
-1., system->my_atoms[j].x );
rvec_ScaledSum( delkl, 1., system->my_atoms[l].x,
-1., system->my_atoms[k].x );
// dcos_theta_ijk
rvec_Scale( fi_tmp, CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_Scale( fj_tmp, CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_Scale( fk_tmp, CEtors7 + CEconj4, p_ijk->dcos_di );
// dcos_theta_jkl
rvec_ScaledAdd( fj_tmp, CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_ScaledAdd( fk_tmp, CEtors8 + CEconj5, p_jkl->dcos_dj );
// dcos_omega
rvec_ScaledAdd( fi_tmp, CEtors9 + CEconj6, dcos_omega_di );
rvec_ScaledAdd( fj_tmp, CEtors9 + CEconj6, dcos_omega_dj );
rvec_ScaledAdd( fk_tmp, CEtors9 + CEconj6, dcos_omega_dk );
// tally
eng_tmp = e_tor + e_con;
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(j,k,natoms,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
if( system->pair_ptr->vflag_atom)
system->pair_ptr->v_tally4(i,j,k,l,fi_tmp,fj_tmp,fk_tmp,delil,deljl,delkl);
}
} // pl check ends
} // pl loop ends
} // pi check ends
} // pi loop ends
} // k-j neighbor check ends
} // j-k neighbor check ends
} // pk loop ends
} // j loop
}
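For reference, the two energy contributions accumulated per i-j-k-l quadruplet in the loop above are, in formula form (symbols follow the local variable names; this restates the code rather than deriving it):

f_{10} = (1-e^{-p_{tor2}\,BOA_{ij}})(1-e^{-p_{tor2}\,BOA_{jk}})(1-e^{-p_{tor2}\,BOA_{kl}})

E_{tor} = f_{10}\,\sin\theta_{ijk}\,\sin\theta_{jkl}\;\tfrac12\Big[ V_1(1+\cos\omega)
        + V_2\, e^{\,p_{tor1}\,(2-BO^{\pi}_{jk}-f_{11})^{2}}\,(1-\cos 2\omega)
        + V_3(1+\cos 3\omega) \Big]

E_{con} = p_{cot1}\, f_{12}\,\big[ 1 + (\cos^{2}\omega - 1)\,\sin\theta_{ijk}\,\sin\theta_{jkl} \big],
\qquad f_{12} = e^{-p_{cot2}(BOA_{ij}-1.5)^{2}}\, e^{-p_{cot2}(BOA_{jk}-1.5)^{2}}\, e^{-p_{cot2}(BOA_{kl}-1.5)^{2}}

The CEtors*/CEconj* coefficients are the partial derivatives of these two terms with respect to the bond orders, the two valence angles, and omega, and are what gets tallied into the forces and virial in the code above.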
diff --git a/src/USER-REAXC/reaxc_traj.cpp b/src/USER-REAXC/reaxc_traj.cpp
index 9d4fa7352..ae2bba215 100644
--- a/src/USER-REAXC/reaxc_traj.cpp
+++ b/src/USER-REAXC/reaxc_traj.cpp
@@ -1,777 +1,777 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_traj.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
int Reallocate_Output_Buffer( output_controls *out_control, int req_space,
MPI_Comm comm )
{
if( out_control->buffer_len > 0 )
free( out_control->buffer );
out_control->buffer_len = (int)(req_space*SAFE_ZONE);
out_control->buffer = (char*) malloc(out_control->buffer_len*sizeof(char));
if( out_control->buffer == NULL ) {
fprintf( stderr,
"insufficient memory for required buffer size %d. terminating!\n",
(int) (req_space*SAFE_ZONE) );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return SUCCESS;
}
void Write_Skip_Line( output_controls *out_control, mpi_datatypes *mpi_data,
int my_rank, int skip, int num_section )
{
if( my_rank == MASTER_NODE )
fprintf( out_control->strj, INT2_LINE,
"chars_to_skip_section:", skip, num_section );
}
int Write_Header( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int num_hdr_lines, my_hdr_lines, buffer_req;
char ensembles[ens_N][25] = { "NVE", "NVT", "fully flexible NPT",
"semi isotropic NPT", "isotropic NPT" };
char reposition[3][25] = { "fit to periodic box", "CoM to center of box",
"CoM to origin" };
char t_regime[3][25] = { "T-coupling only", "step-wise", "constant slope" };
char traj_methods[TF_N][10] = { "custom", "xyz" };
char atom_formats[8][40] = { "none", "invalid", "invalid", "invalid",
"xyz_q",
"xyz_q_fxfyfz",
"xyz_q_vxvyvz",
"detailed_atom_info" };
char bond_formats[3][30] = { "none",
"basic_bond_info",
"detailed_bond_info" };
char angle_formats[2][30] = { "none", "basic_angle_info" };
/* set header lengths */
num_hdr_lines = NUM_HEADER_LINES;
my_hdr_lines = num_hdr_lines * ( system->my_rank == MASTER_NODE );
buffer_req = my_hdr_lines * HEADER_LINE_LEN;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* only the master node writes into trajectory header */
if( system->my_rank == MASTER_NODE ) {
/* clear the contents of line & buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
/* to skip the header */
sprintf( out_control->line, INT_LINE, "chars_to_skip_header:",
(num_hdr_lines-1) * HEADER_LINE_LEN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* general simulation info */
sprintf( out_control->line, STR_LINE, "simulation_name:",
out_control->traj_title );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, BIGINT_LINE, "number_of_atoms:", system->bigN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "ensemble_type:",
ensembles[ control->ensemble ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "number_of_steps:",
control->nsteps );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "timestep_length_(in_fs):",
control->dt * 1000 );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* restart info */
sprintf( out_control->line, STR_LINE, "is_this_a_restart?:",
(control->restart ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "write_restart_files?:",
((out_control->restart_freq > 0) ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "frequency_to_write_restarts:",
out_control->restart_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* preferences */
sprintf( out_control->line, STR_LINE, "tabulate_long_range_intrs?:",
(control->tabulate ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "table_size:", control->tabulate );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "restrict_bonds?:",
(control->restrict_bonds ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "bond_restriction_length:",
control->restrict_bonds );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "reposition_atoms?:",
reposition[control->reposition_atoms] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "remove_CoM_velocity?:",
(control->ensemble==NVE) ? 0 : control->remove_CoM_vel);
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* cut-off values */
sprintf( out_control->line, REAL_LINE, "bonded_intr_dist_cutoff:",
control->bond_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "nonbonded_intr_dist_cutoff:",
control->nonb_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "hbond_dist_cutoff:",
control->hbond_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "reax_bond_threshold:",
control->bo_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "physical_bond_threshold:",
control->bg_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "valence_angle_threshold:",
control->thb_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, SCI_LINE, "QEq_tolerance:", control->q_err );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* temperature controls */
sprintf( out_control->line, REAL_LINE, "initial_temperature:",
control->T_init );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "target_temperature:",
control->T_final );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "thermal_inertia:",
control->Tau_T );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "temperature_regime:",
t_regime[ control->T_mode ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "temperature_change_rate_(K/ps):",
control->T_rate / control->T_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* pressure controls */
sprintf( out_control->line, REAL3_LINE, "target_pressure_(GPa):",
control->P[0], control->P[1], control->P[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE, "virial_inertia:",
control->Tau_P[0], control->Tau_P[1], control->Tau_P[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* trajectory */
sprintf( out_control->line, INT_LINE, "energy_dumping_freq:",
out_control->energy_update_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "trajectory_dumping_freq:",
out_control->write_steps );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "compress_trajectory_output?:",
(out_control->traj_compress ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "trajectory_format:",
traj_methods[ out_control->traj_method ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "atom_info:",
atom_formats[ out_control->atom_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "bond_info:",
bond_formats[ out_control->bond_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "angle_info:",
angle_formats[ out_control->angle_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* analysis */
//sprintf( out_control->line, STR_LINE, "molecular_analysis:",
// (control->molec_anal ? "yes" : "no") );
//strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "molecular_analysis_frequency:",
control->molecular_analysis );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
}
/* dump out the buffer */
if( system->my_rank == MASTER_NODE )
fprintf( out_control->strj, "%s", out_control->buffer );
return SUCCESS;
}
int Write_Init_Desc( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, me, np, cnt, buffer_len, buffer_req;
reax_atom *p_atom;
MPI_Status status;
me = system->my_rank;
np = system->wsize;
/* skip info */
Write_Skip_Line( out_control, mpi_data, me,
system->bigN * INIT_DESC_LEN, system->bigN );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = system->bigN * INIT_DESC_LEN + 1;
else buffer_req = system->n * INIT_DESC_LEN + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( i = 0; i < system->n; ++i ) {
p_atom = &( system->my_atoms[i] );
sprintf( out_control->line, INIT_DESC,
p_atom->orig_id, p_atom->type, p_atom->name,
system->reax_param.sbp[ p_atom->type ].mass );
strncpy( out_control->buffer + i*INIT_DESC_LEN,
out_control->line, INIT_DESC_LEN+1 );
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np * INIT_DESCS + me, mpi_data->world );
else{
buffer_len = system->n * INIT_DESC_LEN;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*INIT_DESCS+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Init_Traj( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data,
char *msg )
{
char fname[MAX_STR];
int atom_line_len[ NR_OPT_ATOM ] = { 0, 0, 0, 0,
ATOM_BASIC_LEN, ATOM_wV_LEN,
ATOM_wF_LEN, ATOM_FULL_LEN };
int bond_line_len[ NR_OPT_BOND ] = { 0, BOND_BASIC_LEN, BOND_FULL_LEN };
int angle_line_len[ NR_OPT_ANGLE ] = { 0, ANGLE_BASIC_LEN };
/* generate trajectory name */
sprintf( fname, "%s.trj", control->sim_name );
/* how should I write atoms? */
out_control->atom_line_len = atom_line_len[ out_control->atom_info ];
out_control->write_atoms = ( out_control->atom_line_len ? 1 : 0 );
/* bonds? */
out_control->bond_line_len = bond_line_len[ out_control->bond_info ];
out_control->write_bonds = ( out_control->bond_line_len ? 1 : 0 );
/* angles? */
out_control->angle_line_len = angle_line_len[ out_control->angle_info ];
out_control->write_angles = ( out_control->angle_line_len ? 1 : 0 );
/* allocate line & buffer space */
out_control->line = (char*) calloc( MAX_TRJ_LINE_LEN + 1, sizeof(char) );
out_control->buffer_len = 0;
out_control->buffer = NULL;
/* write trajectory header and atom info, if applicable */
if( out_control->traj_method == REG_TRAJ) {
if( system->my_rank == MASTER_NODE )
out_control->strj = fopen( fname, "w" );
}
else {
strcpy( msg, "init_traj: unknown trajectory option" );
return FAILURE;
}
Write_Header( system, control, out_control, mpi_data );
Write_Init_Desc( system, control, out_control, mpi_data );
return SUCCESS;
}
int Write_Frame_Header( reax_system *system, control_params *control,
simulation_data *data, output_controls *out_control,
mpi_datatypes *mpi_data )
{
int me, num_frm_hdr_lines, my_frm_hdr_lines, buffer_req;
me = system->my_rank;
/* frame header lengths */
num_frm_hdr_lines = 22;
my_frm_hdr_lines = num_frm_hdr_lines * ( me == MASTER_NODE );
buffer_req = my_frm_hdr_lines * HEADER_LINE_LEN;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* only the master node writes into trajectory header */
if( me == MASTER_NODE ) {
/* clear the contents of line & buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
/* skip info */
sprintf( out_control->line, INT_LINE, "chars_to_skip_frame_header:",
(num_frm_hdr_lines - 1) * HEADER_LINE_LEN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* step & time */
sprintf( out_control->line, INT_LINE, "step:", data->step );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "time_in_ps:",
data->step * control->dt );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* box info */
sprintf( out_control->line, REAL_LINE, "volume:", system->big_box.V );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE, "box_dimensions:",
system->big_box.box_norms[0],
system->big_box.box_norms[1],
system->big_box.box_norms[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE,
"coordinate_angles:", 90.0, 90.0, 90.0 );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* system T and P */
sprintf( out_control->line, REAL_LINE, "temperature:", data->therm.T );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "pressure:",
(control->ensemble==iNPT) ?
data->iso_bar.P : data->flex_bar.P_scalar );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* energies */
sprintf( out_control->line, REAL_LINE, "total_energy:",
data->sys_en.e_tot );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "total_kinetic:",
data->sys_en.e_kin );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "total_potential:",
data->sys_en.e_pot );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "bond_energy:",
data->sys_en.e_bond );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "atom_energy:",
data->sys_en.e_ov + data->sys_en.e_un );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "lone_pair_energy:",
data->sys_en.e_lp );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "valence_angle_energy:",
data->sys_en.e_ang + data->sys_en.e_pen );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "3-body_conjugation:",
data->sys_en.e_coa );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "hydrogen_bond_energy:",
data->sys_en.e_hb );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "torsion_angle_energy:",
data->sys_en.e_tor );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "4-body_conjugation:",
data->sys_en.e_con );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "vdWaals_energy:",
data->sys_en.e_vdW );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "electrostatics_energy:",
data->sys_en.e_ele );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "polarization_energy:",
data->sys_en.e_pol );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
}
/* dump out the buffer */
if( system->my_rank == MASTER_NODE )
fprintf( out_control->strj, "%s", out_control->buffer );
return SUCCESS;
}
int Write_Atoms( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, me, np, line_len, buffer_len, buffer_req, cnt;
MPI_Status status;
reax_atom *p_atom;
me = system->my_rank;
np = system->wsize;
line_len = out_control->atom_line_len;
Write_Skip_Line( out_control, mpi_data, me,
system->bigN*line_len, system->bigN );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = system->bigN * line_len + 1;
else buffer_req = system->n * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( i = 0; i < system->n; ++i ) {
p_atom = &( system->my_atoms[i] );
switch( out_control->atom_info ) {
case OPT_ATOM_BASIC:
sprintf( out_control->line, ATOM_BASIC,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->q );
break;
case OPT_ATOM_wF:
sprintf( out_control->line, ATOM_wF,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->f[0], p_atom->f[1], p_atom->f[2], p_atom->q );
break;
case OPT_ATOM_wV:
sprintf( out_control->line, ATOM_wV,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->v[0], p_atom->v[1], p_atom->v[2], p_atom->q );
break;
case OPT_ATOM_FULL:
sprintf( out_control->line, ATOM_FULL,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->v[0], p_atom->v[1], p_atom->v[2],
p_atom->f[0], p_atom->f[1], p_atom->f[2], p_atom->q );
break;
default:
fprintf( stderr,
"write_traj_atoms: unknown atom trajectroy format!\n");
MPI_Abort( mpi_data->world, UNKNOWN_OPTION );
}
strncpy( out_control->buffer + i*line_len, out_control->line, line_len+1 );
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*ATOM_LINES+me, mpi_data->world );
else{
buffer_len = system->n * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*ATOM_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Write_Bonds(reax_system *system, control_params *control, reax_list *bonds,
output_controls *out_control, mpi_datatypes *mpi_data)
{
int i, j, pj, me, np;
int my_bonds, num_bonds;
int line_len, buffer_len, buffer_req, cnt;
MPI_Status status;
bond_data *bo_ij;
me = system->my_rank;
np = system->wsize;
line_len = out_control->bond_line_len;
/* count the number of bonds I will write */
my_bonds = 0;
for( i=0; i < system->n; ++i )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
if( system->my_atoms[i].orig_id <= system->my_atoms[j].orig_id &&
bonds->select.bond_list[pj].bo_data.BO >= control->bg_cut )
++my_bonds;
}
/* allreduce - total number of bonds */
MPI_Allreduce( &my_bonds, &num_bonds, 1, MPI_INT, MPI_SUM, mpi_data->world );
Write_Skip_Line( out_control, mpi_data, me, num_bonds*line_len, num_bonds );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = num_bonds * line_len + 1;
else buffer_req = my_bonds * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in the buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
my_bonds = 0;
for( i=0; i < system->n; ++i ) {
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
bo_ij = &( bonds->select.bond_list[pj] );
j = bo_ij->nbr;
if( system->my_atoms[i].orig_id <= system->my_atoms[j].orig_id &&
bo_ij->bo_data.BO >= control->bg_cut ) {
switch( out_control->bond_info ) {
case OPT_BOND_BASIC:
sprintf( out_control->line, BOND_BASIC,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
bo_ij->d, bo_ij->bo_data.BO );
break;
case OPT_BOND_FULL:
sprintf( out_control->line, BOND_FULL,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
bo_ij->d, bo_ij->bo_data.BO, bo_ij->bo_data.BO_s,
bo_ij->bo_data.BO_pi, bo_ij->bo_data.BO_pi2 );
break;
default:
fprintf(stderr, "write_traj_bonds: FATAL! invalid bond_info option");
MPI_Abort( mpi_data->world, UNKNOWN_OPTION );
}
strncpy( out_control->buffer + my_bonds*line_len,
out_control->line, line_len+1 );
++my_bonds;
}
}
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*BOND_LINES+me, mpi_data->world );
else{
buffer_len = my_bonds * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*BOND_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Write_Angles( reax_system *system, control_params *control,
reax_list *bonds, reax_list *thb_intrs,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, j, k, pi, pk, me, np;
int my_angles, num_angles;
int line_len, buffer_len, buffer_req, cnt;
bond_data *bo_ij, *bo_jk;
three_body_interaction_data *angle_ijk;
MPI_Status status;
me = system->my_rank;
np = system->wsize;
line_len = out_control->angle_line_len;
/* count the number of valence angles I will output */
my_angles = 0;
for( j = 0; j < system->n; ++j )
for( pi = Start_Index(j, bonds); pi < End_Index(j, bonds); ++pi ) {
bo_ij = &(bonds->select.bond_list[pi]);
i = bo_ij->nbr;
if( bo_ij->bo_data.BO >= control->bg_cut ) // physical j&i bond
for( pk = Start_Index( pi, thb_intrs );
pk < End_Index( pi, thb_intrs ); ++pk ) {
angle_ijk = &(thb_intrs->select.three_body_list[pk]);
k = angle_ijk->thb;
bo_jk = &(bonds->select.bond_list[ angle_ijk->pthb ]);
if( system->my_atoms[i].orig_id < system->my_atoms[k].orig_id &&
bo_jk->bo_data.BO >= control->bg_cut ) // physical j&k bond
++my_angles;
}
}
/* total number of valences */
MPI_Allreduce(&my_angles, &num_angles, 1, MPI_INT, MPI_SUM, mpi_data->world);
Write_Skip_Line( out_control, mpi_data, me, num_angles*line_len, num_angles );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = num_angles * line_len + 1;
else buffer_req = my_angles * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in the buffer */
my_angles = 0;
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( j = 0; j < system->n; ++j )
for( pi = Start_Index(j, bonds); pi < End_Index(j, bonds); ++pi ) {
bo_ij = &(bonds->select.bond_list[pi]);
i = bo_ij->nbr;
if( bo_ij->bo_data.BO >= control->bg_cut ) // physical j&i bond
for( pk = Start_Index( pi, thb_intrs );
pk < End_Index( pi, thb_intrs ); ++pk ) {
angle_ijk = &(thb_intrs->select.three_body_list[pk]);
k = angle_ijk->thb;
bo_jk = &(bonds->select.bond_list[ angle_ijk->pthb ]);
if( system->my_atoms[i].orig_id < system->my_atoms[k].orig_id &&
bo_jk->bo_data.BO >= control->bg_cut ) { // physical j&k bond
sprintf( out_control->line, ANGLE_BASIC,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
system->my_atoms[k].orig_id, RAD2DEG( angle_ijk->theta ) );
strncpy( out_control->buffer + my_angles*line_len,
out_control->line, line_len+1 );
++my_angles;
}
}
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*ANGLE_LINES+me, mpi_data->world );
else{
buffer_len = my_angles * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*ANGLE_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Append_Frame( reax_system *system, control_params *control,
simulation_data *data, reax_list **lists,
output_controls *out_control, mpi_datatypes *mpi_data )
{
Write_Frame_Header( system, control, data, out_control, mpi_data );
if( out_control->write_atoms )
Write_Atoms( system, control, out_control, mpi_data );
if( out_control->write_bonds )
Write_Bonds( system, control, (*lists + BONDS), out_control, mpi_data );
if( out_control->write_angles )
Write_Angles( system, control, (*lists + BONDS), (*lists + THREE_BODIES),
out_control, mpi_data );
return SUCCESS;
}
int End_Traj( int my_rank, output_controls *out_control )
{
if( my_rank == MASTER_NODE )
fclose( out_control->strj );
free( out_control->buffer );
free( out_control->line );
return SUCCESS;
}
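The trajectory produced by these routines is plain text, and every header or section begins with a "chars_to_skip_..." line so a post-processing tool can seek past parts it does not need. Below is a minimal, hypothetical reader sketch; the exact field widths come from the format macros in reaxc_traj.h (not shown in this diff), so the parsing assumes only that the key and the value are whitespace-separated:

/* hypothetical sketch: skip the global header of a .trj file */
FILE *fp = fopen( "simulation.trj", "r" );       /* file name is an example */
char  line[256];
long  skip;
if( fp && fgets( line, sizeof(line), fp ) &&
    sscanf( line, "%*s %ld", &skip ) == 1 )      /* "chars_to_skip_header: N" */
  fseek( fp, skip, SEEK_CUR );                   /* now positioned at the first frame */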
diff --git a/src/USER-REAXC/reaxc_types.h b/src/USER-REAXC/reaxc_types.h
index db4cf0417..b3e2f40f0 100644
--- a/src/USER-REAXC/reaxc_types.h
+++ b/src/USER-REAXC/reaxc_types.h
@@ -1,882 +1,862 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
#ifndef __REAX_TYPES_H_
#define __REAX_TYPES_H_
#include "lmptype.h"
#include <ctype.h>
#include <math.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sys/time.h"
#include <time.h>
/************* SOME DEFS - crucial for reax_types.h *********/
#define LAMMPS_REAX
//#define DEBUG
//#define DEBUG_FOCUS
//#define TEST_ENERGY
//#define TEST_FORCES
//#define CG_PERFORMANCE
//#define LOG_PERFORMANCE
//#define STANDARD_BOUNDARIES
//#define OLD_BOUNDARIES
//#define MIDPOINT_BOUNDARIES
#define REAX_MAX_STR 1024
#define REAX_MAX_NBRS 6
#define REAX_MAX_3BODY_PARAM 5
#define REAX_MAX_4BODY_PARAM 5
#define REAX_MAX_ATOM_TYPES 25
#define REAX_MAX_MOLECULE_SIZE 20
#define MAX_BOND 20 // same as reaxc_defs.h
/********************** TYPE DEFINITIONS ********************/
-typedef int ivec[3];
+typedef int ivec[3];
typedef double rvec[3];
typedef double rtensor[3][3];
typedef double rvec2[2];
typedef double rvec4[4];
-
// import LAMMPS' definition of tagint and bigint
typedef LAMMPS_NS::tagint rc_tagint;
typedef LAMMPS_NS::bigint rc_bigint;
typedef struct
{
int cnt;
int *index;
void *out_atoms;
} mpi_out_data;
-
typedef struct
{
MPI_Comm world;
MPI_Comm comm_mesh3D;
MPI_Datatype sys_info;
MPI_Datatype mpi_atom_type;
MPI_Datatype boundary_atom_type;
MPI_Datatype mpi_rvec, mpi_rvec2;
MPI_Datatype restart_atom_type;
MPI_Datatype header_line;
MPI_Datatype header_view;
MPI_Datatype init_desc_line;
MPI_Datatype init_desc_view;
MPI_Datatype atom_line;
MPI_Datatype atom_view;
MPI_Datatype bond_line;
MPI_Datatype bond_view;
MPI_Datatype angle_line;
MPI_Datatype angle_view;
mpi_out_data out_buffers[REAX_MAX_NBRS];
void *in1_buffer;
void *in2_buffer;
} mpi_datatypes;
-
typedef struct
{
int n_global;
double* l;
int vdw_type;
} global_parameters;
-
-
typedef struct
{
/* Line one in field file */
char name[15]; // Two character atom name
double r_s;
double valency; // Valency of the atom
double mass; // Mass of atom
double r_vdw;
double epsilon;
double gamma;
double r_pi;
double valency_e;
double nlp_opt;
/* Line two in field file */
double alpha;
double gamma_w;
double valency_boc;
double p_ovun5;
double chi;
double eta;
int p_hbond; // 1 for H, 2 for hbonding atoms (O,S,P,N), 0 for others
/* Line three in field file */
double r_pi_pi;
double p_lp2;
double b_o_131;
double b_o_132;
double b_o_133;
/* Line four in the field file */
double p_ovun2;
double p_val3;
double valency_val;
double p_val5;
double rcore2;
double ecore2;
double acore2;
/* Line five in the ffield file, only for lgvdw yes */
double lgcij;
double lgre;
} single_body_parameters;
-
-
/* Two Body Parameters */
typedef struct {
/* Bond Order parameters */
double p_bo1,p_bo2,p_bo3,p_bo4,p_bo5,p_bo6;
double r_s, r_p, r_pp; // r_o distances in BO formula
double p_boc3, p_boc4, p_boc5;
/* Bond Energy parameters */
double p_be1, p_be2;
double De_s, De_p, De_pp;
/* Over/Under coordination parameters */
double p_ovun1;
/* Van der Waal interaction parameters */
double D;
double alpha;
double r_vdW;
double gamma_w;
double rcore, ecore, acore;
double lgcij, lgre;
/* electrostatic parameters */
double gamma; // note: this parameter is gamma^-3 and not gamma.
double v13cor, ovc;
} two_body_parameters;
-
-
/* 3-body parameters */
typedef struct {
/* valence angle */
double theta_00;
double p_val1, p_val2, p_val4, p_val7;
/* penalty */
double p_pen1;
/* 3-body conjugation */
double p_coa1;
} three_body_parameters;
typedef struct{
int cnt;
three_body_parameters prm[REAX_MAX_3BODY_PARAM];
} three_body_header;
-
-
/* hydrogen-bond parameters */
typedef struct{
double r0_hb, p_hb1, p_hb2, p_hb3;
} hbond_parameters;
-
-
/* 4-body parameters */
typedef struct {
double V1, V2, V3;
/* torsion angle */
double p_tor1;
/* 4-body conjugation */
double p_cot1;
} four_body_parameters;
-
typedef struct
{
int cnt;
four_body_parameters prm[REAX_MAX_4BODY_PARAM];
} four_body_header;
-
typedef struct
{
int num_atom_types;
global_parameters gp;
single_body_parameters *sbp;
two_body_parameters **tbp;
three_body_header ***thbp;
hbond_parameters ***hbp;
four_body_header ****fbp;
} reax_interaction;
-
-
struct _reax_atom
{
rc_tagint orig_id;
int imprt_id;
int type;
char name[8];
rvec x; // position
rvec v; // velocity
rvec f; // force
rvec f_old;
double q; // charge
rvec4 s; // they take part in
rvec4 t; // computing q
int Hindex;
int num_bonds;
int num_hbonds;
int renumber;
int numbonds; // true number of bonds around atoms
int nbr_id[MAX_BOND]; // ids of neighbors around atoms
double nbr_bo[MAX_BOND]; // BO values of bond between i and nbr
double sum_bo, no_lp; // sum of BO values and no. of lone pairs
};
typedef _reax_atom reax_atom;
-
-
typedef struct
{
double V;
rvec min, max, box_norms;
rtensor box, box_inv;
rtensor trans, trans_inv;
rtensor g;
} simulation_box;
-
-
struct grid_cell
{
double cutoff;
rvec min, max;
ivec rel_box;
int mark;
int type;
int str;
int end;
int top;
int* atoms;
struct grid_cell** nbrs;
ivec* nbrs_x;
rvec* nbrs_cp;
};
typedef struct grid_cell grid_cell;
typedef struct
{
int total, max_atoms, max_nbrs;
ivec ncells;
rvec cell_len;
rvec inv_len;
ivec bond_span;
ivec nonb_span;
ivec vlist_span;
ivec native_cells;
ivec native_str;
ivec native_end;
double ghost_cut;
ivec ghost_span;
ivec ghost_nonb_span;
ivec ghost_hbond_span;
ivec ghost_bond_span;
grid_cell*** cells;
ivec *order;
} grid;
typedef struct
{
int rank;
int est_send, est_recv;
int atoms_str, atoms_cnt;
ivec rltv, prdc;
rvec bndry_min, bndry_max;
int send_type;
int recv_type;
ivec str_send;
ivec end_send;
ivec str_recv;
ivec end_recv;
} neighbor_proc;
typedef struct
{
int N;
int exc_gcells;
int exc_atoms;
} bound_estimate;
typedef struct
{
double ghost_nonb;
double ghost_hbond;
double ghost_bond;
double ghost_cutoff;
} boundary_cutoff;
using LAMMPS_NS::Pair;
struct _reax_system
{
reax_interaction reax_param;
rc_bigint bigN;
int n, N, numH;
int local_cap, total_cap, gcell_cap, Hcap;
int est_recv, est_trans, max_recved;
int wsize, my_rank, num_nbrs;
ivec my_coords;
neighbor_proc my_nbrs[REAX_MAX_NBRS];
int *global_offset;
simulation_box big_box, my_box, my_ext_box;
grid my_grid;
boundary_cutoff bndry_cuts;
reax_atom *my_atoms;
class Pair *pair_ptr;
int my_bonds;
int mincap;
double safezone, saferzone;
};
typedef _reax_system reax_system;
/* system control parameters */
typedef struct
{
char sim_name[REAX_MAX_STR];
int nprocs;
ivec procs_by_dim;
/* ensemble values:
0 : NVE
1 : bNVT (Berendsen)
2 : nhNVT (Nose-Hoover)
3 : sNPT (Parrinello-Rahman-Nose-Hoover) semi-isotropic
4 : iNPT (Parrinello-Rahman-Nose-Hoover) isotropic
5 : NPT (Parrinello-Rahman-Nose-Hoover) anisotropic*/
int ensemble;
int nsteps;
double dt;
int geo_format;
int restart;
int restrict_bonds;
int remove_CoM_vel;
int random_vel;
int reposition_atoms;
int reneighbor;
double vlist_cut;
double bond_cut;
double nonb_cut, nonb_low;
double hbond_cut;
double user_ghost_cut;
double bg_cut;
double bo_cut;
double thb_cut;
double thb_cutsq;
int tabulate;
int qeq_freq;
double q_err;
int refactor;
double droptol;
double T_init, T_final, T;
double Tau_T;
int T_mode;
double T_rate, T_freq;
int virial;
rvec P, Tau_P, Tau_PT;
int press_mode;
double compressibility;
int molecular_analysis;
int num_ignored;
int ignore[REAX_MAX_ATOM_TYPES];
int dipole_anal;
int freq_dipole_anal;
int diffusion_coef;
int freq_diffusion_coef;
int restrict_type;
int lgflag;
-
+ int enobondsflag;
+
} control_params;
typedef struct
{
double T;
double xi;
double v_xi;
double v_xi_old;
double G_xi;
} thermostat;
typedef struct
{
double P;
double eps;
double v_eps;
double v_eps_old;
double a_eps;
} isotropic_barostat;
typedef struct
{
rtensor P;
double P_scalar;
double eps;
double v_eps;
double v_eps_old;
double a_eps;
rtensor h0;
rtensor v_g0;
rtensor v_g0_old;
rtensor a_g0;
} flexible_barostat;
typedef struct
{
double start;
double end;
double elapsed;
double total;
double comm;
double nbrs;
double init_forces;
double bonded;
double nonb;
double qEq;
int s_matvecs;
int t_matvecs;
} reax_timing;
typedef struct
{
double e_tot;
double e_kin; // Total kinetic energy
double e_pot;
double e_bond; // Total bond energy
double e_ov; // Total over coordination
double e_un; // Total under coordination energy
double e_lp; // Total lone pair energy
double e_ang; // Total valence angle energy
double e_pen; // Total penalty energy
double e_coa; // Total three body conjugation energy
double e_hb; // Total Hydrogen bond energy
double e_tor; // Total torsional energy
double e_con; // Total four body conjugation energy
double e_vdW; // Total van der Waals energy
double e_ele; // Total electrostatics energy
double e_pol; // Polarization energy
} energy_data;
typedef struct
{
int step;
int prev_steps;
double time;
double M; // Total Mass
double inv_M; // 1 / Total Mass
rvec xcm; // Center of mass
rvec vcm; // Center of mass velocity
rvec fcm; // Center of mass force
rvec amcm; // Angular momentum of CoM
rvec avcm; // Angular velocity of CoM
double etran_cm; // Translational kinetic energy of CoM
double erot_cm; // Rotational kinetic energy of CoM
rtensor kinetic; // Kinetic energy tensor
rtensor virial; // Hydrodynamic virial
energy_data my_en;
energy_data sys_en;
double N_f; //Number of degrees of freedom
rvec t_scale;
rtensor p_scale;
thermostat therm; // Used in Nose_Hoover method
isotropic_barostat iso_bar;
flexible_barostat flex_bar;
double inv_W;
double kin_press;
rvec int_press;
rvec my_ext_press;
rvec ext_press;
rvec tot_press;
reax_timing timing;
} simulation_data;
typedef struct{
int thb;
int pthb; // pointer to the third body on the central atom's nbrlist
double theta, cos_theta;
rvec dcos_di, dcos_dj, dcos_dk;
} three_body_interaction_data;
typedef struct {
int nbr;
ivec rel_box;
double d;
rvec dvec;
} far_neighbor_data;
typedef struct {
int nbr;
int scl;
far_neighbor_data *ptr;
} hbond_data;
typedef struct{
int wrt;
rvec dVal;
} dDelta_data;
typedef struct{
int wrt;
rvec dBO, dBOpi, dBOpi2;
} dbond_data;
typedef struct{
double BO, BO_s, BO_pi, BO_pi2;
double Cdbo, Cdbopi, Cdbopi2;
double C1dbo, C2dbo, C3dbo;
double C1dbopi, C2dbopi, C3dbopi, C4dbopi;
double C1dbopi2, C2dbopi2, C3dbopi2, C4dbopi2;
rvec dBOp, dln_BOp_s, dln_BOp_pi, dln_BOp_pi2;
} bond_order_data;
typedef struct {
int nbr;
int sym_index;
int dbond_index;
ivec rel_box;
// rvec ext_factor;
double d;
rvec dvec;
bond_order_data bo_data;
} bond_data;
typedef struct {
int j;
double val;
} sparse_matrix_entry;
typedef struct {
int cap, n, m;
int *start, *end;
sparse_matrix_entry *entries;
} sparse_matrix;
typedef struct {
int num_far;
int H, Htop;
int hbonds, num_hbonds;
int bonds, num_bonds;
int num_3body;
int gcell_atoms;
} reallocate_data;
typedef struct
{
int allocated;
/* communication storage */
double *tmp_dbl[REAX_MAX_NBRS];
rvec *tmp_rvec[REAX_MAX_NBRS];
rvec2 *tmp_rvec2[REAX_MAX_NBRS];
int *within_bond_box;
/* bond order related storage */
double *total_bond_order;
double *Deltap, *Deltap_boc;
double *Delta, *Delta_lp, *Delta_lp_temp, *Delta_e, *Delta_boc, *Delta_val;
double *dDelta_lp, *dDelta_lp_temp;
double *nlp, *nlp_temp, *Clp, *vlpex;
rvec *dDeltap_self;
int *bond_mark, *done_after;
/* QEq storage */
sparse_matrix *H, *L, *U;
double *Hdia_inv, *b_s, *b_t, *b_prc, *b_prm, *s, *t;
double *droptol;
rvec2 *b, *x;
/* GMRES storage */
double *y, *z, *g;
double *hc, *hs;
double **h, **v;
/* CG storage */
double *r, *d, *q, *p;
rvec2 *r2, *d2, *q2, *p2;
/* Taper */
double Tap[8]; //Tap7, Tap6, Tap5, Tap4, Tap3, Tap2, Tap1, Tap0;
/* storage for analysis */
int *mark, *old_mark;
rvec *x_old;
/* storage space for bond restrictions */
int *restricted;
int **restricted_list;
/* integrator */
rvec *v_const;
/* force calculations */
double *CdDelta; // coefficient of dDelta
rvec *f;
reallocate_data realloc;
} storage;
typedef union
{
void *v;
three_body_interaction_data *three_body_list;
bond_data *bond_list;
dbond_data *dbo_list;
dDelta_data *dDelta_list;
far_neighbor_data *far_nbr_list;
hbond_data *hbond_list;
} list_type;
struct _reax_list
{
int allocated;
int n;
int num_intrs;
int *index;
int *end_index;
int type;
list_type select;
};
typedef _reax_list reax_list;
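All interaction lists share this layout: the per-atom (or per-bond) ranges are obtained with Start_Index()/End_Index() and the payload is reached through the select union. A small sketch in the same style as the loops elsewhere in this diff (the function name and the cutoff argument are illustrative):

/* sketch: count bonds of atom j whose bond order exceeds a cutoff */
static int Count_Strong_Bonds( int j, reax_list *bonds, double bo_cut )
{
  int n = 0;
  for( int pj = Start_Index( j, bonds ); pj < End_Index( j, bonds ); ++pj ) {
    bond_data *bij = &( bonds->select.bond_list[pj] );
    if( bij->bo_data.BO >= bo_cut )
      ++n;
  }
  return n;
}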
typedef struct
{
FILE *strj;
int trj_offset;
int atom_line_len;
int bond_line_len;
int angle_line_len;
int write_atoms;
int write_bonds;
int write_angles;
char *line;
int buffer_len;
char *buffer;
FILE *out;
FILE *pot;
FILE *log;
FILE *mol, *ign;
FILE *dpl;
FILE *drft;
FILE *pdb;
FILE *prs;
int write_steps;
int traj_compress;
int traj_method;
char traj_title[81];
int atom_info;
int bond_info;
int angle_info;
int restart_format;
int restart_freq;
int debug_level;
int energy_update_freq;
} output_controls;
typedef struct
{
int atom_count;
int atom_list[REAX_MAX_MOLECULE_SIZE];
int mtypes[REAX_MAX_ATOM_TYPES];
} molecule;
struct LR_data
{
double H;
double e_vdW, CEvd;
double e_ele, CEclmb;
void operator = (const LR_data& rhs) {
H = rhs.H;
e_vdW = rhs.e_vdW;
CEvd = rhs.CEvd;
e_ele = rhs.e_ele;
CEclmb = rhs.CEclmb;
}
void operator = (const LR_data& rhs) volatile {
H = rhs.H;
e_vdW = rhs.e_vdW;
CEvd = rhs.CEvd;
e_ele = rhs.e_ele;
CEclmb = rhs.CEclmb;
}
};
struct cubic_spline_coef
{
double a, b, c, d;
void operator = (const cubic_spline_coef& rhs) {
a = rhs.a;
b = rhs.b;
c = rhs.c;
d = rhs.d;
}
void operator = (const cubic_spline_coef& rhs) volatile {
a = rhs.a;
b = rhs.b;
c = rhs.c;
d = rhs.d;
}
};
typedef struct
{
double xmin, xmax;
int n;
double dx, inv_dx;
double a;
double m;
double c;
LR_data *y;
cubic_spline_coef *H;
cubic_spline_coef *vdW, *CEvd;
cubic_spline_coef *ele, *CEclmb;
} LR_lookup_table;
extern LR_lookup_table **LR;
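The table stores per-interval cubic-spline coefficients (a, b, c, d) on a uniform grid of spacing dx. The evaluation routine is not part of this diff, so the sketch below is only an illustrative guess at how such a table is typically read, assuming Horner evaluation on the offset from the left grid point:

/* illustrative guess, not the package's own evaluation code */
static double Spline_Eval( const LR_lookup_table *t,
                           const cubic_spline_coef *coef, double r )
{
  int    i  = (int)( r * t->inv_dx );   /* grid interval containing r  */
  double dr = r - i * t->dx;            /* offset from left grid point */
  return ((coef[i].d * dr + coef[i].c) * dr + coef[i].b) * dr + coef[i].a;
}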
/* function pointer defs */
typedef void (*evolve_function)(reax_system*, control_params*,
simulation_data*, storage*, reax_list**,
output_controls*, mpi_datatypes* );
typedef void (*interaction_function) (reax_system*, control_params*,
simulation_data*, storage*,
reax_list**, output_controls*);
typedef void (*print_interaction)(reax_system*, control_params*,
simulation_data*, storage*,
reax_list**, output_controls*);
typedef double (*lookup_function)(double);
typedef void (*message_sorter) (reax_system*, int, int, int, mpi_out_data*);
typedef void (*unpacker) ( reax_system*, int, void*, int, neighbor_proc*, int );
typedef void (*dist_packer) (void*, mpi_out_data*);
typedef void (*coll_unpacker) (void*, void*, mpi_out_data*);
#endif
diff --git a/src/USER-REAXC/reaxc_valence_angles.cpp b/src/USER-REAXC/reaxc_valence_angles.cpp
index c2b3287be..c92996e56 100644
--- a/src/USER-REAXC/reaxc_valence_angles.cpp
+++ b/src/USER-REAXC/reaxc_valence_angles.cpp
@@ -1,416 +1,416 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_valence_angles.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
static double Dot( double* v1, double* v2, int k )
{
double ret = 0.0;
for( int i=0; i < k; ++i )
ret += v1[i] * v2[i];
return ret;
}
void Calculate_Theta( rvec dvec_ji, double d_ji, rvec dvec_jk, double d_jk,
double *theta, double *cos_theta )
{
(*cos_theta) = Dot( dvec_ji, dvec_jk, 3 ) / ( d_ji * d_jk );
if( *cos_theta > 1. ) *cos_theta = 1.0;
if( *cos_theta < -1. ) *cos_theta = -1.0;
(*theta) = acos( *cos_theta );
}
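/* Calculate_dCos_Theta: analytic gradient of cos(theta) with respect to the
   three atom positions; the contribution for the central atom j is minus the
   sum of the i and k contributions, since both bond vectors originate at j */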
void Calculate_dCos_Theta( rvec dvec_ji, double d_ji, rvec dvec_jk, double d_jk,
rvec* dcos_theta_di,
rvec* dcos_theta_dj,
rvec* dcos_theta_dk )
{
int t;
double sqr_d_ji = SQR(d_ji);
double sqr_d_jk = SQR(d_jk);
double inv_dists = 1.0 / (d_ji * d_jk);
double inv_dists3 = pow( inv_dists, 3.0 );
double dot_dvecs = Dot( dvec_ji, dvec_jk, 3 );
double Cdot_inv3 = dot_dvecs * inv_dists3;
for( t = 0; t < 3; ++t ) {
(*dcos_theta_di)[t] = dvec_jk[t] * inv_dists -
Cdot_inv3 * sqr_d_jk * dvec_ji[t];
(*dcos_theta_dj)[t] = -(dvec_jk[t] + dvec_ji[t]) * inv_dists +
Cdot_inv3 * ( sqr_d_jk * dvec_ji[t] + sqr_d_ji * dvec_jk[t] );
(*dcos_theta_dk)[t] = dvec_ji[t] * inv_dists -
Cdot_inv3 * sqr_d_ji * dvec_jk[t];
}
}
void Valence_Angles( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, pi, k, pk, t;
int type_i, type_j, type_k;
int start_j, end_j, start_pk, end_pk;
int cnt, num_thb_intrs;
double temp, temp_bo_jt, pBOjt7;
double p_val1, p_val2, p_val3, p_val4, p_val5;
double p_val6, p_val7, p_val8, p_val9, p_val10;
double p_pen1, p_pen2, p_pen3, p_pen4;
double p_coa1, p_coa2, p_coa3, p_coa4;
double trm8, expval6, expval7, expval2theta, expval12theta, exp3ij, exp3jk;
double exp_pen2ij, exp_pen2jk, exp_pen3, exp_pen4, trm_pen34, exp_coa2;
double dSBO1, dSBO2, SBO, SBO2, CSBO2, SBOp, prod_SBO, vlpadj;
double CEval1, CEval2, CEval3, CEval4, CEval5, CEval6, CEval7, CEval8;
double CEpen1, CEpen2, CEpen3;
double e_ang, e_coa, e_pen;
double CEcoa1, CEcoa2, CEcoa3, CEcoa4, CEcoa5;
double Cf7ij, Cf7jk, Cf8j, Cf9j;
double f7_ij, f7_jk, f8_Dj, f9_Dj;
double Ctheta_0, theta_0, theta_00, theta, cos_theta, sin_theta;
double BOA_ij, BOA_jk;
rvec force, ext_press;
// Tallying variables
double eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
double delij[3], delkj[3];
three_body_header *thbh;
three_body_parameters *thbp;
three_body_interaction_data *p_ijk, *p_kji;
bond_data *pbond_ij, *pbond_jk, *pbond_jt;
bond_order_data *bo_ij, *bo_jk, *bo_jt;
reax_list *bonds = (*lists) + BONDS;
reax_list *thb_intrs = (*lists) + THREE_BODIES;
/* global parameters used in these calculations */
p_val6 = system->reax_param.gp.l[14];
p_val8 = system->reax_param.gp.l[33];
p_val9 = system->reax_param.gp.l[16];
p_val10 = system->reax_param.gp.l[17];
num_thb_intrs = 0;
for( j = 0; j < system->N; ++j ) { // Ray: the first one with system->N
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
p_val3 = system->reax_param.sbp[ type_j ].p_val3;
p_val5 = system->reax_param.sbp[ type_j ].p_val5;
SBOp = 0, prod_SBO = 1;
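/* SBOp: sum of the pi and pi-pi bond orders around atom j;
   prod_SBO: product over j's bonds of exp(-BO^8), with BO^8 built below
   by squaring BO^2 twice */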
for( t = start_j; t < end_j; ++t ) {
bo_jt = &(bonds->select.bond_list[t].bo_data);
SBOp += (bo_jt->BO_pi + bo_jt->BO_pi2);
temp = SQR( bo_jt->BO );
temp *= temp;
temp *= temp;
prod_SBO *= exp( -temp );
}
if( workspace->vlpex[j] >= 0 ){
vlpadj = 0;
dSBO2 = prod_SBO - 1;
}
else{
vlpadj = workspace->nlp[j];
dSBO2 = (prod_SBO - 1) * (1 - p_val8 * workspace->dDelta_lp[j]);
}
SBO = SBOp + (1 - prod_SBO) * (-workspace->Delta_boc[j] - p_val8 * vlpadj);
dSBO1 = -8 * prod_SBO * ( workspace->Delta_boc[j] + p_val8 * vlpadj );
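/* SBO2 maps SBO smoothly onto [0,2]; CSBO2 is the derivative d(SBO2)/d(SBO)
   used below in the force terms */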
if( SBO <= 0 )
SBO2 = 0, CSBO2 = 0;
else if( SBO > 0 && SBO <= 1 ) {
SBO2 = pow( SBO, p_val9 );
CSBO2 = p_val9 * pow( SBO, p_val9 - 1 );
}
else if( SBO > 1 && SBO < 2 ) {
SBO2 = 2 - pow( 2-SBO, p_val9 );
CSBO2 = p_val9 * pow( 2 - SBO, p_val9 - 1 );
}
else
SBO2 = 2, CSBO2 = 0;
expval6 = exp( p_val6 * workspace->Delta_boc[j] );
for( pi = start_j; pi < end_j; ++pi ) {
Set_Start_Index( pi, num_thb_intrs, thb_intrs );
pbond_ij = &(bonds->select.bond_list[pi]);
bo_ij = &(pbond_ij->bo_data);
BOA_ij = bo_ij->BO - control->thb_cut;
if( BOA_ij/*bo_ij->BO*/ > 0.0 &&
( j < system->n || pbond_ij->nbr < system->n ) ) {
i = pbond_ij->nbr;
type_i = system->my_atoms[i].type;
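/* for bonds preceding pi the permuted (k,j,i) angle has already been stored;
   reuse it: copy theta and swap the i/k derivative vectors instead of
   recomputing them */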
for( pk = start_j; pk < pi; ++pk ) {
start_pk = Start_Index( pk, thb_intrs );
end_pk = End_Index( pk, thb_intrs );
for( t = start_pk; t < end_pk; ++t )
if( thb_intrs->select.three_body_list[t].thb == i ) {
p_ijk = &(thb_intrs->select.three_body_list[num_thb_intrs] );
p_kji = &(thb_intrs->select.three_body_list[t]);
p_ijk->thb = bonds->select.bond_list[pk].nbr;
p_ijk->pthb = pk;
p_ijk->theta = p_kji->theta;
rvec_Copy( p_ijk->dcos_di, p_kji->dcos_dk );
rvec_Copy( p_ijk->dcos_dj, p_kji->dcos_dj );
rvec_Copy( p_ijk->dcos_dk, p_kji->dcos_di );
++num_thb_intrs;
break;
}
}
for( pk = pi+1; pk < end_j; ++pk ) {
pbond_jk = &(bonds->select.bond_list[pk]);
bo_jk = &(pbond_jk->bo_data);
BOA_jk = bo_jk->BO - control->thb_cut;
k = pbond_jk->nbr;
type_k = system->my_atoms[k].type;
p_ijk = &( thb_intrs->select.three_body_list[num_thb_intrs] );
Calculate_Theta( pbond_ij->dvec, pbond_ij->d,
pbond_jk->dvec, pbond_jk->d,
&theta, &cos_theta );
Calculate_dCos_Theta( pbond_ij->dvec, pbond_ij->d,
pbond_jk->dvec, pbond_jk->d,
&(p_ijk->dcos_di), &(p_ijk->dcos_dj),
&(p_ijk->dcos_dk) );
p_ijk->thb = k;
p_ijk->pthb = pk;
p_ijk->theta = theta;
sin_theta = sin( theta );
if( sin_theta < 1.0e-5 )
sin_theta = 1.0e-5;
++num_thb_intrs;
if( (j < system->n) && (BOA_jk > 0.0) &&
(bo_ij->BO > control->thb_cut) &&
(bo_jk->BO > control->thb_cut) &&
(bo_ij->BO * bo_jk->BO > control->thb_cutsq) ) {
thbh = &( system->reax_param.thbp[ type_i ][ type_j ][ type_k ] );
for( cnt = 0; cnt < thbh->cnt; ++cnt ) {
if( fabs(thbh->prm[cnt].p_val1) > 0.001 ) {
thbp = &( thbh->prm[cnt] );
/* ANGLE ENERGY */
p_val1 = thbp->p_val1;
p_val2 = thbp->p_val2;
p_val4 = thbp->p_val4;
p_val7 = thbp->p_val7;
theta_00 = thbp->theta_00;
exp3ij = exp( -p_val3 * pow( BOA_ij, p_val4 ) );
f7_ij = 1.0 - exp3ij;
Cf7ij = p_val3 * p_val4 * pow( BOA_ij, p_val4 - 1.0 ) * exp3ij;
exp3jk = exp( -p_val3 * pow( BOA_jk, p_val4 ) );
f7_jk = 1.0 - exp3jk;
Cf7jk = p_val3 * p_val4 * pow( BOA_jk, p_val4 - 1.0 ) * exp3jk;
expval7 = exp( -p_val7 * workspace->Delta_boc[j] );
trm8 = 1.0 + expval6 + expval7;
f8_Dj = p_val5 - ( (p_val5 - 1.0) * (2.0 + expval6) / trm8 );
Cf8j = ( (1.0 - p_val5) / SQR(trm8) ) *
( p_val6 * expval6 * trm8 -
(2.0 + expval6) * ( p_val6*expval6 - p_val7*expval7 ) );
theta_0 = 180.0 - theta_00 * (1.0 -
exp(-p_val10 * (2.0 - SBO2)));
theta_0 = DEG2RAD( theta_0 );
expval2theta = exp( -p_val2 * SQR(theta_0 - theta) );
if( p_val1 >= 0 )
expval12theta = p_val1 * (1.0 - expval2theta);
else // To avoid linear Me-H-Me angles (6/6/06)
expval12theta = p_val1 * -expval2theta;
CEval1 = Cf7ij * f7_jk * f8_Dj * expval12theta;
CEval2 = Cf7jk * f7_ij * f8_Dj * expval12theta;
CEval3 = Cf8j * f7_ij * f7_jk * expval12theta;
CEval4 = -2.0 * p_val1 * p_val2 * f7_ij * f7_jk * f8_Dj *
expval2theta * (theta_0 - theta);
Ctheta_0 = p_val10 * DEG2RAD(theta_00) *
exp( -p_val10 * (2.0 - SBO2) );
CEval5 = -CEval4 * Ctheta_0 * CSBO2;
CEval6 = CEval5 * dSBO1;
CEval7 = CEval5 * dSBO2;
CEval8 = -CEval4 / sin_theta;
data->my_en.e_ang += e_ang =
f7_ij * f7_jk * f8_Dj * expval12theta;
/* END ANGLE ENERGY*/
/* PENALTY ENERGY */
p_pen1 = thbp->p_pen1;
p_pen2 = system->reax_param.gp.l[19];
p_pen3 = system->reax_param.gp.l[20];
p_pen4 = system->reax_param.gp.l[21];
exp_pen2ij = exp( -p_pen2 * SQR( BOA_ij - 2.0 ) );
exp_pen2jk = exp( -p_pen2 * SQR( BOA_jk - 2.0 ) );
exp_pen3 = exp( -p_pen3 * workspace->Delta[j] );
exp_pen4 = exp( p_pen4 * workspace->Delta[j] );
trm_pen34 = 1.0 + exp_pen3 + exp_pen4;
f9_Dj = ( 2.0 + exp_pen3 ) / trm_pen34;
Cf9j = ( -p_pen3 * exp_pen3 * trm_pen34 -
(2.0 + exp_pen3) * ( -p_pen3 * exp_pen3 +
p_pen4 * exp_pen4 ) ) /
SQR( trm_pen34 );
data->my_en.e_pen += e_pen =
p_pen1 * f9_Dj * exp_pen2ij * exp_pen2jk;
CEpen1 = e_pen * Cf9j / f9_Dj;
temp = -2.0 * p_pen2 * e_pen;
CEpen2 = temp * (BOA_ij - 2.0);
CEpen3 = temp * (BOA_jk - 2.0);
/* END PENALTY ENERGY */
/* COALITION ENERGY */
p_coa1 = thbp->p_coa1;
p_coa2 = system->reax_param.gp.l[2];
p_coa3 = system->reax_param.gp.l[38];
p_coa4 = system->reax_param.gp.l[30];
exp_coa2 = exp( p_coa2 * workspace->Delta_val[j] );
data->my_en.e_coa += e_coa =
p_coa1 / (1. + exp_coa2) *
exp( -p_coa3 * SQR(workspace->total_bond_order[i]-BOA_ij) ) *
exp( -p_coa3 * SQR(workspace->total_bond_order[k]-BOA_jk) ) *
exp( -p_coa4 * SQR(BOA_ij - 1.5) ) *
exp( -p_coa4 * SQR(BOA_jk - 1.5) );
CEcoa1 = -2 * p_coa4 * (BOA_ij - 1.5) * e_coa;
CEcoa2 = -2 * p_coa4 * (BOA_jk - 1.5) * e_coa;
CEcoa3 = -p_coa2 * exp_coa2 * e_coa / (1 + exp_coa2);
CEcoa4 = -2 * p_coa3 *
(workspace->total_bond_order[i]-BOA_ij) * e_coa;
CEcoa5 = -2 * p_coa3 *
(workspace->total_bond_order[k]-BOA_jk) * e_coa;
/* END COALITION ENERGY */
/* FORCES */
bo_ij->Cdbo += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
bo_jk->Cdbo += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
workspace->CdDelta[j] += ((CEval3 + CEval7) + CEpen1 + CEcoa3);
workspace->CdDelta[i] += CEcoa4;
workspace->CdDelta[k] += CEcoa5;
for( t = start_j; t < end_j; ++t ) {
pbond_jt = &( bonds->select.bond_list[t] );
bo_jt = &(pbond_jt->bo_data);
temp_bo_jt = bo_jt->BO;
temp = CUBE( temp_bo_jt );
pBOjt7 = temp * temp * temp_bo_jt;
bo_jt->Cdbo += (CEval6 * pBOjt7);
bo_jt->Cdbopi += CEval5;
bo_jt->Cdbopi2 += CEval5;
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], CEval8, p_ijk->dcos_di );
rvec_ScaledAdd( workspace->f[j], CEval8, p_ijk->dcos_dj );
rvec_ScaledAdd( workspace->f[k], CEval8, p_ijk->dcos_dk );
}
else {
rvec_Scale( force, CEval8, p_ijk->dcos_di );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j], CEval8, p_ijk->dcos_dj );
rvec_Scale( force, CEval8, p_ijk->dcos_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
}
/* tally into per-atom virials */
if( system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
/* Acquire vectors */
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
rvec_ScaledSum( delkj, 1., system->my_atoms[k].x,
-1., system->my_atoms[j].x );
rvec_Scale( fi_tmp, -CEval8, p_ijk->dcos_di );
rvec_Scale( fj_tmp, -CEval8, p_ijk->dcos_dj );
rvec_Scale( fk_tmp, -CEval8, p_ijk->dcos_dk );
eng_tmp = e_ang + e_pen + e_coa;
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(j,j,system->N,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
if( system->pair_ptr->vflag_atom)
system->pair_ptr->v_tally3(i,j,k,fi_tmp,fk_tmp,delij,delkj);
}
}
}
}
}
}
Set_End_Index(pi, num_thb_intrs, thb_intrs );
}
}
if( num_thb_intrs >= thb_intrs->num_intrs * DANGER_ZONE ) {
workspace->realloc.num_3body = num_thb_intrs;
if( num_thb_intrs > thb_intrs->num_intrs ) {
fprintf( stderr, "step%d: ran out of space on angle_list: top=%d, max=%d\n",
data->step, num_thb_intrs, thb_intrs->num_intrs );
MPI_Abort( MPI_COMM_WORLD, INSUFFICIENT_MEMORY );
}
}
}
diff --git a/src/USER-REAXC/reaxc_vector.cpp b/src/USER-REAXC/reaxc_vector.cpp
index ee63e9428..977b17a6d 100644
--- a/src/USER-REAXC/reaxc_vector.cpp
+++ b/src/USER-REAXC/reaxc_vector.cpp
@@ -1,159 +1,159 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_vector.h"
void rvec_Copy( rvec dest, rvec src )
{
dest[0] = src[0], dest[1] = src[1], dest[2] = src[2];
}
void rvec_Scale( rvec ret, double c, rvec v )
{
ret[0] = c * v[0], ret[1] = c * v[1], ret[2] = c * v[2];
}
void rvec_Add( rvec ret, rvec v )
{
ret[0] += v[0], ret[1] += v[1], ret[2] += v[2];
}
void rvec_ScaledAdd( rvec ret, double c, rvec v )
{
ret[0] += c * v[0], ret[1] += c * v[1], ret[2] += c * v[2];
}
void rvec_ScaledSum( rvec ret, double c1, rvec v1 ,double c2, rvec v2 )
{
ret[0] = c1 * v1[0] + c2 * v2[0];
ret[1] = c1 * v1[1] + c2 * v2[1];
ret[2] = c1 * v1[2] + c2 * v2[2];
}
double rvec_Dot( rvec v1, rvec v2 )
{
return v1[0]*v2[0] + v1[1]*v2[1] + v1[2]*v2[2];
}
void rvec_iMultiply( rvec r, ivec v1, rvec v2 )
{
r[0] = v1[0] * v2[0];
r[1] = v1[1] * v2[1];
r[2] = v1[2] * v2[2];
}
void rvec_Cross( rvec ret, rvec v1, rvec v2 )
{
ret[0] = v1[1] * v2[2] - v1[2] * v2[1];
ret[1] = v1[2] * v2[0] - v1[0] * v2[2];
ret[2] = v1[0] * v2[1] - v1[1] * v2[0];
}
double rvec_Norm_Sqr( rvec v )
{
return SQR(v[0]) + SQR(v[1]) + SQR(v[2]);
}
double rvec_Norm( rvec v )
{
return sqrt( SQR(v[0]) + SQR(v[1]) + SQR(v[2]) );
}
void rvec_MakeZero( rvec v )
{
v[0] = v[1] = v[2] = 0.000000000000000e+00;
}
void rtensor_MatVec( rvec ret, rtensor m, rvec v )
{
int i;
rvec temp;
if( ret == v )
{
for( i = 0; i < 3; ++i )
temp[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
for( i = 0; i < 3; ++i )
ret[i] = temp[i];
}
else
{
for( i = 0; i < 3; ++i )
ret[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
}
}
void rtensor_MakeZero( rtensor t )
{
t[0][0] = t[0][1] = t[0][2] = 0;
t[1][0] = t[1][1] = t[1][2] = 0;
t[2][0] = t[2][1] = t[2][2] = 0;
}
void ivec_MakeZero( ivec v )
{
v[0] = v[1] = v[2] = 0;
}
void ivec_Copy( ivec dest, ivec src )
{
dest[0] = src[0], dest[1] = src[1], dest[2] = src[2];
}
void ivec_Scale( ivec dest, double C, ivec src )
{
dest[0] = (int)(C * src[0]);
dest[1] = (int)(C * src[1]);
dest[2] = (int)(C * src[2]);
}
void ivec_Sum( ivec dest, ivec v1, ivec v2 )
{
dest[0] = v1[0] + v2[0];
dest[1] = v1[1] + v2[1];
dest[2] = v1[2] + v2[2];
}
diff --git a/src/USER-TALLY/compute_force_tally.cpp b/src/USER-TALLY/compute_force_tally.cpp
index e9ecedd5a..e97a1c751 100644
--- a/src/USER-TALLY/compute_force_tally.cpp
+++ b/src/USER-TALLY/compute_force_tally.cpp
@@ -1,224 +1,224 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_force_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeForceTally::ComputeForceTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute force/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute force/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 3;
extscalar = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
fatom = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputeForceTally::~ComputeForceTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(fatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute force/tally with no pair style");
+ error->all(FLERR,"Trying to use compute force/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute force/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute force/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute force/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double, double, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local force array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(fatom);
nmax = atom->nmax;
memory->create(fatom,nmax,size_peratom_cols,"force/tally:fatom");
array_atom = fatom;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
}
for (int i=0; i < size_peratom_cols; ++i)
vector[i] = ftotal[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
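// the pair force on atom i is +fpair*(dx,dy,dz) and atom j receives the
// opposite sign (Newton's third law); ftotal only accumulates the force
// acting on atoms of the first group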
if (newton || i < nlocal) {
if (mask[i] & groupbit) {
ftotal[0] += fpair*dx;
ftotal[1] += fpair*dy;
ftotal[2] += fpair*dz;
}
fatom[i][0] += fpair*dx;
fatom[i][1] += fpair*dy;
fatom[i][2] += fpair*dz;
}
if (newton || j < nlocal) {
if (mask[j] & groupbit) {
ftotal[0] -= fpair*dx;
ftotal[1] -= fpair*dy;
ftotal[2] -= fpair*dz;
}
fatom[j][0] -= fpair*dx;
fatom[j][1] -= fpair*dy;
fatom[j][2] -= fpair*dz;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeForceTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = fatom[i][0];
buf[m++] = fatom[i][1];
buf[m++] = fatom[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
fatom[j][0] += buf[m++];
fatom[j][1] += buf[m++];
fatom[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputeForceTally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated forces across procs
MPI_Allreduce(ftotal,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
scalar = sqrt(vector[0]*vector[0]+vector[1]*vector[1]+vector[2]*vector[2]);
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
for (int j = 0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeForceTally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_heat_flux_tally.cpp b/src/USER-TALLY/compute_heat_flux_tally.cpp
index 214311cb3..48cad538d 100644
--- a/src/USER-TALLY/compute_heat_flux_tally.cpp
+++ b/src/USER-TALLY/compute_heat_flux_tally.cpp
@@ -1,286 +1,286 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_heat_flux_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeHeatFluxTally::ComputeHeatFluxTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute heat/flux/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute heat/flux/tally second group ID");
groupbit2 = group->bitmask[igroup2];
vector_flag = 1;
timeflag = 1;
comm_reverse = 7;
extvector = 1;
size_vector = 6;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
stress = NULL;
eatom = NULL;
vector = new double[size_vector];
}
/* ---------------------------------------------------------------------- */
ComputeHeatFluxTally::~ComputeHeatFluxTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(stress);
memory->destroy(eatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute heat/flux/tally with no pair style");
+ error->all(FLERR,"Trying to use compute heat/flux/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute heat/flux/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute heat/flux/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute heat/flux/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local stress and eatom arrays if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(stress);
nmax = atom->nmax;
memory->create(stress,nmax,6,"heat/flux/tally:stress");
memory->destroy(eatom);
nmax = atom->nmax;
memory->create(eatom,nmax,"heat/flux/tally:eatom");
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
} else {
for (int i=0; i < atom->nlocal; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
}
for (int i=0; i < size_vector; ++i)
vector[i] = heatj[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
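// split the pair energy and the pair virial evenly between the two atoms;
// v0..v5 below hold the xx,yy,zz,xy,xz,yz components of r_ij (x) F_ij / 2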
const double epairhalf = 0.5 * (evdwl + ecoul);
fpair *= 0.5;
const double v0 = dx*dx*fpair; // dx*fpair = Fij_x
const double v1 = dy*dy*fpair;
const double v2 = dz*dz*fpair;
const double v3 = dx*dy*fpair;
const double v4 = dx*dz*fpair;
const double v5 = dy*dz*fpair;
if (newton || i < nlocal) {
eatom[i] += epairhalf;
stress[i][0] += v0;
stress[i][1] += v1;
stress[i][2] += v2;
stress[i][3] += v3;
stress[i][4] += v4;
stress[i][5] += v5;
}
if (newton || j < nlocal) {
eatom[j] += epairhalf;
stress[j][0] += v0;
stress[j][1] += v1;
stress[j][2] += v2;
stress[j][3] += v3;
stress[j][4] += v4;
stress[j][5] += v5;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeHeatFluxTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = eatom[i];
buf[m++] = stress[i][0];
buf[m++] = stress[i][1];
buf[m++] = stress[i][2];
buf[m++] = stress[i][3];
buf[m++] = stress[i][4];
buf[m++] = stress[i][5];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
eatom[j] += buf[m++];
stress[j][0] += buf[m++];
stress[j][1] += buf[m++];
stress[j][2] += buf[m++];
stress[j][3] += buf[m++];
stress[j][4] += buf[m++];
stress[j][5] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::compute_vector()
{
invoked_vector = update->ntimestep;
if ((did_compute != invoked_vector) || (update->eflag_global != invoked_vector))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
}
// compute heat currents
// heat flux vector = jc[3] + jv[3]
// jc[3] = convective portion of heat flux = sum_i (ke_i + pe_i) v_i[3]
// jv[3] = virial portion of heat flux = sum_i (stress_tensor_i . v_i[3])
// normalization by volume is not included
// J = sum_i( (0.5*m*v_i^2 + 0.5*(evdwl_i+ecoul_i))*v_i +
// + (F_ij . v_i)*dR_ij/2 )
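// stress[i][0..5] stores the per-atom virial in xx,yy,zz,xy,xz,yz order, so
// jv below is the symmetric per-atom stress tensor applied to v_i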
int nlocal = atom->nlocal;
int *mask = atom->mask;
const double pfactor = 0.5 * force->mvv2e;
double **v = atom->v;
double *mass = atom->mass;
int *type = atom->type;
double jc[3] = {0.0,0.0,0.0};
double jv[3] = {0.0,0.0,0.0};
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
double ke_i = pfactor * mass[type[i]] *
(v[i][0]*v[i][0] + v[i][1]*v[i][1] + v[i][2]*v[i][2]);
jc[0] += (ke_i + eatom[i]) * v[i][0];
jc[1] += (ke_i + eatom[i]) * v[i][1];
jc[2] += (ke_i + eatom[i]) * v[i][2];
jv[0] += stress[i][0]*v[i][0] + stress[i][3]*v[i][1] +
stress[i][4]*v[i][2];
jv[1] += stress[i][3]*v[i][0] + stress[i][1]*v[i][1] +
stress[i][5]*v[i][2];
jv[2] += stress[i][4]*v[i][0] + stress[i][5]*v[i][1] +
stress[i][2]*v[i][2];
}
}
// sum accumulated heatj across procs
heatj[0] = jc[0] + jv[0];
heatj[1] = jc[1] + jv[1];
heatj[2] = jc[2] + jv[2];
heatj[3] = jc[0];
heatj[4] = jc[1];
heatj[5] = jc[2];
MPI_Allreduce(heatj,vector,size_vector,MPI_DOUBLE,MPI_SUM,world);
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeHeatFluxTally::memory_usage()
{
double bytes = nmax*comm_reverse * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_pe_mol_tally.cpp b/src/USER-TALLY/compute_pe_mol_tally.cpp
index 09ee04d57..a30f2d6b9 100644
--- a/src/USER-TALLY/compute_pe_mol_tally.cpp
+++ b/src/USER-TALLY/compute_pe_mol_tally.cpp
@@ -1,129 +1,129 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_pe_mol_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputePEMolTally::ComputePEMolTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute pe/mol/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute pe/mol/tally second group ID");
groupbit2 = group->bitmask[igroup2];
vector_flag = 1;
size_vector = 4;
timeflag = 1;
extvector = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = invoked_vector = -1;
vector = new double[size_vector];
}
/* ---------------------------------------------------------------------- */
ComputePEMolTally::~ComputePEMolTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute pe/mol/tally with no pair style");
+ error->all(FLERR,"Trying to use compute pe/mol/tally without pair style");
else
force->pair->add_tally_callback(this);
if (atom->molecule_flag == 0)
- error->all(FLERR,"Compute pe/mol/tally requires molecule IDs.");
+ error->all(FLERR,"Compute pe/mol/tally requires molecule IDs");
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute pe/mol/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute pe/mol/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute pe/mol/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double,
double, double, double)
{
const int * const mask = atom->mask;
const tagint * const molid = atom->molecule;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
etotal[0] = etotal[1] = etotal[2] = etotal[3] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ){
evdwl *= 0.5; ecoul *= 0.5;
if (newton || i < nlocal) {
if (molid[i] == molid[j]) {
etotal[0] += evdwl; etotal[1] += ecoul;
} else {
etotal[2] += evdwl; etotal[3] += ecoul;
}
}
if (newton || j < nlocal) {
if (molid[i] == molid[j]) {
etotal[0] += evdwl; etotal[1] += ecoul;
} else {
etotal[2] += evdwl; etotal[3] += ecoul;
}
}
}
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::compute_vector()
{
invoked_vector = update->ntimestep;
if ((did_compute != invoked_vector) || (update->eflag_global != invoked_vector))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated energies across procs
MPI_Allreduce(etotal,vector,size_vector,MPI_DOUBLE,MPI_SUM,world);
}
diff --git a/src/USER-TALLY/compute_pe_tally.cpp b/src/USER-TALLY/compute_pe_tally.cpp
index 68c00b6d2..2117f2cb1 100644
--- a/src/USER-TALLY/compute_pe_tally.cpp
+++ b/src/USER-TALLY/compute_pe_tally.cpp
@@ -1,205 +1,205 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_pe_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputePETally::ComputePETally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute pe/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute pe/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 2;
extscalar = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = invoked_peratom = invoked_scalar = -1;
nmax = -1;
eatom = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputePETally::~ComputePETally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(eatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute pe/tally with no pair style");
+ error->all(FLERR,"Trying to use compute pe/tally without a pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute pe/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute pe/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute pe/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double,
double, double, double)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local eatom array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(eatom);
nmax = atom->nmax;
memory->create(eatom,nmax,size_peratom_cols,"pe/tally:eatom");
array_atom = eatom;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
}
vector[0] = etotal[0] = vector[1] = etotal[1] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ){
evdwl *= 0.5; ecoul *= 0.5;
if (newton || i < nlocal) {
etotal[0] += evdwl; eatom[i][0] += evdwl;
etotal[1] += ecoul; eatom[i][1] += ecoul;
}
if (newton || j < nlocal) {
etotal[0] += evdwl; eatom[j][0] += evdwl;
etotal[1] += ecoul; eatom[j][1] += ecoul;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputePETally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = eatom[i][0];
buf[m++] = eatom[i][1];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
eatom[j][0] += buf[m++];
eatom[j][1] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputePETally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated energies across procs
MPI_Allreduce(etotal,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
scalar = vector[0]+vector[1];
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputePETally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_stress_tally.cpp b/src/USER-TALLY/compute_stress_tally.cpp
index 2575bd372..66df9f6e4 100644
--- a/src/USER-TALLY/compute_stress_tally.cpp
+++ b/src/USER-TALLY/compute_stress_tally.cpp
@@ -1,250 +1,250 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_stress_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeStressTally::ComputeStressTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute stress/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute stress/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 6;
extscalar = 0;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
stress = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputeStressTally::~ComputeStressTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(stress);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute stress/tally with no pair style");
+ error->all(FLERR,"Trying to use compute stress/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute stress/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute stress/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute stress/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double, double, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local stress array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(stress);
nmax = atom->nmax;
memory->create(stress,nmax,size_peratom_cols,"stress/tally:stress");
array_atom = stress;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
}
for (int i=0; i < size_peratom_cols; ++i)
vector[i] = virial[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
fpair *= 0.5;
const double v0 = dx*dx*fpair;
const double v1 = dy*dy*fpair;
const double v2 = dz*dz*fpair;
const double v3 = dx*dy*fpair;
const double v4 = dx*dz*fpair;
const double v5 = dy*dz*fpair;
if (newton || i < nlocal) {
virial[0] += v0; stress[i][0] += v0;
virial[1] += v1; stress[i][1] += v1;
virial[2] += v2; stress[i][2] += v2;
virial[3] += v3; stress[i][3] += v3;
virial[4] += v4; stress[i][4] += v4;
virial[5] += v5; stress[i][5] += v5;
}
if (newton || j < nlocal) {
virial[0] += v0; stress[j][0] += v0;
virial[1] += v1; stress[j][1] += v1;
virial[2] += v2; stress[j][2] += v2;
virial[3] += v3; stress[j][3] += v3;
virial[4] += v4; stress[j][4] += v4;
virial[5] += v5; stress[j][5] += v5;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeStressTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = stress[i][0];
buf[m++] = stress[i][1];
buf[m++] = stress[i][2];
buf[m++] = stress[i][3];
buf[m++] = stress[i][4];
buf[m++] = stress[i][5];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
stress[j][0] += buf[m++];
stress[j][1] += buf[m++];
stress[j][2] += buf[m++];
stress[j][3] += buf[m++];
stress[j][4] += buf[m++];
stress[j][5] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputeStressTally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated forces across procs
MPI_Allreduce(virial,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
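// the scalar is the average of the normal (diagonal) components of the
// accumulated virial, i.e. its trace divided by the dimension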
if (domain->dimension == 3)
scalar = (vector[0]+vector[1]+vector[2])/3.0;
else
scalar = (vector[0]+vector[1])/2.0;
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
for (int j = 0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
}
// convert to stress*volume units = -pressure*volume
const double nktv2p = -force->nktv2p;
for (int i = 0; i < atom->nlocal; i++) {
stress[i][0] *= nktv2p;
stress[i][1] *= nktv2p;
stress[i][2] *= nktv2p;
stress[i][3] *= nktv2p;
stress[i][4] *= nktv2p;
stress[i][5] *= nktv2p;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeStressTally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-VTK/README b/src/USER-VTK/README
index 86ef56a74..3429c96b7 100644
--- a/src/USER-VTK/README
+++ b/src/USER-VTK/README
@@ -1,17 +1,17 @@
-This package implements the "dump custom/vtk" command which can be used in a
+This package implements the "dump vtk" command which can be used in a
LAMMPS input script.
-This dump allows to output atom data similar to dump custom, but directly into
-VTK files.
+This dump allows output of atom data similar to the dump custom
+command, but in VTK format.
-This package uses the VTK library (www.vtk.org) which must be installed on your
-system. See the lib/vtk/README file and the LAMMPS manual for information on
-building LAMMPS with external libraries. The settings in the Makefile.lammps
-file in that directory must be correct for LAMMPS to build correctly with this
-package installed.
+This package uses the VTK library (www.vtk.org) which must be
+installed on your system. See the lib/vtk/README file and the LAMMPS
+manual for information on building LAMMPS with external libraries.
+The settings in the Makefile.lammps file in that directory must be
+correct for LAMMPS to build correctly with this package installed.
-This code was initially developed for LIGGGHTS by Daniel Queteschiner at DCS
-Computing. This is an effort to integrate it back to LAMMPS.
+This code was initially developed for LIGGGHTS by Daniel Queteschiner
+at DCS Computing. This is an effort to integrate it back to LAMMPS.
The person who created this package is Richard Berger at JKU
(richard.berger@jku.at). Contact him directly if you have questions.
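As an illustration (not part of the original README), a dump vtk command
in an input script follows the usual dump syntax, e.g. something like
"dump dmpvtk all vtk 100 dump_*.vtu id type vx vy vz", where the
attribute list is handled like dump custom; see the dump vtk doc page
for the exact set of supported keywords and file suffixes.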
diff --git a/src/USER-VTK/dump_custom_vtk.cpp b/src/USER-VTK/dump_vtk.cpp
similarity index 91%
rename from src/USER-VTK/dump_custom_vtk.cpp
rename to src/USER-VTK/dump_vtk.cpp
index 0e4bc4597..0aa749e73 100644
--- a/src/USER-VTK/dump_custom_vtk.cpp
+++ b/src/USER-VTK/dump_vtk.cpp
@@ -1,2398 +1,2400 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
This file initially came from LIGGGHTS (www.liggghts.com)
Copyright (2014) DCS Computing GmbH, Linz
Copyright (2015) Johannes Kepler University Linz
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors:
Daniel Queteschiner (DCS, JKU)
Christoph Kloss (DCS)
Richard Berger (JKU)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
-#include "dump_custom_vtk.h"
+#include "dump_vtk.h"
#include "atom.h"
#include "force.h"
#include "domain.h"
#include "region.h"
#include "group.h"
#include "input.h"
#include "variable.h"
#include "update.h"
#include "modify.h"
#include "compute.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
+
#include <vector>
#include <sstream>
#include <vtkVersion.h>
+
#ifndef VTK_MAJOR_VERSION
#include <vtkConfigure.h>
#endif
+
#include <vtkPointData.h>
#include <vtkCellData.h>
#include <vtkDoubleArray.h>
#include <vtkIntArray.h>
#include <vtkStringArray.h>
#include <vtkPolyData.h>
#include <vtkPolyDataWriter.h>
#include <vtkXMLPolyDataWriter.h>
#include <vtkXMLPPolyDataWriter.h>
#include <vtkRectilinearGrid.h>
#include <vtkRectilinearGridWriter.h>
#include <vtkXMLRectilinearGridWriter.h>
#include <vtkHexahedron.h>
#include <vtkUnstructuredGrid.h>
#include <vtkUnstructuredGridWriter.h>
#include <vtkXMLUnstructuredGridWriter.h>
#include <vtkXMLPUnstructuredGridWriter.h>
using namespace LAMMPS_NS;
// customize by
// * adding an enum constant (add vector components in consecutive order)
// * adding a pack_*(int) function for the value
// * adjusting parse_fields function to add the pack_* function to pack_choice
// (in case of vectors, adjust identify_vectors as well)
// * adjusting thresh part in modify_param and count functions
enum{X,Y,Z, // required for vtk, must come first
ID,MOL,PROC,PROCP1,TYPE,ELEMENT,MASS,
XS,YS,ZS,XSTRI,YSTRI,ZSTRI,XU,YU,ZU,XUTRI,YUTRI,ZUTRI,
XSU,YSU,ZSU,XSUTRI,YSUTRI,ZSUTRI,
IX,IY,IZ,
VX,VY,VZ,FX,FY,FZ,
Q,MUX,MUY,MUZ,MU,RADIUS,DIAMETER,
OMEGAX,OMEGAY,OMEGAZ,ANGMOMX,ANGMOMY,ANGMOMZ,
TQX,TQY,TQZ,
VARIABLE,COMPUTE,FIX,INAME,DNAME,
ATTRIBUTES}; // must come last
enum{LT,LE,GT,GE,EQ,NEQ};
enum{INT,DOUBLE,STRING,BIGINT}; // same as in DumpCFG
enum{VTK,VTP,VTU,PVTP,PVTU}; // file formats
#define INVOKED_PERATOM 8
#define ONEFIELD 32
#define DELTA 1048576
/* ---------------------------------------------------------------------- */
-DumpCustomVTK::DumpCustomVTK(LAMMPS *lmp, int narg, char **arg) :
+DumpVTK::DumpVTK(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
- if (narg == 5) error->all(FLERR,"No dump custom/vtk arguments specified");
+ if (narg == 5) error->all(FLERR,"No dump vtk arguments specified");
pack_choice.clear();
vtype.clear();
name.clear();
myarrays.clear();
n_calls_ = 0;
// process attributes
// ioptional = start of additional optional args
// only dump image and dump movie styles process optional args
ioptional = parse_fields(narg,arg);
if (ioptional < narg &&
strcmp(style,"image") != 0 && strcmp(style,"movie") != 0)
- error->all(FLERR,"Invalid attribute in dump custom command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
size_one = pack_choice.size();
current_pack_choice_key = -1;
if (filewriter) reset_vtk_data_containers();
label = NULL;
{
// parallel vtp/vtu requires proc number to be preceded by underscore '_'
multiname_ex = NULL;
char *ptr = strchr(filename,'%');
if (ptr) {
multiname_ex = new char[strlen(filename) + 16];
*ptr = '\0';
sprintf(multiname_ex,"%s_%d%s",filename,me,ptr+1);
*ptr = '%';
}
}
vtk_file_format = VTK;
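// pick the output backend from the filename suffix: ".vtp" selects the
// (P)PolyData XML writers, ".vtu" the (P)UnstructuredGrid XML writers,
// anything else falls back to the legacy serial .vtk format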
char *suffix = filename + strlen(filename) - strlen(".vtp");
if (suffix > filename && strcmp(suffix,".vtp") == 0) {
if (multiproc) vtk_file_format = PVTP;
else vtk_file_format = VTP;
} else if (suffix > filename && strcmp(suffix,".vtu") == 0) {
if (multiproc) vtk_file_format = PVTU;
else vtk_file_format = VTU;
}
if (vtk_file_format == VTK) { // no multiproc support for legacy vtk format
if (me != 0) filewriter = 0;
fileproc = 0;
multiproc = 0;
nclusterprocs = nprocs;
}
filecurrent = NULL;
domainfilecurrent = NULL;
parallelfilecurrent = NULL;
header_choice = NULL;
write_choice = NULL;
boxcorners = NULL;
}
/* ---------------------------------------------------------------------- */
-DumpCustomVTK::~DumpCustomVTK()
+DumpVTK::~DumpVTK()
{
delete [] filecurrent;
delete [] domainfilecurrent;
delete [] parallelfilecurrent;
delete [] multiname_ex;
delete [] label;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::init_style()
+void DumpVTK::init_style()
{
// default for element names = C
if (typenames == NULL) {
typenames = new char*[ntypes+1];
for (int itype = 1; itype <= ntypes; itype++) {
typenames[itype] = new char[2];
strcpy(typenames[itype],"C");
}
}
// setup boundary string
domain->boundary_string(boundstr);
// setup function ptrs
- header_choice = &DumpCustomVTK::header_vtk;
+ header_choice = &DumpVTK::header_vtk;
if (vtk_file_format == VTP || vtk_file_format == PVTP)
- write_choice = &DumpCustomVTK::write_vtp;
+ write_choice = &DumpVTK::write_vtp;
else if (vtk_file_format == VTU || vtk_file_format == PVTU)
- write_choice = &DumpCustomVTK::write_vtu;
+ write_choice = &DumpVTK::write_vtu;
else
- write_choice = &DumpCustomVTK::write_vtk;
+ write_choice = &DumpVTK::write_vtk;
// find current ptr for each compute,fix,variable
// check that fix frequency is acceptable
int icompute;
for (int i = 0; i < ncompute; i++) {
icompute = modify->find_compute(id_compute[i]);
- if (icompute < 0) error->all(FLERR,"Could not find dump custom/vtk compute ID");
+ if (icompute < 0) error->all(FLERR,"Could not find dump vtk compute ID");
compute[i] = modify->compute[icompute];
}
int ifix;
for (int i = 0; i < nfix; i++) {
ifix = modify->find_fix(id_fix[i]);
- if (ifix < 0) error->all(FLERR,"Could not find dump custom/vtk fix ID");
+ if (ifix < 0) error->all(FLERR,"Could not find dump vtk fix ID");
fix[i] = modify->fix[ifix];
if (nevery % modify->fix[ifix]->peratom_freq)
- error->all(FLERR,"Dump custom/vtk and fix not computed at compatible times");
+ error->all(FLERR,"Dump vtk and fix not computed at compatible times");
}
int ivariable;
for (int i = 0; i < nvariable; i++) {
ivariable = input->variable->find(id_variable[i]);
if (ivariable < 0)
- error->all(FLERR,"Could not find dump custom/vtk variable name");
+ error->all(FLERR,"Could not find dump vtk variable name");
variable[i] = ivariable;
}
int icustom;
for (int i = 0; i < ncustom; i++) {
icustom = atom->find_custom(id_custom[i],flag_custom[i]);
if (icustom < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
}
// set index and check validity of region
if (iregion >= 0) {
iregion = domain->find_region(idregion);
if (iregion == -1)
- error->all(FLERR,"Region ID for dump custom/vtk does not exist");
+ error->all(FLERR,"Region ID for dump vtk does not exist");
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_header(bigint)
+void DumpVTK::write_header(bigint)
{
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::header_vtk(bigint)
+void DumpVTK::header_vtk(bigint)
{
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::count()
+int DumpVTK::count()
{
n_calls_ = 0;
int i;
// grow choose and variable vbuf arrays if needed
int nlocal = atom->nlocal;
if (atom->nmax > maxlocal) {
maxlocal = atom->nmax;
memory->destroy(choose);
memory->destroy(dchoose);
memory->destroy(clist);
memory->create(choose,maxlocal,"dump:choose");
memory->create(dchoose,maxlocal,"dump:dchoose");
memory->create(clist,maxlocal,"dump:clist");
for (i = 0; i < nvariable; i++) {
memory->destroy(vbuf[i]);
memory->create(vbuf[i],maxlocal,"dump:vbuf");
}
}
// invoke Computes for per-atom quantities
// only if within a run or minimize
// else require that computes are current
// this prevents a compute from being invoked by the WriteDump class
if (ncompute) {
if (update->whichflag == 0) {
for (i = 0; i < ncompute; i++)
if (compute[i]->invoked_peratom != update->ntimestep)
error->all(FLERR,"Compute used in dump between runs is not current");
} else {
for (i = 0; i < ncompute; i++) {
if (!(compute[i]->invoked_flag & INVOKED_PERATOM)) {
compute[i]->compute_peratom();
compute[i]->invoked_flag |= INVOKED_PERATOM;
}
}
}
}
// evaluate atom-style Variables for per-atom quantities
if (nvariable)
for (i = 0; i < nvariable; i++)
input->variable->compute_atom(variable[i],igroup,vbuf[i],1,0);
// choose all local atoms for output
for (i = 0; i < nlocal; i++) choose[i] = 1;
// un-choose if not in group
if (igroup) {
int *mask = atom->mask;
for (i = 0; i < nlocal; i++)
if (!(mask[i] & groupbit))
choose[i] = 0;
}
// un-choose if not in region
if (iregion >= 0) {
Region *region = domain->regions[iregion];
region->prematch();
double **x = atom->x;
for (i = 0; i < nlocal; i++)
if (choose[i] && region->match(x[i][0],x[i][1],x[i][2]) == 0)
choose[i] = 0;
}
// un-choose if any threshold criterion isn't met
if (nthresh) {
double *ptr;
double value;
int nstride;
int nlocal = atom->nlocal;
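// each branch below points ptr at the per-atom quantity to test and sets
// nstride = spacing between consecutive atoms' values
// (3 for packed x/v/f style arrays, 1 for scalars or values copied into dchoose)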
for (int ithresh = 0; ithresh < nthresh; ithresh++) {
// customize by adding to if statement
if (thresh_array[ithresh] == ID) {
tagint *tag = atom->tag;
for (i = 0; i < nlocal; i++) dchoose[i] = tag[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == MOL) {
if (!atom->molecule_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
tagint *molecule = atom->molecule;
for (i = 0; i < nlocal; i++) dchoose[i] = molecule[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == PROC) {
for (i = 0; i < nlocal; i++) dchoose[i] = me;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == PROCP1) {
for (i = 0; i < nlocal; i++) dchoose[i] = me;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == TYPE) {
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = type[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ELEMENT) {
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = type[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == MASS) {
if (atom->rmass) {
ptr = atom->rmass;
nstride = 1;
} else {
double *mass = atom->mass;
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = mass[type[i]];
ptr = dchoose;
nstride = 1;
}
} else if (thresh_array[ithresh] == X) {
ptr = &atom->x[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == Y) {
ptr = &atom->x[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == Z) {
ptr = &atom->x[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == XS) {
double **x = atom->x;
double boxxlo = domain->boxlo[0];
double invxprd = 1.0/domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][0] - boxxlo) * invxprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YS) {
double **x = atom->x;
double boxylo = domain->boxlo[1];
double invyprd = 1.0/domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][1] - boxylo) * invyprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZS) {
double **x = atom->x;
double boxzlo = domain->boxlo[2];
double invzprd = 1.0/domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][2] - boxzlo) * invzprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[0]*(x[i][0]-boxlo[0]) +
h_inv[5]*(x[i][1]-boxlo[1]) + h_inv[4]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[1]*(x[i][1]-boxlo[1]) +
h_inv[3]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[2]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XU) {
double **x = atom->x;
imageint *image = atom->image;
double xprd = domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][0] + ((image[i] & IMGMASK) - IMGMAX) * xprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YU) {
double **x = atom->x;
imageint *image = atom->image;
double yprd = domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][1] +
((image[i] >> IMGBITS & IMGMASK) - IMGMAX) * yprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZU) {
double **x = atom->x;
imageint *image = atom->image;
double zprd = domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][2] + ((image[i] >> IMG2BITS) - IMGMAX) * zprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int xbox,ybox,zbox;
for (i = 0; i < nlocal; i++) {
xbox = (image[i] & IMGMASK) - IMGMAX;
ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][0] + h[0]*xbox + h[5]*ybox + h[4]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int ybox,zbox;
for (i = 0; i < nlocal; i++) {
ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][1] + h[1]*ybox + h[3]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int zbox;
for (i = 0; i < nlocal; i++) {
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][2] + h[2]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxxlo = domain->boxlo[0];
double invxprd = 1.0/domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][0] - boxxlo) * invxprd +
(image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxylo = domain->boxlo[1];
double invyprd = 1.0/domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] =
(x[i][1] - boxylo) * invyprd +
(image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxzlo = domain->boxlo[2];
double invzprd = 1.0/domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][2] - boxzlo) * invzprd +
(image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[0]*(x[i][0]-boxlo[0]) +
h_inv[5]*(x[i][1]-boxlo[1]) +
h_inv[4]*(x[i][2]-boxlo[2]) +
(image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[1]*(x[i][1]-boxlo[1]) +
h_inv[3]*(x[i][2]-boxlo[2]) +
(image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[2]*(x[i][2]-boxlo[2]) +
(image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IX) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IY) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IZ) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == VX) {
ptr = &atom->v[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == VY) {
ptr = &atom->v[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == VZ) {
ptr = &atom->v[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == FX) {
ptr = &atom->f[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == FY) {
ptr = &atom->f[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == FZ) {
ptr = &atom->f[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == Q) {
if (!atom->q_flag)
error->all(FLERR,"Threshold for an atom property that isn't allocated");
ptr = atom->q;
nstride = 1;
} else if (thresh_array[ithresh] == MUX) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][0];
nstride = 4;
} else if (thresh_array[ithresh] == MUY) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][1];
nstride = 4;
} else if (thresh_array[ithresh] == MUZ) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][2];
nstride = 4;
} else if (thresh_array[ithresh] == MU) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][3];
nstride = 4;
} else if (thresh_array[ithresh] == RADIUS) {
if (!atom->radius_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = atom->radius;
nstride = 1;
} else if (thresh_array[ithresh] == DIAMETER) {
if (!atom->radius_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
double *radius = atom->radius;
for (i = 0; i < nlocal; i++) dchoose[i] = 2.0*radius[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == OMEGAX) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == OMEGAY) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == OMEGAZ) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMX) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMY) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMZ) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == TQX) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == TQY) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == TQZ) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == COMPUTE) {
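// compute/fix/variable/custom threshold references are stored after the
// regular output fields, hence the ATTRIBUTES + nfield offset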
i = ATTRIBUTES + nfield + ithresh;
if (argindex[i] == 0) {
ptr = compute[field2index[i]]->vector_atom;
nstride = 1;
} else {
ptr = &compute[field2index[i]]->array_atom[0][argindex[i]-1];
nstride = compute[field2index[i]]->size_peratom_cols;
}
} else if (thresh_array[ithresh] == FIX) {
i = ATTRIBUTES + nfield + ithresh;
if (argindex[i] == 0) {
ptr = fix[field2index[i]]->vector_atom;
nstride = 1;
} else {
ptr = &fix[field2index[i]]->array_atom[0][argindex[i]-1];
nstride = fix[field2index[i]]->size_peratom_cols;
}
} else if (thresh_array[ithresh] == VARIABLE) {
i = ATTRIBUTES + nfield + ithresh;
ptr = vbuf[field2index[i]];
nstride = 1;
} else if (thresh_array[ithresh] == DNAME) {
int iwhich,tmp;
i = ATTRIBUTES + nfield + ithresh;
iwhich = atom->find_custom(id_custom[field2index[i]],tmp);
ptr = atom->dvector[iwhich];
nstride = 1;
} else if (thresh_array[ithresh] == INAME) {
int iwhich,tmp;
i = ATTRIBUTES + nfield + ithresh;
iwhich = atom->find_custom(id_custom[field2index[i]],tmp);
int *ivector = atom->ivector[iwhich];
for (i = 0; i < nlocal; i++)
dchoose[i] = ivector[i];
ptr = dchoose;
nstride = 1;
}
// unselect atoms that don't meet threshold criterion
value = thresh_value[ithresh];
switch (thresh_op[ithresh]) {
case LT:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr >= value) choose[i] = 0;
break;
case LE:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr > value) choose[i] = 0;
break;
case GT:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr <= value) choose[i] = 0;
break;
case GE:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr < value) choose[i] = 0;
break;
case EQ:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr != value) choose[i] = 0;
break;
case NEQ:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr == value) choose[i] = 0;
break;
}
}
}
// compress choose flags into clist
// nchoose = # of selected atoms
// clist[i] = local index of each selected atom
nchoose = 0;
for (i = 0; i < nlocal; i++)
if (choose[i]) clist[nchoose++] = i;
return nchoose;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write()
+void DumpVTK::write()
{
// simulation box bounds
if (domain->triclinic == 0) {
boxxlo = domain->boxlo[0];
boxxhi = domain->boxhi[0];
boxylo = domain->boxlo[1];
boxyhi = domain->boxhi[1];
boxzlo = domain->boxlo[2];
boxzhi = domain->boxhi[2];
} else {
domain->box_corners();
boxcorners = domain->corners;
}
// nme = # of dump lines this proc contributes to dump
nme = count();
// ntotal = total # of dump lines in snapshot
// nmax = max # of dump lines on any proc
bigint bnme = nme;
MPI_Allreduce(&bnme,&ntotal,1,MPI_LMP_BIGINT,MPI_SUM,world);
int nmax;
if (multiproc != nprocs) MPI_Allreduce(&nme,&nmax,1,MPI_INT,MPI_MAX,world);
else nmax = nme;
// write timestep header
// for multiproc,
// nheader = # of lines in this file via Allreduce on clustercomm
bigint nheader = ntotal;
if (multiproc)
MPI_Allreduce(&bnme,&nheader,1,MPI_LMP_BIGINT,MPI_SUM,clustercomm);
if (filewriter) write_header(nheader);
// ensure buf is sized for packing and communicating
// use nmax to ensure filewriter proc can receive info from others
// limit nmax*size_one to int since used as arg in MPI calls
if (nmax > maxbuf) {
if ((bigint) nmax * size_one > MAXSMALLINT)
error->all(FLERR,"Too much per-proc info for dump");
maxbuf = nmax;
memory->destroy(buf);
memory->create(buf,maxbuf*size_one,"dump:buf");
}
// ensure ids buffer is sized for sorting
if (sort_flag && sortcol == 0 && nmax > maxids) {
maxids = nmax;
memory->destroy(ids);
memory->create(ids,maxids,"dump:ids");
}
// pack my data into buf
// if sorting on IDs also request ID list from pack()
// sort buf as needed
if (sort_flag && sortcol == 0) pack(ids);
else pack(NULL);
if (sort_flag) sort();
// filewriter = 1 = this proc writes to file
// ping each proc in my cluster, receive its data, write data to file
// else wait for ping from fileproc, send my data to fileproc
int tmp,nlines;
MPI_Status status;
MPI_Request request;
// comm and output buf of doubles
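// the filewriter posts MPI_Irecv before pinging each sender, so the sender's
// MPI_Rsend (ready send) is guaranteed to find a matching posted receive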
if (filewriter) {
for (int iproc = 0; iproc < nclusterprocs; iproc++) {
if (iproc) {
MPI_Irecv(buf,maxbuf*size_one,MPI_DOUBLE,me+iproc,0,world,&request);
MPI_Send(&tmp,0,MPI_INT,me+iproc,0,world);
MPI_Wait(&request,&status);
MPI_Get_count(&status,MPI_DOUBLE,&nlines);
nlines /= size_one;
} else nlines = nme;
write_data(nlines,buf);
}
} else {
MPI_Recv(&tmp,0,MPI_INT,fileproc,0,world,&status);
MPI_Rsend(buf,nme*size_one,MPI_DOUBLE,fileproc,0,world);
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack(tagint *ids)
+void DumpVTK::pack(tagint *ids)
{
int n = 0;
for (std::map<int,FnPtrPack>::iterator it=pack_choice.begin(); it!=pack_choice.end(); ++it, ++n) {
current_pack_choice_key = it->first; // work-around for pack_compute, pack_fix, pack_variable
(this->*(it->second))(n);
}
if (ids) {
tagint *tag = atom->tag;
for (int i = 0; i < nchoose; i++)
ids[i] = tag[clist[i]];
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_data(int n, double *mybuf)
+void DumpVTK::write_data(int n, double *mybuf)
{
(this->*write_choice)(n,mybuf);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::setFileCurrent() {
+void DumpVTK::setFileCurrent() {
delete [] filecurrent;
filecurrent = NULL;
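// rebuild the output file name for this snapshot:
// a '%' in the file name becomes a per-cluster id (multiproc output),
// a '*' becomes the current timestep, zero-padded if padflag is set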
char *filestar = filename;
if (multiproc) {
if (multiproc > 1) { // if dump_modify fileper or nfile was used
delete [] multiname_ex;
multiname_ex = NULL;
char *ptr = strchr(filename,'%');
if (ptr) {
int id;
if (me + nclusterprocs == nprocs) // last filewriter
id = multiproc -1;
else
id = me/nclusterprocs;
multiname_ex = new char[strlen(filename) + 16];
*ptr = '\0';
sprintf(multiname_ex,"%s_%d%s",filename,id,ptr+1);
*ptr = '%';
}
} // else multiname_ex built in constructor is OK
filestar = multiname_ex;
}
if (multifile == 0) {
filecurrent = new char[strlen(filestar) + 1];
strcpy(filecurrent, filestar);
} else {
filecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(filecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(filecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
// filename of domain box data file
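// derived from the dump file name by inserting "_boundingBox" before the
// extension, e.g. dump.vtk -> dump_boundingBox.vtk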
delete [] domainfilecurrent;
domainfilecurrent = NULL;
if (multiproc) {
// remove '%' character
char *ptr = strchr(filename,'%');
domainfilecurrent = new char[strlen(filename)];
*ptr = '\0';
sprintf(domainfilecurrent,"%s%s",filename,ptr+1);
*ptr = '%';
// insert "_boundingBox" string
ptr = strrchr(domainfilecurrent,'.');
filestar = new char[strlen(domainfilecurrent)+16];
*ptr = '\0';
sprintf(filestar,"%s_boundingBox.%s",domainfilecurrent,ptr+1);
delete [] domainfilecurrent;
domainfilecurrent = NULL;
if (multifile == 0) {
domainfilecurrent = new char[strlen(filestar) + 1];
strcpy(domainfilecurrent, filestar);
} else {
domainfilecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(domainfilecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(domainfilecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
delete [] filestar;
filestar = NULL;
} else {
domainfilecurrent = new char[strlen(filecurrent) + 16];
char *ptr = strrchr(filecurrent,'.');
*ptr = '\0';
sprintf(domainfilecurrent,"%s_boundingBox.%s",filecurrent,ptr+1);
*ptr = '.';
}
// filename of parallel file
if (multiproc && me == 0) {
delete [] parallelfilecurrent;
parallelfilecurrent = NULL;
// remove '%' character and add 'p' to file extension
// -> string length stays the same
char *ptr = strchr(filename,'%');
filestar = new char[strlen(filename) + 1];
*ptr = '\0';
sprintf(filestar,"%s%s",filename,ptr+1);
*ptr = '%';
ptr = strrchr(filestar,'.');
ptr++;
*ptr++='p';
*ptr++='v';
*ptr++='t';
*ptr++= (vtk_file_format == PVTP)?'p':'u';
*ptr++= 0;
if (multifile == 0) {
parallelfilecurrent = new char[strlen(filestar) + 1];
strcpy(parallelfilecurrent, filestar);
} else {
parallelfilecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(parallelfilecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(parallelfilecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
delete [] filestar;
filestar = NULL;
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::buf2arrays(int n, double *mybuf)
+void DumpVTK::buf2arrays(int n, double *mybuf)
{
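// mybuf holds size_one doubles per atom: entries 0,1,2 are the x,y,z
// coordinates (inserted as vtkPoints below), the remaining entries are copied
// in order into the arrays in myarrays; a 3-component array consumes 3 entries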
for (int iatom=0; iatom < n; ++iatom) {
vtkIdType pid[1];
pid[0] = points->InsertNextPoint(mybuf[iatom*size_one],mybuf[iatom*size_one+1],mybuf[iatom*size_one+2]);
int j=3; // 0,1,2 = x,y,z handled just above
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
vtkAbstractArray *paa = it->second;
if (it->second->GetNumberOfComponents() == 3) {
switch (vtype[it->first]) {
case INT:
{
int iv3[3] = { static_cast<int>(mybuf[iatom*size_one+j ]),
static_cast<int>(mybuf[iatom*size_one+j+1]),
static_cast<int>(mybuf[iatom*size_one+j+2]) };
vtkIntArray *pia = static_cast<vtkIntArray*>(paa);
pia->InsertNextTupleValue(iv3);
break;
}
case DOUBLE:
{
vtkDoubleArray *pda = static_cast<vtkDoubleArray*>(paa);
pda->InsertNextTupleValue(&mybuf[iatom*size_one+j]);
break;
}
}
j+=3;
} else {
switch (vtype[it->first]) {
case INT:
{
vtkIntArray *pia = static_cast<vtkIntArray*>(paa);
pia->InsertNextValue(mybuf[iatom*size_one+j]);
break;
}
case DOUBLE:
{
vtkDoubleArray *pda = static_cast<vtkDoubleArray*>(paa);
pda->InsertNextValue(mybuf[iatom*size_one+j]);
break;
}
case STRING:
{
vtkStringArray *psa = static_cast<vtkStringArray*>(paa);
psa->InsertNextValue(typenames[static_cast<int>(mybuf[iatom*size_one+j])]);
break;
}
}
++j;
}
}
pointsCells->InsertNextCell(1,pid);
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::prepare_domain_data(vtkRectilinearGrid *rgrid)
+void DumpVTK::prepare_domain_data(vtkRectilinearGrid *rgrid)
{
vtkSmartPointer<vtkDoubleArray> xCoords = vtkSmartPointer<vtkDoubleArray>::New();
xCoords->InsertNextValue(boxxlo);
xCoords->InsertNextValue(boxxhi);
vtkSmartPointer<vtkDoubleArray> yCoords = vtkSmartPointer<vtkDoubleArray>::New();
yCoords->InsertNextValue(boxylo);
yCoords->InsertNextValue(boxyhi);
vtkSmartPointer<vtkDoubleArray> zCoords = vtkSmartPointer<vtkDoubleArray>::New();
zCoords->InsertNextValue(boxzlo);
zCoords->InsertNextValue(boxzhi);
rgrid->SetDimensions(2,2,2);
rgrid->SetXCoordinates(xCoords);
rgrid->SetYCoordinates(yCoords);
rgrid->SetZCoordinates(zCoords);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::prepare_domain_data_triclinic(vtkUnstructuredGrid *hexahedronGrid)
+void DumpVTK::prepare_domain_data_triclinic(vtkUnstructuredGrid *hexahedronGrid)
{
vtkSmartPointer<vtkPoints> hexahedronPoints = vtkSmartPointer<vtkPoints>::New();
hexahedronPoints->SetNumberOfPoints(8);
hexahedronPoints->InsertPoint(0, boxcorners[0][0], boxcorners[0][1], boxcorners[0][2]);
hexahedronPoints->InsertPoint(1, boxcorners[1][0], boxcorners[1][1], boxcorners[1][2]);
hexahedronPoints->InsertPoint(2, boxcorners[3][0], boxcorners[3][1], boxcorners[3][2]);
hexahedronPoints->InsertPoint(3, boxcorners[2][0], boxcorners[2][1], boxcorners[2][2]);
hexahedronPoints->InsertPoint(4, boxcorners[4][0], boxcorners[4][1], boxcorners[4][2]);
hexahedronPoints->InsertPoint(5, boxcorners[5][0], boxcorners[5][1], boxcorners[5][2]);
hexahedronPoints->InsertPoint(6, boxcorners[7][0], boxcorners[7][1], boxcorners[7][2]);
hexahedronPoints->InsertPoint(7, boxcorners[6][0], boxcorners[6][1], boxcorners[6][2]);
vtkSmartPointer<vtkHexahedron> hexahedron = vtkSmartPointer<vtkHexahedron>::New();
hexahedron->GetPointIds()->SetId(0, 0);
hexahedron->GetPointIds()->SetId(1, 1);
hexahedron->GetPointIds()->SetId(2, 2);
hexahedron->GetPointIds()->SetId(3, 3);
hexahedron->GetPointIds()->SetId(4, 4);
hexahedron->GetPointIds()->SetId(5, 5);
hexahedron->GetPointIds()->SetId(6, 6);
hexahedron->GetPointIds()->SetId(7, 7);
hexahedronGrid->Allocate(1, 1);
hexahedronGrid->InsertNextCell(hexahedron->GetCellType(),
hexahedron->GetPointIds());
hexahedronGrid->SetPoints(hexahedronPoints);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtk()
+void DumpVTK::write_domain_vtk()
{
vtkSmartPointer<vtkRectilinearGrid> rgrid = vtkSmartPointer<vtkRectilinearGrid>::New();
prepare_domain_data(rgrid.GetPointer());
vtkSmartPointer<vtkRectilinearGridWriter> gwriter = vtkSmartPointer<vtkRectilinearGridWriter>::New();
if(label) gwriter->SetHeader(label);
else gwriter->SetHeader("Generated by LAMMPS");
if (binary) gwriter->SetFileTypeToBinary();
else gwriter->SetFileTypeToASCII();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(rgrid);
#else
gwriter->SetInputData(rgrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtk_triclinic()
+void DumpVTK::write_domain_vtk_triclinic()
{
vtkSmartPointer<vtkUnstructuredGrid> hexahedronGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
prepare_domain_data_triclinic(hexahedronGrid.GetPointer());
vtkSmartPointer<vtkUnstructuredGridWriter> gwriter = vtkSmartPointer<vtkUnstructuredGridWriter>::New();
if(label) gwriter->SetHeader(label);
else gwriter->SetHeader("Generated by LAMMPS");
if (binary) gwriter->SetFileTypeToBinary();
else gwriter->SetFileTypeToASCII();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(hexahedronGrid);
#else
gwriter->SetInputData(hexahedronGrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtr()
+void DumpVTK::write_domain_vtr()
{
vtkSmartPointer<vtkRectilinearGrid> rgrid = vtkSmartPointer<vtkRectilinearGrid>::New();
prepare_domain_data(rgrid.GetPointer());
vtkSmartPointer<vtkXMLRectilinearGridWriter> gwriter = vtkSmartPointer<vtkXMLRectilinearGridWriter>::New();
if (binary) gwriter->SetDataModeToBinary();
else gwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(rgrid);
#else
gwriter->SetInputData(rgrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtu_triclinic()
+void DumpVTK::write_domain_vtu_triclinic()
{
vtkSmartPointer<vtkUnstructuredGrid> hexahedronGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
prepare_domain_data_triclinic(hexahedronGrid.GetPointer());
vtkSmartPointer<vtkXMLUnstructuredGridWriter> gwriter = vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
if (binary) gwriter->SetDataModeToBinary();
else gwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(hexahedronGrid);
#else
gwriter->SetInputData(hexahedronGrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtk(int n, double *mybuf)
+void DumpVTK::write_vtk(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
#ifdef UNSTRUCTURED_GRID_VTK
vtkSmartPointer<vtkUnstructuredGrid> unstructuredGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
unstructuredGrid->SetPoints(points);
unstructuredGrid->SetCells(VTK_VERTEX, pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
unstructuredGrid->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkUnstructuredGridWriter> writer = vtkSmartPointer<vtkUnstructuredGridWriter>::New();
#else
vtkSmartPointer<vtkPolyData> polyData = vtkSmartPointer<vtkPolyData>::New();
polyData->SetPoints(points);
polyData->SetVerts(pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
polyData->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkPolyDataWriter> writer = vtkSmartPointer<vtkPolyDataWriter>::New();
#endif
if(label) writer->SetHeader(label);
else writer->SetHeader("Generated by LAMMPS");
if (binary) writer->SetFileTypeToBinary();
else writer->SetFileTypeToASCII();
#ifdef UNSTRUCTURED_GRID_VTK
#if VTK_MAJOR_VERSION < 6
writer->SetInput(unstructuredGrid);
#else
writer->SetInputData(unstructuredGrid);
#endif
#else
#if VTK_MAJOR_VERSION < 6
writer->SetInput(polyData);
#else
writer->SetInputData(polyData);
#endif
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (domain->triclinic == 0)
write_domain_vtk();
else
write_domain_vtk_triclinic();
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtp(int n, double *mybuf)
+void DumpVTK::write_vtp(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
vtkSmartPointer<vtkPolyData> polyData = vtkSmartPointer<vtkPolyData>::New();
polyData->SetPoints(points);
polyData->SetVerts(pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
polyData->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkXMLPolyDataWriter> writer = vtkSmartPointer<vtkXMLPolyDataWriter>::New();
if (binary) writer->SetDataModeToBinary();
else writer->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
writer->SetInput(polyData);
#else
writer->SetInputData(polyData);
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (me == 0) {
if (multiproc) {
vtkSmartPointer<vtkXMLPPolyDataWriter> pwriter = vtkSmartPointer<vtkXMLPPolyDataWriter>::New();
pwriter->SetFileName(parallelfilecurrent);
pwriter->SetNumberOfPieces((multiproc > 1)?multiproc:nprocs);
if (binary) pwriter->SetDataModeToBinary();
else pwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
pwriter->SetInput(polyData);
#else
pwriter->SetInputData(polyData);
#endif
pwriter->Write();
}
if (domain->triclinic == 0) {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'r'; // adjust filename extension
write_domain_vtr();
} else {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'u'; // adjust filename extension
write_domain_vtu_triclinic();
}
}
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtu(int n, double *mybuf)
+void DumpVTK::write_vtu(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
vtkSmartPointer<vtkUnstructuredGrid> unstructuredGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
unstructuredGrid->SetPoints(points);
unstructuredGrid->SetCells(VTK_VERTEX, pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
unstructuredGrid->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkXMLUnstructuredGridWriter> writer = vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
if (binary) writer->SetDataModeToBinary();
else writer->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
writer->SetInput(unstructuredGrid);
#else
writer->SetInputData(unstructuredGrid);
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (me == 0) {
if (multiproc) {
vtkSmartPointer<vtkXMLPUnstructuredGridWriter> pwriter = vtkSmartPointer<vtkXMLPUnstructuredGridWriter>::New();
pwriter->SetFileName(parallelfilecurrent);
pwriter->SetNumberOfPieces((multiproc > 1)?multiproc:nprocs);
if (binary) pwriter->SetDataModeToBinary();
else pwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
pwriter->SetInput(unstructuredGrid);
#else
pwriter->SetInputData(unstructuredGrid);
#endif
pwriter->Write();
}
if (domain->triclinic == 0) {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'r'; // adjust filename extension
write_domain_vtr();
} else {
write_domain_vtu_triclinic();
}
}
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::reset_vtk_data_containers()
+void DumpVTK::reset_vtk_data_containers()
{
points = vtkSmartPointer<vtkPoints>::New();
pointsCells = vtkSmartPointer<vtkCellArray>::New();
std::map<int,int>::iterator it=vtype.begin();
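// skip the first three entries (X,Y,Z): coordinates are stored in the
// vtkPoints object, not as point-data arrays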
++it; ++it; ++it;
for (; it!=vtype.end(); ++it) {
switch(vtype[it->first]) {
case INT:
myarrays[it->first] = vtkSmartPointer<vtkIntArray>::New();
break;
case DOUBLE:
myarrays[it->first] = vtkSmartPointer<vtkDoubleArray>::New();
break;
case STRING:
myarrays[it->first] = vtkSmartPointer<vtkStringArray>::New();
break;
}
if (vector_set.find(it->first) != vector_set.end()) {
myarrays[it->first]->SetNumberOfComponents(3);
myarrays[it->first]->SetName(name[it->first].c_str());
++it; ++it;
} else {
myarrays[it->first]->SetName(name[it->first].c_str());
}
}
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::parse_fields(int narg, char **arg)
+int DumpVTK::parse_fields(int narg, char **arg)
{
- pack_choice[X] = &DumpCustomVTK::pack_x;
+ pack_choice[X] = &DumpVTK::pack_x;
vtype[X] = DOUBLE;
name[X] = "x";
- pack_choice[Y] = &DumpCustomVTK::pack_y;
+ pack_choice[Y] = &DumpVTK::pack_y;
vtype[Y] = DOUBLE;
name[Y] = "y";
- pack_choice[Z] = &DumpCustomVTK::pack_z;
+ pack_choice[Z] = &DumpVTK::pack_z;
vtype[Z] = DOUBLE;
name[Z] = "z";
// customize by adding to if statement
int i;
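// the first 5 arguments of the dump command (ID, group-ID, style, N, file)
// are handled by the Dump base class; per-atom attributes start at arg 5,
// hence the iarg-5 offset for the field index i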
for (int iarg = 5; iarg < narg; iarg++) {
i = iarg-5;
if (strcmp(arg[iarg],"id") == 0) {
- pack_choice[ID] = &DumpCustomVTK::pack_id;
+ pack_choice[ID] = &DumpVTK::pack_id;
vtype[ID] = INT;
name[ID] = arg[iarg];
} else if (strcmp(arg[iarg],"mol") == 0) {
if (!atom->molecule_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MOL] = &DumpCustomVTK::pack_molecule;
+ pack_choice[MOL] = &DumpVTK::pack_molecule;
vtype[MOL] = INT;
name[MOL] = arg[iarg];
} else if (strcmp(arg[iarg],"proc") == 0) {
- pack_choice[PROC] = &DumpCustomVTK::pack_proc;
+ pack_choice[PROC] = &DumpVTK::pack_proc;
vtype[PROC] = INT;
name[PROC] = arg[iarg];
} else if (strcmp(arg[iarg],"procp1") == 0) {
- pack_choice[PROCP1] = &DumpCustomVTK::pack_procp1;
+ pack_choice[PROCP1] = &DumpVTK::pack_procp1;
vtype[PROCP1] = INT;
name[PROCP1] = arg[iarg];
} else if (strcmp(arg[iarg],"type") == 0) {
- pack_choice[TYPE] = &DumpCustomVTK::pack_type;
+ pack_choice[TYPE] = &DumpVTK::pack_type;
vtype[TYPE] = INT;
name[TYPE] =arg[iarg];
} else if (strcmp(arg[iarg],"element") == 0) {
- pack_choice[ELEMENT] = &DumpCustomVTK::pack_type;
+ pack_choice[ELEMENT] = &DumpVTK::pack_type;
vtype[ELEMENT] = STRING;
name[ELEMENT] = arg[iarg];
} else if (strcmp(arg[iarg],"mass") == 0) {
- pack_choice[MASS] = &DumpCustomVTK::pack_mass;
+ pack_choice[MASS] = &DumpVTK::pack_mass;
vtype[MASS] = DOUBLE;
name[MASS] = arg[iarg];
} else if (strcmp(arg[iarg],"x") == 0) {
// required property
} else if (strcmp(arg[iarg],"y") == 0) {
// required property
} else if (strcmp(arg[iarg],"z") == 0) {
// required property
} else if (strcmp(arg[iarg],"xs") == 0) {
- if (domain->triclinic) pack_choice[XS] = &DumpCustomVTK::pack_xs_triclinic;
- else pack_choice[XS] = &DumpCustomVTK::pack_xs;
+ if (domain->triclinic) pack_choice[XS] = &DumpVTK::pack_xs_triclinic;
+ else pack_choice[XS] = &DumpVTK::pack_xs;
vtype[XS] = DOUBLE;
name[XS] = arg[iarg];
} else if (strcmp(arg[iarg],"ys") == 0) {
- if (domain->triclinic) pack_choice[YS] = &DumpCustomVTK::pack_ys_triclinic;
- else pack_choice[YS] = &DumpCustomVTK::pack_ys;
+ if (domain->triclinic) pack_choice[YS] = &DumpVTK::pack_ys_triclinic;
+ else pack_choice[YS] = &DumpVTK::pack_ys;
vtype[YS] = DOUBLE;
name[YS] = arg[iarg];
} else if (strcmp(arg[iarg],"zs") == 0) {
- if (domain->triclinic) pack_choice[ZS] = &DumpCustomVTK::pack_zs_triclinic;
- else pack_choice[ZS] = &DumpCustomVTK::pack_zs;
+ if (domain->triclinic) pack_choice[ZS] = &DumpVTK::pack_zs_triclinic;
+ else pack_choice[ZS] = &DumpVTK::pack_zs;
vtype[ZS] = DOUBLE;
name[ZS] = arg[iarg];
} else if (strcmp(arg[iarg],"xu") == 0) {
- if (domain->triclinic) pack_choice[XU] = &DumpCustomVTK::pack_xu_triclinic;
- else pack_choice[XU] = &DumpCustomVTK::pack_xu;
+ if (domain->triclinic) pack_choice[XU] = &DumpVTK::pack_xu_triclinic;
+ else pack_choice[XU] = &DumpVTK::pack_xu;
vtype[XU] = DOUBLE;
name[XU] = arg[iarg];
} else if (strcmp(arg[iarg],"yu") == 0) {
- if (domain->triclinic) pack_choice[YU] = &DumpCustomVTK::pack_yu_triclinic;
- else pack_choice[YU] = &DumpCustomVTK::pack_yu;
+ if (domain->triclinic) pack_choice[YU] = &DumpVTK::pack_yu_triclinic;
+ else pack_choice[YU] = &DumpVTK::pack_yu;
vtype[YU] = DOUBLE;
name[YU] = arg[iarg];
} else if (strcmp(arg[iarg],"zu") == 0) {
- if (domain->triclinic) pack_choice[ZU] = &DumpCustomVTK::pack_zu_triclinic;
- else pack_choice[ZU] = &DumpCustomVTK::pack_zu;
+ if (domain->triclinic) pack_choice[ZU] = &DumpVTK::pack_zu_triclinic;
+ else pack_choice[ZU] = &DumpVTK::pack_zu;
vtype[ZU] = DOUBLE;
name[ZU] = arg[iarg];
} else if (strcmp(arg[iarg],"xsu") == 0) {
- if (domain->triclinic) pack_choice[XSU] = &DumpCustomVTK::pack_xsu_triclinic;
- else pack_choice[XSU] = &DumpCustomVTK::pack_xsu;
+ if (domain->triclinic) pack_choice[XSU] = &DumpVTK::pack_xsu_triclinic;
+ else pack_choice[XSU] = &DumpVTK::pack_xsu;
vtype[XSU] = DOUBLE;
name[XSU] = arg[iarg];
} else if (strcmp(arg[iarg],"ysu") == 0) {
- if (domain->triclinic) pack_choice[YSU] = &DumpCustomVTK::pack_ysu_triclinic;
- else pack_choice[YSU] = &DumpCustomVTK::pack_ysu;
+ if (domain->triclinic) pack_choice[YSU] = &DumpVTK::pack_ysu_triclinic;
+ else pack_choice[YSU] = &DumpVTK::pack_ysu;
vtype[YSU] = DOUBLE;
name[YSU] = arg[iarg];
} else if (strcmp(arg[iarg],"zsu") == 0) {
- if (domain->triclinic) pack_choice[ZSU] = &DumpCustomVTK::pack_zsu_triclinic;
- else pack_choice[ZSU] = &DumpCustomVTK::pack_zsu;
+ if (domain->triclinic) pack_choice[ZSU] = &DumpVTK::pack_zsu_triclinic;
+ else pack_choice[ZSU] = &DumpVTK::pack_zsu;
vtype[ZSU] = DOUBLE;
name[ZSU] = arg[iarg];
} else if (strcmp(arg[iarg],"ix") == 0) {
- pack_choice[IX] = &DumpCustomVTK::pack_ix;
+ pack_choice[IX] = &DumpVTK::pack_ix;
vtype[IX] = INT;
name[IX] = arg[iarg];
} else if (strcmp(arg[iarg],"iy") == 0) {
- pack_choice[IY] = &DumpCustomVTK::pack_iy;
+ pack_choice[IY] = &DumpVTK::pack_iy;
vtype[IY] = INT;
name[IY] = arg[iarg];
} else if (strcmp(arg[iarg],"iz") == 0) {
- pack_choice[IZ] = &DumpCustomVTK::pack_iz;
+ pack_choice[IZ] = &DumpVTK::pack_iz;
vtype[IZ] = INT;
name[IZ] = arg[iarg];
} else if (strcmp(arg[iarg],"vx") == 0) {
- pack_choice[VX] = &DumpCustomVTK::pack_vx;
+ pack_choice[VX] = &DumpVTK::pack_vx;
vtype[VX] = DOUBLE;
name[VX] = arg[iarg];
} else if (strcmp(arg[iarg],"vy") == 0) {
- pack_choice[VY] = &DumpCustomVTK::pack_vy;
+ pack_choice[VY] = &DumpVTK::pack_vy;
vtype[VY] = DOUBLE;
name[VY] = arg[iarg];
} else if (strcmp(arg[iarg],"vz") == 0) {
- pack_choice[VZ] = &DumpCustomVTK::pack_vz;
+ pack_choice[VZ] = &DumpVTK::pack_vz;
vtype[VZ] = DOUBLE;
name[VZ] = arg[iarg];
} else if (strcmp(arg[iarg],"fx") == 0) {
- pack_choice[FX] = &DumpCustomVTK::pack_fx;
+ pack_choice[FX] = &DumpVTK::pack_fx;
vtype[FX] = DOUBLE;
name[FX] = arg[iarg];
} else if (strcmp(arg[iarg],"fy") == 0) {
- pack_choice[FY] = &DumpCustomVTK::pack_fy;
+ pack_choice[FY] = &DumpVTK::pack_fy;
vtype[FY] = DOUBLE;
name[FY] = arg[iarg];
} else if (strcmp(arg[iarg],"fz") == 0) {
- pack_choice[FZ] = &DumpCustomVTK::pack_fz;
+ pack_choice[FZ] = &DumpVTK::pack_fz;
vtype[FZ] = DOUBLE;
name[FZ] = arg[iarg];
} else if (strcmp(arg[iarg],"q") == 0) {
if (!atom->q_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[Q] = &DumpCustomVTK::pack_q;
+ pack_choice[Q] = &DumpVTK::pack_q;
vtype[Q] = DOUBLE;
name[Q] = arg[iarg];
} else if (strcmp(arg[iarg],"mux") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUX] = &DumpCustomVTK::pack_mux;
+ pack_choice[MUX] = &DumpVTK::pack_mux;
vtype[MUX] = DOUBLE;
name[MUX] = arg[iarg];
} else if (strcmp(arg[iarg],"muy") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUY] = &DumpCustomVTK::pack_muy;
+ pack_choice[MUY] = &DumpVTK::pack_muy;
vtype[MUY] = DOUBLE;
name[MUY] = arg[iarg];
} else if (strcmp(arg[iarg],"muz") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUZ] = &DumpCustomVTK::pack_muz;
+ pack_choice[MUZ] = &DumpVTK::pack_muz;
vtype[MUZ] = DOUBLE;
name[MUZ] = arg[iarg];
} else if (strcmp(arg[iarg],"mu") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MU] = &DumpCustomVTK::pack_mu;
+ pack_choice[MU] = &DumpVTK::pack_mu;
vtype[MU] = DOUBLE;
name[MU] = arg[iarg];
} else if (strcmp(arg[iarg],"radius") == 0) {
if (!atom->radius_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[RADIUS] = &DumpCustomVTK::pack_radius;
+ pack_choice[RADIUS] = &DumpVTK::pack_radius;
vtype[RADIUS] = DOUBLE;
name[RADIUS] = arg[iarg];
} else if (strcmp(arg[iarg],"diameter") == 0) {
if (!atom->radius_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[DIAMETER] = &DumpCustomVTK::pack_diameter;
+ pack_choice[DIAMETER] = &DumpVTK::pack_diameter;
vtype[DIAMETER] = DOUBLE;
name[DIAMETER] = arg[iarg];
} else if (strcmp(arg[iarg],"omegax") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAX] = &DumpCustomVTK::pack_omegax;
+ pack_choice[OMEGAX] = &DumpVTK::pack_omegax;
vtype[OMEGAX] = DOUBLE;
name[OMEGAX] = arg[iarg];
} else if (strcmp(arg[iarg],"omegay") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAY] = &DumpCustomVTK::pack_omegay;
+ pack_choice[OMEGAY] = &DumpVTK::pack_omegay;
vtype[OMEGAY] = DOUBLE;
name[OMEGAY] = arg[iarg];
} else if (strcmp(arg[iarg],"omegaz") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAZ] = &DumpCustomVTK::pack_omegaz;
+ pack_choice[OMEGAZ] = &DumpVTK::pack_omegaz;
vtype[OMEGAZ] = DOUBLE;
name[OMEGAZ] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomx") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMX] = &DumpCustomVTK::pack_angmomx;
+ pack_choice[ANGMOMX] = &DumpVTK::pack_angmomx;
vtype[ANGMOMX] = DOUBLE;
name[ANGMOMX] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomy") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMY] = &DumpCustomVTK::pack_angmomy;
+ pack_choice[ANGMOMY] = &DumpVTK::pack_angmomy;
vtype[ANGMOMY] = DOUBLE;
name[ANGMOMY] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomz") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMZ] = &DumpCustomVTK::pack_angmomz;
+ pack_choice[ANGMOMZ] = &DumpVTK::pack_angmomz;
vtype[ANGMOMZ] = DOUBLE;
name[ANGMOMZ] = arg[iarg];
} else if (strcmp(arg[iarg],"tqx") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQX] = &DumpCustomVTK::pack_tqx;
+ pack_choice[TQX] = &DumpVTK::pack_tqx;
vtype[TQX] = DOUBLE;
name[TQX] = arg[iarg];
} else if (strcmp(arg[iarg],"tqy") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQY] = &DumpCustomVTK::pack_tqy;
+ pack_choice[TQY] = &DumpVTK::pack_tqy;
vtype[TQY] = DOUBLE;
name[TQY] = arg[iarg];
} else if (strcmp(arg[iarg],"tqz") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQZ] = &DumpCustomVTK::pack_tqz;
+ pack_choice[TQZ] = &DumpVTK::pack_tqz;
vtype[TQZ] = DOUBLE;
name[TQZ] = arg[iarg];
// compute value = c_ID
// if no trailing [], then arg is set to 0, else arg is int between []
} else if (strncmp(arg[iarg],"c_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_compute;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_compute;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
- error->all(FLERR,"Invalid attribute in dump custom/vtk command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
argindex[ATTRIBUTES+i] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+i] = 0;
n = modify->find_compute(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk compute ID");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk compute ID");
if (modify->compute[n]->peratom_flag == 0)
- error->all(FLERR,"Dump custom/vtk compute does not compute per-atom info");
+ error->all(FLERR,"Dump vtk compute does not compute per-atom info");
if (argindex[ATTRIBUTES+i] == 0 && modify->compute[n]->size_peratom_cols > 0)
error->all(FLERR,
- "Dump custom/vtk compute does not calculate per-atom vector");
+ "Dump vtk compute does not calculate per-atom vector");
if (argindex[ATTRIBUTES+i] > 0 && modify->compute[n]->size_peratom_cols == 0)
error->all(FLERR,\
- "Dump custom/vtk compute does not calculate per-atom array");
+ "Dump vtk compute does not calculate per-atom array");
if (argindex[ATTRIBUTES+i] > 0 &&
argindex[ATTRIBUTES+i] > modify->compute[n]->size_peratom_cols)
- error->all(FLERR,"Dump custom/vtk compute vector is accessed out-of-range");
+ error->all(FLERR,"Dump vtk compute vector is accessed out-of-range");
field2index[ATTRIBUTES+i] = add_compute(suffix);
name[ATTRIBUTES+i] = arg[iarg];
delete [] suffix;
// fix value = f_ID
// if no trailing [], then arg is set to 0, else arg is between []
} else if (strncmp(arg[iarg],"f_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_fix;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_fix;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
- error->all(FLERR,"Invalid attribute in dump custom/vtk command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
argindex[ATTRIBUTES+i] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+i] = 0;
n = modify->find_fix(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk fix ID");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk fix ID");
if (modify->fix[n]->peratom_flag == 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom info");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom info");
if (argindex[ATTRIBUTES+i] == 0 && modify->fix[n]->size_peratom_cols > 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom vector");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom vector");
if (argindex[ATTRIBUTES+i] > 0 && modify->fix[n]->size_peratom_cols == 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom array");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom array");
if (argindex[ATTRIBUTES+i] > 0 &&
argindex[ATTRIBUTES+i] > modify->fix[n]->size_peratom_cols)
- error->all(FLERR,"Dump custom/vtk fix vector is accessed out-of-range");
+ error->all(FLERR,"Dump vtk fix vector is accessed out-of-range");
field2index[ATTRIBUTES+i] = add_fix(suffix);
name[ATTRIBUTES+i] = arg[iarg];
delete [] suffix;
// variable value = v_name
} else if (strncmp(arg[iarg],"v_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_variable;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_variable;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
n = input->variable->find(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk variable name");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk variable name");
if (input->variable->atomstyle(n) == 0)
- error->all(FLERR,"Dump custom/vtk variable is not atom-style variable");
+ error->all(FLERR,"Dump vtk variable is not atom-style variable");
field2index[ATTRIBUTES+i] = add_variable(suffix);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
// custom per-atom floating point value = d_ID
} else if (strncmp(arg[iarg],"d_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_custom;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_custom;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
int tmp = -1;
n = atom->find_custom(suffix,tmp);
if (n < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
if (tmp != 1)
error->all(FLERR,"Custom per-atom property ID is not floating point");
field2index[ATTRIBUTES+i] = add_custom(suffix,1);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
// custom per-atom integer value = i_ID
} else if (strncmp(arg[iarg],"i_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_custom;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_custom;
vtype[ATTRIBUTES+i] = INT;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
int tmp = -1;
n = atom->find_custom(suffix,tmp);
if (n < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
if (tmp != 0)
error->all(FLERR,"Custom per-atom property ID is not integer");
field2index[ATTRIBUTES+i] = add_custom(suffix,0);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
} else return iarg;
}
identify_vectors();
return narg;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::identify_vectors()
+void DumpVTK::identify_vectors()
{
// detect vectors
vector_set.insert(X); // required
int vector3_starts[] = {XS, XU, XSU, IX, VX, FX, MUX, OMEGAX, ANGMOMX, TQX};
int num_vector3_starts = sizeof(vector3_starts) / sizeof(int);
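// if all three components of a triple (e.g. vx,vy,vz) were requested,
// mark the first one as a 3-vector and truncate its name at the 'x'
// (vx -> v), so a single 3-component vtk array is written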
for (int v3s = 0; v3s < num_vector3_starts; v3s++) {
if(name.count(vector3_starts[v3s] ) &&
name.count(vector3_starts[v3s]+1) &&
name.count(vector3_starts[v3s]+2) )
{
std::string vectorName = name[vector3_starts[v3s]];
vectorName.erase(vectorName.find_first_of('x'));
name[vector3_starts[v3s]] = vectorName;
vector_set.insert(vector3_starts[v3s]);
}
}
// compute and fix vectors
for (std::map<int,std::string>::iterator it=name.begin(); it!=name.end(); ++it) {
if (it->first < ATTRIBUTES) // neither fix nor compute
continue;
if(argindex[it->first] == 0) // single value
continue;
// assume components are grouped together and in correct order
if(name.count(it->first + 1) && name.count(it->first + 2) ) { // more attributes?
if(it->second.compare(0,it->second.length()-3,name[it->first + 1],0,it->second.length()-3) == 0 && // same attributes?
it->second.compare(0,it->second.length()-3,name[it->first + 2],0,it->second.length()-3) == 0 )
{
it->second.erase(it->second.length()-1);
std::ostringstream oss;
oss << "-" << argindex[it->first+2] << "]";
it->second += oss.str();
vector_set.insert(it->first);
++it; ++it;
}
}
}
}
/* ----------------------------------------------------------------------
add Compute to list of Compute objects used by dump
return index of where this Compute is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_compute(char *id)
+int DumpVTK::add_compute(char *id)
{
int icompute;
for (icompute = 0; icompute < ncompute; icompute++)
if (strcmp(id,id_compute[icompute]) == 0) break;
if (icompute < ncompute) return icompute;
id_compute = (char **)
memory->srealloc(id_compute,(ncompute+1)*sizeof(char *),"dump:id_compute");
delete [] compute;
compute = new Compute*[ncompute+1];
int n = strlen(id) + 1;
id_compute[ncompute] = new char[n];
strcpy(id_compute[ncompute],id);
ncompute++;
return ncompute-1;
}
/* ----------------------------------------------------------------------
add Fix to list of Fix objects used by dump
return index of where this Fix is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_fix(char *id)
+int DumpVTK::add_fix(char *id)
{
int ifix;
for (ifix = 0; ifix < nfix; ifix++)
if (strcmp(id,id_fix[ifix]) == 0) break;
if (ifix < nfix) return ifix;
id_fix = (char **)
memory->srealloc(id_fix,(nfix+1)*sizeof(char *),"dump:id_fix");
delete [] fix;
fix = new Fix*[nfix+1];
int n = strlen(id) + 1;
id_fix[nfix] = new char[n];
strcpy(id_fix[nfix],id);
nfix++;
return nfix-1;
}
/* ----------------------------------------------------------------------
add Variable to list of Variables used by dump
return index of where this Variable is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_variable(char *id)
+int DumpVTK::add_variable(char *id)
{
int ivariable;
for (ivariable = 0; ivariable < nvariable; ivariable++)
if (strcmp(id,id_variable[ivariable]) == 0) break;
if (ivariable < nvariable) return ivariable;
id_variable = (char **)
memory->srealloc(id_variable,(nvariable+1)*sizeof(char *),
"dump:id_variable");
delete [] variable;
variable = new int[nvariable+1];
delete [] vbuf;
vbuf = new double*[nvariable+1];
for (int i = 0; i <= nvariable; i++) vbuf[i] = NULL;
int n = strlen(id) + 1;
id_variable[nvariable] = new char[n];
strcpy(id_variable[nvariable],id);
nvariable++;
return nvariable-1;
}
/* ----------------------------------------------------------------------
add custom atom property to list used by dump
return index of where this property is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_custom(char *id, int flag)
+int DumpVTK::add_custom(char *id, int flag)
{
int icustom;
for (icustom = 0; icustom < ncustom; icustom++)
if ((strcmp(id,id_custom[icustom]) == 0)
&& (flag == flag_custom[icustom])) break;
if (icustom < ncustom) return icustom;
id_custom = (char **)
memory->srealloc(id_custom,(ncustom+1)*sizeof(char *),"dump:id_custom");
flag_custom = (int *)
memory->srealloc(flag_custom,(ncustom+1)*sizeof(int),"dump:flag_custom");
int n = strlen(id) + 1;
id_custom[ncustom] = new char[n];
strcpy(id_custom[ncustom],id);
flag_custom[ncustom] = flag;
ncustom++;
return ncustom-1;
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::modify_param(int narg, char **arg)
+int DumpVTK::modify_param(int narg, char **arg)
{
if (strcmp(arg[0],"region") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command");
if (strcmp(arg[1],"none") == 0) iregion = -1;
else {
iregion = domain->find_region(arg[1]);
if (iregion == -1)
error->all(FLERR,"Dump_modify region ID does not exist");
delete [] idregion;
int n = strlen(arg[1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[1]);
}
return 2;
}
if (strcmp(arg[0],"label") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command [label]");
delete [] label;
int n = strlen(arg[1]) + 1;
label = new char[n];
strcpy(label,arg[1]);
return 2;
}
if (strcmp(arg[0],"binary") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command [binary]");
if (strcmp(arg[1],"yes") == 0) binary = 1;
else if (strcmp(arg[1],"no") == 0) binary = 0;
else error->all(FLERR,"Illegal dump_modify command [binary]");
return 2;
}
if (strcmp(arg[0],"element") == 0) {
if (narg < ntypes+1)
error->all(FLERR,"Dump modify: number of element names do not match atom types");
if (typenames) {
for (int i = 1; i <= ntypes; i++) delete [] typenames[i];
delete [] typenames;
typenames = NULL;
}
typenames = new char*[ntypes+1];
for (int itype = 1; itype <= ntypes; itype++) {
int n = strlen(arg[itype]) + 1;
typenames[itype] = new char[n];
strcpy(typenames[itype],arg[itype]);
}
return ntypes+1;
}
if (strcmp(arg[0],"thresh") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command");
if (strcmp(arg[1],"none") == 0) {
if (nthresh) {
memory->destroy(thresh_array);
memory->destroy(thresh_op);
memory->destroy(thresh_value);
thresh_array = NULL;
thresh_op = NULL;
thresh_value = NULL;
}
nthresh = 0;
return 2;
}
if (narg < 4) error->all(FLERR,"Illegal dump_modify command");
// grow threshold arrays
memory->grow(thresh_array,nthresh+1,"dump:thresh_array");
memory->grow(thresh_op,(nthresh+1),"dump:thresh_op");
memory->grow(thresh_value,(nthresh+1),"dump:thresh_value");
// set attribute type of threshold
// customize by adding to if statement
if (strcmp(arg[1],"id") == 0) thresh_array[nthresh] = ID;
else if (strcmp(arg[1],"mol") == 0) thresh_array[nthresh] = MOL;
else if (strcmp(arg[1],"proc") == 0) thresh_array[nthresh] = PROC;
else if (strcmp(arg[1],"procp1") == 0) thresh_array[nthresh] = PROCP1;
else if (strcmp(arg[1],"type") == 0) thresh_array[nthresh] = TYPE;
else if (strcmp(arg[1],"mass") == 0) thresh_array[nthresh] = MASS;
else if (strcmp(arg[1],"x") == 0) thresh_array[nthresh] = X;
else if (strcmp(arg[1],"y") == 0) thresh_array[nthresh] = Y;
else if (strcmp(arg[1],"z") == 0) thresh_array[nthresh] = Z;
else if (strcmp(arg[1],"xs") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XS;
else if (strcmp(arg[1],"xs") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XSTRI;
else if (strcmp(arg[1],"ys") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YS;
else if (strcmp(arg[1],"ys") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YSTRI;
else if (strcmp(arg[1],"zs") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZS;
else if (strcmp(arg[1],"zs") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZSTRI;
else if (strcmp(arg[1],"xu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XU;
else if (strcmp(arg[1],"xu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XUTRI;
else if (strcmp(arg[1],"yu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YU;
else if (strcmp(arg[1],"yu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YUTRI;
else if (strcmp(arg[1],"zu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZU;
else if (strcmp(arg[1],"zu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZUTRI;
else if (strcmp(arg[1],"xsu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XSU;
else if (strcmp(arg[1],"xsu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XSUTRI;
else if (strcmp(arg[1],"ysu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YSU;
else if (strcmp(arg[1],"ysu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YSUTRI;
else if (strcmp(arg[1],"zsu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZSU;
else if (strcmp(arg[1],"zsu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZSUTRI;
else if (strcmp(arg[1],"ix") == 0) thresh_array[nthresh] = IX;
else if (strcmp(arg[1],"iy") == 0) thresh_array[nthresh] = IY;
else if (strcmp(arg[1],"iz") == 0) thresh_array[nthresh] = IZ;
else if (strcmp(arg[1],"vx") == 0) thresh_array[nthresh] = VX;
else if (strcmp(arg[1],"vy") == 0) thresh_array[nthresh] = VY;
else if (strcmp(arg[1],"vz") == 0) thresh_array[nthresh] = VZ;
else if (strcmp(arg[1],"fx") == 0) thresh_array[nthresh] = FX;
else if (strcmp(arg[1],"fy") == 0) thresh_array[nthresh] = FY;
else if (strcmp(arg[1],"fz") == 0) thresh_array[nthresh] = FZ;
else if (strcmp(arg[1],"q") == 0) thresh_array[nthresh] = Q;
else if (strcmp(arg[1],"mux") == 0) thresh_array[nthresh] = MUX;
else if (strcmp(arg[1],"muy") == 0) thresh_array[nthresh] = MUY;
else if (strcmp(arg[1],"muz") == 0) thresh_array[nthresh] = MUZ;
else if (strcmp(arg[1],"mu") == 0) thresh_array[nthresh] = MU;
else if (strcmp(arg[1],"radius") == 0) thresh_array[nthresh] = RADIUS;
else if (strcmp(arg[1],"diameter") == 0) thresh_array[nthresh] = DIAMETER;
else if (strcmp(arg[1],"omegax") == 0) thresh_array[nthresh] = OMEGAX;
else if (strcmp(arg[1],"omegay") == 0) thresh_array[nthresh] = OMEGAY;
else if (strcmp(arg[1],"omegaz") == 0) thresh_array[nthresh] = OMEGAZ;
else if (strcmp(arg[1],"angmomx") == 0) thresh_array[nthresh] = ANGMOMX;
else if (strcmp(arg[1],"angmomy") == 0) thresh_array[nthresh] = ANGMOMY;
else if (strcmp(arg[1],"angmomz") == 0) thresh_array[nthresh] = ANGMOMZ;
else if (strcmp(arg[1],"tqx") == 0) thresh_array[nthresh] = TQX;
else if (strcmp(arg[1],"tqy") == 0) thresh_array[nthresh] = TQY;
else if (strcmp(arg[1],"tqz") == 0) thresh_array[nthresh] = TQZ;
// compute value = c_ID
// if no trailing [], then arg is set to 0, else arg is between []
else if (strncmp(arg[1],"c_",2) == 0) {
thresh_array[nthresh] = COMPUTE;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Invalid attribute in dump modify command");
argindex[ATTRIBUTES+nfield+nthresh] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = modify->find_compute(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag == 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom info");
if (argindex[ATTRIBUTES+nfield+nthresh] == 0 &&
modify->compute[n]->size_peratom_cols > 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom vector");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
modify->compute[n]->size_peratom_cols == 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom array");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
argindex[ATTRIBUTES+nfield+nthresh] > modify->compute[n]->size_peratom_cols)
error->all(FLERR,"Dump modify compute ID vector is not large enough");
field2index[ATTRIBUTES+nfield+nthresh] = add_compute(suffix);
delete [] suffix;
// fix value = f_ID
// if no trailing [], then arg is set to 0, else arg is between []
} else if (strncmp(arg[1],"f_",2) == 0) {
thresh_array[nthresh] = FIX;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Invalid attribute in dump modify command");
argindex[ATTRIBUTES+nfield+nthresh] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = modify->find_fix(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom info");
if (argindex[ATTRIBUTES+nfield+nthresh] == 0 &&
modify->fix[n]->size_peratom_cols > 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom vector");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
modify->fix[n]->size_peratom_cols == 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom array");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
argindex[ATTRIBUTES+nfield+nthresh] > modify->fix[n]->size_peratom_cols)
error->all(FLERR,"Dump modify fix ID vector is not large enough");
field2index[ATTRIBUTES+nfield+nthresh] = add_fix(suffix);
delete [] suffix;
// variable value = v_ID
} else if (strncmp(arg[1],"v_",2) == 0) {
thresh_array[nthresh] = VARIABLE;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = input->variable->find(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify variable name");
if (input->variable->atomstyle(n) == 0)
error->all(FLERR,"Dump modify variable is not atom-style variable");
field2index[ATTRIBUTES+nfield+nthresh] = add_variable(suffix);
delete [] suffix;
} else error->all(FLERR,"Invalid dump_modify threshold operator");
// set operation type of threshold
if (strcmp(arg[2],"<") == 0) thresh_op[nthresh] = LT;
else if (strcmp(arg[2],"<=") == 0) thresh_op[nthresh] = LE;
else if (strcmp(arg[2],">") == 0) thresh_op[nthresh] = GT;
else if (strcmp(arg[2],">=") == 0) thresh_op[nthresh] = GE;
else if (strcmp(arg[2],"==") == 0) thresh_op[nthresh] = EQ;
else if (strcmp(arg[2],"!=") == 0) thresh_op[nthresh] = NEQ;
else error->all(FLERR,"Invalid dump_modify threshold operator");
// set threshold value
thresh_value[nthresh] = force->numeric(FLERR,arg[3]);
nthresh++;
return 4;
}
return 0;
}
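/* ----------------------------------------------------------------------
   illustrative input-script usage of the keywords parsed above (a minimal
   sketch; the dump ID "dvtk" and the region/value choices are assumptions):
     dump_modify dvtk region myregion      # only dump atoms inside region "myregion"
     dump_modify dvtk binary yes           # write binary VTK files
     dump_modify dvtk thresh z > 10.0      # only dump atoms with z above 10.0
     dump_modify dvtk thresh none          # clear all thresholds
------------------------------------------------------------------------- */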
/* ----------------------------------------------------------------------
return # of bytes of allocated memory in buf, choose, variable arrays
------------------------------------------------------------------------- */
-bigint DumpCustomVTK::memory_usage()
+bigint DumpVTK::memory_usage()
{
bigint bytes = Dump::memory_usage();
bytes += memory->usage(choose,maxlocal);
bytes += memory->usage(dchoose,maxlocal);
bytes += memory->usage(clist,maxlocal);
bytes += memory->usage(vbuf,nvariable,maxlocal);
return bytes;
}
/* ----------------------------------------------------------------------
extraction of Compute, Fix, Variable results
------------------------------------------------------------------------- */
-void DumpCustomVTK::pack_compute(int n)
+void DumpVTK::pack_compute(int n)
{
double *vector = compute[field2index[current_pack_choice_key]]->vector_atom;
double **array = compute[field2index[current_pack_choice_key]]->array_atom;
int index = argindex[current_pack_choice_key];
if (index == 0) {
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
} else {
index--;
for (int i = 0; i < nchoose; i++) {
buf[n] = array[clist[i]][index];
n += size_one;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_fix(int n)
+void DumpVTK::pack_fix(int n)
{
double *vector = fix[field2index[current_pack_choice_key]]->vector_atom;
double **array = fix[field2index[current_pack_choice_key]]->array_atom;
int index = argindex[current_pack_choice_key];
if (index == 0) {
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
} else {
index--;
for (int i = 0; i < nchoose; i++) {
buf[n] = array[clist[i]][index];
n += size_one;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_variable(int n)
+void DumpVTK::pack_variable(int n)
{
double *vector = vbuf[field2index[current_pack_choice_key]];
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_custom(int n)
+void DumpVTK::pack_custom(int n)
{
-
int index = field2index[n];
if (flag_custom[index] == 0) { // integer
int iwhich,tmp;
iwhich = atom->find_custom(id_custom[index],tmp);
int *ivector = atom->ivector[iwhich];
for (int i = 0; i < nchoose; i++) {
buf[n] = ivector[clist[i]];
n += size_one;
}
} else if (flag_custom[index] == 1) { // double
int iwhich,tmp;
iwhich = atom->find_custom(id_custom[index],tmp);
double *dvector = atom->dvector[iwhich];
for (int i = 0; i < nchoose; i++) {
buf[n] = dvector[clist[i]];
n += size_one;
}
}
}
diff --git a/src/USER-VTK/dump_custom_vtk.h b/src/USER-VTK/dump_vtk.h
similarity index 95%
rename from src/USER-VTK/dump_custom_vtk.h
rename to src/USER-VTK/dump_vtk.h
index f3b4a8b63..603ca114b 100644
--- a/src/USER-VTK/dump_custom_vtk.h
+++ b/src/USER-VTK/dump_vtk.h
@@ -1,320 +1,321 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
This file initially came from LIGGGHTS (www.liggghts.com)
Copyright (2014) DCS Computing GmbH, Linz
Copyright (2015) Johannes Kepler University Linz
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef DUMP_CLASS
-DumpStyle(custom/vtk,DumpCustomVTK)
+DumpStyle(vtk,DumpVTK)
#else
-#ifndef LMP_DUMP_CUSTOM_VTK_H
-#define LMP_DUMP_CUSTOM_VTK_H
+#ifndef LMP_DUMP_VTK_H
+#define LMP_DUMP_VTK_H
#include "dump_custom.h"
#include <map>
#include <set>
#include <string>
#include <vtkSmartPointer.h>
#include <vtkPoints.h>
#include <vtkCellArray.h>
class vtkAbstractArray;
class vtkRectilinearGrid;
class vtkUnstructuredGrid;
namespace LAMMPS_NS {
/**
- * @brief DumpCustomVTK class
+ * @brief DumpVTK class
* write atom data to vtk files.
*
 * Similar to the DumpCustom class, but uses the vtk library to write data in the vtk simple
 * legacy or xml format, depending on the filename extension specified. (Since this
 * conflicts with the way binary output is specified, dump_modify allows setting the
 * binary flag for this dump command explicitly.)
 * In contrast to the DumpCustom class, the attributes to be packed are stored in a std::map
 * to avoid duplicate entries and to enforce correct ordering of vector components (except
 * for computes and fixes - these have to be given in the right order in the input script).
 * (Note: std::map elements are sorted by their keys.)
 * This dump command does not support compressed files, buffering, or custom format strings;
 * multiproc is only supported by the xml formats, and the multifile option has to be used.
*/
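/* illustrative input-script usage (a minimal sketch; the dump ID, group, output
   interval and attribute list are assumptions). The file format is chosen from
   the filename extension, and a "*" wildcard is needed because the multifile
   option is required:
     dump dvtk all vtk 100 dump_*.vtu id type x y z vx vy vz
*/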
-class DumpCustomVTK : public DumpCustom {
+
+class DumpVTK : public DumpCustom {
public:
- DumpCustomVTK(class LAMMPS *, int, char **);
- virtual ~DumpCustomVTK();
+ DumpVTK(class LAMMPS *, int, char **);
+ virtual ~DumpVTK();
virtual void write();
protected:
char *label; // string for dump file header
int vtk_file_format; // which vtk file format to write (vtk, vtp, vtu ...)
std::map<int, int> field2index; // which compute,fix,variable calcs this field
std::map<int, int> argindex; // index into compute,fix scalar_atom,vector_atom
// 0 for scalar_atom, 1-N for vector_atom values
// private methods
virtual void init_style();
virtual void write_header(bigint);
int count();
void pack(tagint *);
virtual void write_data(int, double *);
bigint memory_usage();
int parse_fields(int, char **);
void identify_vectors();
int add_compute(char *);
int add_fix(char *);
int add_variable(char *);
int add_custom(char *, int);
virtual int modify_param(int, char **);
- typedef void (DumpCustomVTK::*FnPtrHeader)(bigint);
+ typedef void (DumpVTK::*FnPtrHeader)(bigint);
FnPtrHeader header_choice; // ptr to write header functions
void header_vtk(bigint);
- typedef void (DumpCustomVTK::*FnPtrWrite)(int, double *);
+ typedef void (DumpVTK::*FnPtrWrite)(int, double *);
FnPtrWrite write_choice; // ptr to write data functions
void write_vtk(int, double *);
void write_vtp(int, double *);
void write_vtu(int, double *);
void prepare_domain_data(vtkRectilinearGrid *);
void prepare_domain_data_triclinic(vtkUnstructuredGrid *);
void write_domain_vtk();
void write_domain_vtk_triclinic();
void write_domain_vtr();
void write_domain_vtu_triclinic();
- typedef void (DumpCustomVTK::*FnPtrPack)(int);
+ typedef void (DumpVTK::*FnPtrPack)(int);
std::map<int, FnPtrPack> pack_choice; // ptrs to pack functions
std::map<int, int> vtype; // data type
std::map<int, std::string> name; // attribute labels
std::set<int> vector_set; // set of vector attributes
int current_pack_choice_key;
// vtk data containers
vtkSmartPointer<vtkPoints> points;
vtkSmartPointer<vtkCellArray> pointsCells;
std::map<int, vtkSmartPointer<vtkAbstractArray> > myarrays;
int n_calls_;
double (*boxcorners)[3]; // corners of triclinic domain box
char *filecurrent;
char *domainfilecurrent;
char *parallelfilecurrent;
char *multiname_ex;
void setFileCurrent();
void buf2arrays(int, double *); // transfer data from buf array to vtk arrays
void reset_vtk_data_containers();
// customize by adding a method prototype
void pack_compute(int);
void pack_fix(int);
void pack_variable(int);
void pack_custom(int);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: No dump custom arguments specified
The dump custom command requires that atom quantities be specified to
output to dump file.
E: Invalid attribute in dump custom command
Self-explanatory.
E: Dump_modify format string is too short
There are more fields to be dumped in a line of output than your
format string specifies.
E: Could not find dump custom compute ID
Self-explanatory.
E: Could not find dump custom fix ID
Self-explanatory.
E: Dump custom and fix not computed at compatible times
The fix must produce per-atom quantities on timesteps that dump custom
needs them.
E: Could not find dump custom variable name
Self-explanatory.
E: Could not find custom per-atom property ID
Self-explanatory.
E: Region ID for dump custom does not exist
Self-explanatory.
E: Compute used in dump between runs is not current
The compute was not invoked on the current timestep, therefore it
cannot be used in a dump between runs.
E: Threshold for an atom property that isn't allocated
A dump threshold has been requested on a quantity that is
not defined by the atom style used in this simulation.
E: Dumping an atom property that isn't allocated
The chosen atom style does not define the per-atom quantity being
dumped.
E: Dump custom compute does not compute per-atom info
Self-explanatory.
E: Dump custom compute does not calculate per-atom vector
Self-explanatory.
E: Dump custom compute does not calculate per-atom array
Self-explanatory.
E: Dump custom compute vector is accessed out-of-range
Self-explanatory.
E: Dump custom fix does not compute per-atom info
Self-explanatory.
E: Dump custom fix does not compute per-atom vector
Self-explanatory.
E: Dump custom fix does not compute per-atom array
Self-explanatory.
E: Dump custom fix vector is accessed out-of-range
Self-explanatory.
E: Dump custom variable is not atom-style variable
Only atom-style variables generate per-atom quantities, needed for
dump output.
E: Custom per-atom property ID is not floating point
Self-explanatory.
E: Custom per-atom property ID is not integer
Self-explanatory.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Dump_modify region ID does not exist
Self-explanatory.
E: Dump modify element names do not match atom types
Number of element names must equal number of atom types.
E: Invalid attribute in dump modify command
Self-explanatory.
E: Could not find dump modify compute ID
Self-explanatory.
E: Dump modify compute ID does not compute per-atom info
Self-explanatory.
E: Dump modify compute ID does not compute per-atom vector
Self-explanatory.
E: Dump modify compute ID does not compute per-atom array
Self-explanatory.
E: Dump modify compute ID vector is not large enough
Self-explanatory.
E: Could not find dump modify fix ID
Self-explanatory.
E: Dump modify fix ID does not compute per-atom info
Self-explanatory.
E: Dump modify fix ID does not compute per-atom vector
Self-explanatory.
E: Dump modify fix ID does not compute per-atom array
Self-explanatory.
E: Dump modify fix ID vector is not large enough
Self-explanatory.
E: Could not find dump modify variable name
Self-explanatory.
E: Dump modify variable is not atom-style variable
Self-explanatory.
E: Could not find dump modify custom atom floating point property ID
Self-explanatory.
E: Could not find dump modify custom atom integer property ID
Self-explanatory.
E: Invalid dump_modify threshold operator
Operator keyword used for threshold specification is not recognized.
*/
diff --git a/src/compute_dipole_chunk.cpp b/src/compute_dipole_chunk.cpp
index 74d66e7c1..45389ee61 100644
--- a/src/compute_dipole_chunk.cpp
+++ b/src/compute_dipole_chunk.cpp
@@ -1,294 +1,296 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_dipole_chunk.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "compute_chunk_atom.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "math_special.h"
using namespace LAMMPS_NS;
using namespace MathSpecial;
enum { MASSCENTER, GEOMCENTER };
/* ---------------------------------------------------------------------- */
ComputeDipoleChunk::ComputeDipoleChunk(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg),
- idchunk(NULL), massproc(NULL), masstotal(NULL), chrgproc(NULL), chrgtotal(NULL), com(NULL),
+ idchunk(NULL), massproc(NULL), masstotal(NULL), chrgproc(NULL),
+ chrgtotal(NULL), com(NULL),
comall(NULL), dipole(NULL), dipoleall(NULL)
{
- if ((narg != 4) && (narg != 5)) error->all(FLERR,"Illegal compute dipole/chunk command");
+ if ((narg != 4) && (narg != 5))
+ error->all(FLERR,"Illegal compute dipole/chunk command");
array_flag = 1;
size_array_cols = 4;
size_array_rows = 0;
size_array_rows_variable = 1;
extarray = 0;
// ID of compute chunk/atom
int n = strlen(arg[3]) + 1;
idchunk = new char[n];
strcpy(idchunk,arg[3]);
usecenter = MASSCENTER;
if (narg == 5) {
if (strncmp(arg[4],"geom",4) == 0) usecenter = GEOMCENTER;
else if (strcmp(arg[4],"mass") == 0) usecenter = MASSCENTER;
else error->all(FLERR,"Illegal compute dipole/chunk command");
}
init();
// chunk-based data
nchunk = 1;
maxchunk = 0;
allocate();
}
/* ---------------------------------------------------------------------- */
ComputeDipoleChunk::~ComputeDipoleChunk()
{
delete [] idchunk;
memory->destroy(massproc);
memory->destroy(masstotal);
memory->destroy(chrgproc);
memory->destroy(chrgtotal);
memory->destroy(com);
memory->destroy(comall);
memory->destroy(dipole);
memory->destroy(dipoleall);
}
/* ---------------------------------------------------------------------- */
void ComputeDipoleChunk::init()
{
int icompute = modify->find_compute(idchunk);
if (icompute < 0)
error->all(FLERR,"Chunk/atom compute does not exist for "
"compute dipole/chunk");
cchunk = (ComputeChunkAtom *) modify->compute[icompute];
if (strcmp(cchunk->style,"chunk/atom") != 0)
error->all(FLERR,"Compute dipole/chunk does not use chunk/atom compute");
}
/* ---------------------------------------------------------------------- */
void ComputeDipoleChunk::compute_array()
{
int i,index;
double massone;
double unwrap[3];
invoked_array = update->ntimestep;
// compute chunk/atom assigns atoms to chunk IDs
// extract ichunk index vector from compute
// ichunk = 1 to Nchunk for included atoms, 0 for excluded atoms
nchunk = cchunk->setup_chunks();
cchunk->compute_ichunk();
int *ichunk = cchunk->ichunk;
if (nchunk > maxchunk) allocate();
size_array_rows = nchunk;
// zero local per-chunk values
for (int i = 0; i < nchunk; i++) {
massproc[i] = chrgproc[i] = 0.0;
com[i][0] = com[i][1] = com[i][2] = 0.0;
dipole[i][0] = dipole[i][1] = dipole[i][2] = dipole[i][3] = 0.0;
}
// compute COM for each chunk
double **x = atom->x;
int *mask = atom->mask;
int *type = atom->type;
imageint *image = atom->image;
double *mass = atom->mass;
double *rmass = atom->rmass;
double *q = atom->q;
double **mu = atom->mu;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
if (mask[i] & groupbit) {
index = ichunk[i]-1;
if (index < 0) continue;
if (usecenter == MASSCENTER) {
if (rmass) massone = rmass[i];
else massone = mass[type[i]];
} else massone = 1.0; // usecenter == GEOMCENTER
domain->unmap(x[i],image[i],unwrap);
massproc[index] += massone;
if (atom->q_flag) chrgproc[index] += atom->q[i];
com[index][0] += unwrap[0] * massone;
com[index][1] += unwrap[1] * massone;
com[index][2] += unwrap[2] * massone;
}
MPI_Allreduce(massproc,masstotal,nchunk,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(chrgproc,chrgtotal,nchunk,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(&com[0][0],&comall[0][0],3*nchunk,MPI_DOUBLE,MPI_SUM,world);
for (int i = 0; i < nchunk; i++) {
if (masstotal[i] > 0.0) {
comall[i][0] /= masstotal[i];
comall[i][1] /= masstotal[i];
comall[i][2] /= masstotal[i];
}
}
// compute dipole for each chunk
for (i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
index = ichunk[i]-1;
if (index < 0) continue;
domain->unmap(x[i],image[i],unwrap);
if (atom->q_flag) {
dipole[index][0] += q[i]*unwrap[0];
dipole[index][1] += q[i]*unwrap[1];
dipole[index][2] += q[i]*unwrap[2];
}
if (atom->mu_flag) {
dipole[index][0] += mu[i][0];
dipole[index][1] += mu[i][1];
dipole[index][2] += mu[i][2];
}
}
}
MPI_Allreduce(&dipole[0][0],&dipoleall[0][0],4*nchunk,
MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < nchunk; i++) {
// correct for position dependence with charged chunks
dipoleall[i][0] -= chrgtotal[i]*comall[i][0];
dipoleall[i][1] -= chrgtotal[i]*comall[i][1];
dipoleall[i][2] -= chrgtotal[i]*comall[i][2];
// compute total dipole moment
dipoleall[i][3] = sqrt(square(dipoleall[i][0])
+ square(dipoleall[i][1])
+ square(dipoleall[i][2]));
}
}
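/* ----------------------------------------------------------------------
   note on the correction above: per chunk the charge part of the dipole is
   accumulated as sum_i q_i * r_i in unwrapped coords; subtracting
   chrgtotal * comall re-references it to the chunk center, i.e.
     mu = sum_i q_i * (r_i - R_center) + sum_i mu_i
   so the result is origin-independent for neutral chunks and is measured
   relative to the center of mass (or geometric center) otherwise
------------------------------------------------------------------------- */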
/* ----------------------------------------------------------------------
lock methods: called by fix ave/time
these methods ensure the vector/array size is locked for the Nfreq epoch
by passing lock info along to compute chunk/atom
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
increment lock counter
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock_enable()
{
cchunk->lockcount++;
}
/* ----------------------------------------------------------------------
decrement lock counter in compute chunk/atom, if it still exists
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock_disable()
{
int icompute = modify->find_compute(idchunk);
if (icompute >= 0) {
cchunk = (ComputeChunkAtom *) modify->compute[icompute];
cchunk->lockcount--;
}
}
/* ----------------------------------------------------------------------
calculate and return # of chunks = length of vector/array
------------------------------------------------------------------------- */
int ComputeDipoleChunk::lock_length()
{
nchunk = cchunk->setup_chunks();
return nchunk;
}
/* ----------------------------------------------------------------------
set the lock from startstep to stopstep
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock(Fix *fixptr, bigint startstep, bigint stopstep)
{
cchunk->lock(fixptr,startstep,stopstep);
}
/* ----------------------------------------------------------------------
unset the lock
------------------------------------------------------------------------- */
void ComputeDipoleChunk::unlock(Fix *fixptr)
{
cchunk->unlock(fixptr);
}
/* ----------------------------------------------------------------------
free and reallocate per-chunk arrays
------------------------------------------------------------------------- */
void ComputeDipoleChunk::allocate()
{
memory->destroy(massproc);
memory->destroy(masstotal);
memory->destroy(chrgproc);
memory->destroy(chrgtotal);
memory->destroy(com);
memory->destroy(comall);
memory->destroy(dipole);
memory->destroy(dipoleall);
maxchunk = nchunk;
memory->create(massproc,maxchunk,"dipole/chunk:massproc");
memory->create(masstotal,maxchunk,"dipole/chunk:masstotal");
memory->create(chrgproc,maxchunk,"dipole/chunk:chrgproc");
memory->create(chrgtotal,maxchunk,"dipole/chunk:chrgtotal");
memory->create(com,maxchunk,3,"dipole/chunk:com");
memory->create(comall,maxchunk,3,"dipole/chunk:comall");
memory->create(dipole,maxchunk,4,"dipole/chunk:dipole");
memory->create(dipoleall,maxchunk,4,"dipole/chunk:dipoleall");
array = dipoleall;
}
/* ----------------------------------------------------------------------
memory usage of local data
------------------------------------------------------------------------- */
double ComputeDipoleChunk::memory_usage()
{
double bytes = (bigint) maxchunk * 2 * sizeof(double);
bytes += (bigint) maxchunk * 2*3 * sizeof(double);
bytes += (bigint) maxchunk * 2*4 * sizeof(double);
return bytes;
}
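/* ----------------------------------------------------------------------
   illustrative input-script usage (a minimal sketch; the compute/fix IDs,
   the chunk definition and the output file name are assumptions):
     compute cc1 all chunk/atom molecule
     compute dip all dipole/chunk cc1 geom
     fix out all ave/time 100 1 100 c_dip[1] c_dip[2] c_dip[3] c_dip[4] file dipole.out mode vector
------------------------------------------------------------------------- */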
diff --git a/src/domain.cpp b/src/domain.cpp
index 31fb3b855..8ead12cd4 100644
--- a/src/domain.cpp
+++ b/src/domain.cpp
@@ -1,2053 +1,2125 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "domain.h"
#include "style_region.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "modify.h"
#include "fix.h"
#include "fix_deform.h"
#include "region.h"
#include "lattice.h"
#include "comm.h"
#include "output.h"
#include "thermo.h"
#include "universe.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
enum{NO_REMAP,X_REMAP,V_REMAP}; // same as fix_deform.cpp
enum{IGNORE,WARN,ERROR}; // same as thermo.cpp
enum{LAYOUT_UNIFORM,LAYOUT_NONUNIFORM,LAYOUT_TILED}; // several files
#define BIG 1.0e20
#define SMALL 1.0e-4
#define DELTAREGION 4
#define BONDSTRETCH 1.1
/* ----------------------------------------------------------------------
default is periodic
------------------------------------------------------------------------- */
Domain::Domain(LAMMPS *lmp) : Pointers(lmp)
{
box_exist = 0;
dimension = 3;
nonperiodic = 0;
xperiodic = yperiodic = zperiodic = 1;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
boundary[0][0] = boundary[0][1] = 0;
boundary[1][0] = boundary[1][1] = 0;
boundary[2][0] = boundary[2][1] = 0;
minxlo = minxhi = 0.0;
minylo = minyhi = 0.0;
minzlo = minzhi = 0.0;
triclinic = 0;
tiltsmall = 1;
boxlo[0] = boxlo[1] = boxlo[2] = -0.5;
boxhi[0] = boxhi[1] = boxhi[2] = 0.5;
xy = xz = yz = 0.0;
h[3] = h[4] = h[5] = 0.0;
h_inv[3] = h_inv[4] = h_inv[5] = 0.0;
h_rate[0] = h_rate[1] = h_rate[2] =
h_rate[3] = h_rate[4] = h_rate[5] = 0.0;
h_ratelo[0] = h_ratelo[1] = h_ratelo[2] = 0.0;
prd_lamda[0] = prd_lamda[1] = prd_lamda[2] = 1.0;
prd_half_lamda[0] = prd_half_lamda[1] = prd_half_lamda[2] = 0.5;
boxlo_lamda[0] = boxlo_lamda[1] = boxlo_lamda[2] = 0.0;
boxhi_lamda[0] = boxhi_lamda[1] = boxhi_lamda[2] = 1.0;
lattice = NULL;
char **args = new char*[2];
args[0] = (char *) "none";
args[1] = (char *) "1.0";
set_lattice(2,args);
delete [] args;
nregion = maxregion = 0;
regions = NULL;
copymode = 0;
region_map = new RegionCreatorMap();
#define REGION_CLASS
#define RegionStyle(key,Class) \
(*region_map)[#key] = &region_creator<Class>;
#include "style_region.h"
#undef RegionStyle
#undef REGION_CLASS
}
/* ---------------------------------------------------------------------- */
Domain::~Domain()
{
if (copymode) return;
delete lattice;
for (int i = 0; i < nregion; i++) delete regions[i];
memory->sfree(regions);
delete region_map;
}
/* ---------------------------------------------------------------------- */
void Domain::init()
{
// set box_change flags if box size/shape/sub-domains ever change
// due to shrink-wrapping or fixes that change box size/shape/sub-domains
box_change_size = box_change_shape = box_change_domain = 0;
if (nonperiodic == 2) box_change_size = 1;
for (int i = 0; i < modify->nfix; i++) {
if (modify->fix[i]->box_change_size) box_change_size = 1;
if (modify->fix[i]->box_change_shape) box_change_shape = 1;
if (modify->fix[i]->box_change_domain) box_change_domain = 1;
}
box_change = 0;
if (box_change_size || box_change_shape || box_change_domain) box_change = 1;
// check for fix deform
deform_flag = deform_vremap = deform_groupbit = 0;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"deform") == 0) {
deform_flag = 1;
if (((FixDeform *) modify->fix[i])->remapflag == V_REMAP) {
deform_vremap = 1;
deform_groupbit = modify->fix[i]->groupbit;
}
}
// region inits
for (int i = 0; i < nregion; i++) regions[i]->init();
}
/* ----------------------------------------------------------------------
set initial global box
assumes boxlo/hi and triclinic tilts are already set
expandflag = 1 if need to expand box in shrink-wrapped dims
not invoked by read_restart since box is already expanded
if don't prevent further expansion, restarted triclinic box
with unchanged tilt factors can become a box with atoms outside the box
------------------------------------------------------------------------- */
void Domain::set_initial_box(int expandflag)
{
// error checks for orthogonal and triclinic domains
if (boxlo[0] >= boxhi[0] || boxlo[1] >= boxhi[1] || boxlo[2] >= boxhi[2])
error->one(FLERR,"Box bounds are invalid or missing");
if (domain->dimension == 2 && (xz != 0.0 || yz != 0.0))
error->all(FLERR,"Cannot skew triclinic box in z for 2d simulation");
// error check or warning on triclinic tilt factors
if (triclinic) {
if ((fabs(xy/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(xz/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(yz/(boxhi[1]-boxlo[1])) > 0.5 && yperiodic)) {
if (tiltsmall)
error->all(FLERR,"Triclinic box skew is too large");
else if (comm->me == 0)
error->warning(FLERR,"Triclinic box skew is large");
}
}
// set small based on box size and SMALL
// this works for any unit system
small[0] = SMALL * (boxhi[0] - boxlo[0]);
small[1] = SMALL * (boxhi[1] - boxlo[1]);
small[2] = SMALL * (boxhi[2] - boxlo[2]);
// if expandflag, adjust box lo/hi for shrink-wrapped dims
if (!expandflag) return;
if (boundary[0][0] == 2) boxlo[0] -= small[0];
else if (boundary[0][0] == 3) minxlo = boxlo[0];
if (boundary[0][1] == 2) boxhi[0] += small[0];
else if (boundary[0][1] == 3) minxhi = boxhi[0];
if (boundary[1][0] == 2) boxlo[1] -= small[1];
else if (boundary[1][0] == 3) minylo = boxlo[1];
if (boundary[1][1] == 2) boxhi[1] += small[1];
else if (boundary[1][1] == 3) minyhi = boxhi[1];
if (boundary[2][0] == 2) boxlo[2] -= small[2];
else if (boundary[2][0] == 3) minzlo = boxlo[2];
if (boundary[2][1] == 2) boxhi[2] += small[2];
else if (boundary[2][1] == 3) minzhi = boxhi[2];
}
/* ----------------------------------------------------------------------
set global box params
assumes boxlo/hi and triclinic tilts are already set
------------------------------------------------------------------------- */
void Domain::set_global_box()
{
prd[0] = xprd = boxhi[0] - boxlo[0];
prd[1] = yprd = boxhi[1] - boxlo[1];
prd[2] = zprd = boxhi[2] - boxlo[2];
h[0] = xprd;
h[1] = yprd;
h[2] = zprd;
h_inv[0] = 1.0/h[0];
h_inv[1] = 1.0/h[1];
h_inv[2] = 1.0/h[2];
prd_half[0] = xprd_half = 0.5*xprd;
prd_half[1] = yprd_half = 0.5*yprd;
prd_half[2] = zprd_half = 0.5*zprd;
if (triclinic) {
h[3] = yz;
h[4] = xz;
h[5] = xy;
h_inv[3] = -h[3] / (h[1]*h[2]);
h_inv[4] = (h[3]*h[5] - h[1]*h[4]) / (h[0]*h[1]*h[2]);
h_inv[5] = -h[5] / (h[0]*h[1]);
boxlo_bound[0] = MIN(boxlo[0],boxlo[0]+xy);
boxlo_bound[0] = MIN(boxlo_bound[0],boxlo_bound[0]+xz);
boxlo_bound[1] = MIN(boxlo[1],boxlo[1]+yz);
boxlo_bound[2] = boxlo[2];
boxhi_bound[0] = MAX(boxhi[0],boxhi[0]+xy);
boxhi_bound[0] = MAX(boxhi_bound[0],boxhi_bound[0]+xz);
boxhi_bound[1] = MAX(boxhi[1],boxhi[1]+yz);
boxhi_bound[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
set lamda box params
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
------------------------------------------------------------------------- */
void Domain::set_lamda_box()
{
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo_lamda[0] = xsplit[myloc[0]];
subhi_lamda[0] = xsplit[myloc[0]+1];
sublo_lamda[1] = ysplit[myloc[1]];
subhi_lamda[1] = ysplit[myloc[1]+1];
sublo_lamda[2] = zsplit[myloc[2]];
subhi_lamda[2] = zsplit[myloc[2]+1];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo_lamda[0] = mysplit[0][0];
subhi_lamda[0] = mysplit[0][1];
sublo_lamda[1] = mysplit[1][0];
subhi_lamda[1] = mysplit[1][1];
sublo_lamda[2] = mysplit[2][0];
subhi_lamda[2] = mysplit[2][1];
}
}
/* ----------------------------------------------------------------------
set local subbox params for orthogonal boxes
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
ensure subhi[max] = boxhi
------------------------------------------------------------------------- */
void Domain::set_local_box()
{
if (triclinic) return;
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
int *procgrid = comm->procgrid;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo[0] = boxlo[0] + xprd*xsplit[myloc[0]];
if (myloc[0] < procgrid[0]-1) subhi[0] = boxlo[0] + xprd*xsplit[myloc[0]+1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*ysplit[myloc[1]];
if (myloc[1] < procgrid[1]-1) subhi[1] = boxlo[1] + yprd*ysplit[myloc[1]+1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*zsplit[myloc[2]];
if (myloc[2] < procgrid[2]-1) subhi[2] = boxlo[2] + zprd*zsplit[myloc[2]+1];
else subhi[2] = boxhi[2];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo[0] = boxlo[0] + xprd*mysplit[0][0];
if (mysplit[0][1] < 1.0) subhi[0] = boxlo[0] + xprd*mysplit[0][1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*mysplit[1][0];
if (mysplit[1][1] < 1.0) subhi[1] = boxlo[1] + yprd*mysplit[1][1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*mysplit[2][0];
if (mysplit[2][1] < 1.0) subhi[2] = boxlo[2] + zprd*mysplit[2][1];
else subhi[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
reset global & local boxes due to global box boundary changes
if shrink-wrapped, determine atom extent and reset boxlo/hi
for triclinic, atoms must be in lamda coords (0-1) before reset_box is called
------------------------------------------------------------------------- */
void Domain::reset_box()
{
// perform shrink-wrapping
// compute extent of atoms on this proc
// for triclinic, this is done in lamda space
if (nonperiodic == 2) {
double extent[3][2],all[3][2];
extent[2][0] = extent[1][0] = extent[0][0] = BIG;
extent[2][1] = extent[1][1] = extent[0][1] = -BIG;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
extent[0][0] = MIN(extent[0][0],x[i][0]);
extent[0][1] = MAX(extent[0][1],x[i][0]);
extent[1][0] = MIN(extent[1][0],x[i][1]);
extent[1][1] = MAX(extent[1][1],x[i][1]);
extent[2][0] = MIN(extent[2][0],x[i][2]);
extent[2][1] = MAX(extent[2][1],x[i][2]);
}
// compute extent across all procs
// flip sign of MIN to do it in one Allreduce MAX
extent[0][0] = -extent[0][0];
extent[1][0] = -extent[1][0];
extent[2][0] = -extent[2][0];
MPI_Allreduce(extent,all,6,MPI_DOUBLE,MPI_MAX,world);
// for triclinic, convert back to box coords before changing box
if (triclinic) lamda2x(atom->nlocal);
// in shrink-wrapped dims, set box by atom extent
// if minimum set, enforce min box size settings
// for triclinic, convert lamda extent to box coords, then set box lo/hi
// decided NOT to do the next comment - don't want to sneakily change tilt
// for triclinic, adjust tilt factors if 2nd dim is shrink-wrapped,
// so that displacement in 1st dim stays the same
if (triclinic == 0) {
if (xperiodic == 0) {
if (boundary[0][0] == 2) boxlo[0] = -all[0][0] - small[0];
else if (boundary[0][0] == 3)
boxlo[0] = MIN(-all[0][0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = all[0][1] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(all[0][1]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
if (boundary[1][0] == 2) boxlo[1] = -all[1][0] - small[1];
else if (boundary[1][0] == 3)
boxlo[1] = MIN(-all[1][0]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = all[1][1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(all[1][1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
}
if (zperiodic == 0) {
if (boundary[2][0] == 2) boxlo[2] = -all[2][0] - small[2];
else if (boundary[2][0] == 3)
boxlo[2] = MIN(-all[2][0]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = all[2][1] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(all[2][1]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
}
} else {
double lo[3],hi[3];
if (xperiodic == 0) {
lo[0] = -all[0][0]; lo[1] = 0.0; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = all[0][1]; hi[1] = 0.0; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[0][0] == 2) boxlo[0] = lo[0] - small[0];
else if (boundary[0][0] == 3) boxlo[0] = MIN(lo[0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = hi[0] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(hi[0]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
lo[0] = 0.0; lo[1] = -all[1][0]; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = all[1][1]; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[1][0] == 2) boxlo[1] = lo[1] - small[1];
else if (boundary[1][0] == 3) boxlo[1] = MIN(lo[1]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = hi[1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(hi[1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
//xy *= (boxhi[1]-boxlo[1]) / yprd;
}
if (zperiodic == 0) {
lo[0] = 0.0; lo[1] = 0.0; lo[2] = -all[2][0];
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = 0.0; hi[2] = all[2][1];
lamda2x(hi,hi);
if (boundary[2][0] == 2) boxlo[2] = lo[2] - small[2];
else if (boundary[2][0] == 3) boxlo[2] = MIN(lo[2]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = hi[2] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(hi[2]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
//xz *= (boxhi[2]-boxlo[2]) / xprd;
//yz *= (boxhi[2]-boxlo[2]) / yprd;
}
}
}
// reset box whether shrink-wrapping or not
set_global_box();
set_local_box();
// if shrink-wrapped & kspace is defined (i.e. using MSM), call setup()
// also call init() (to test for compatibility) ?
if (nonperiodic == 2 && force->kspace) {
//force->kspace->init();
force->kspace->setup();
}
// if shrink-wrapped & triclinic, re-convert to lamda coords for new box
// re-invoke pbc() b/c x2lamda result can be outside [0,1] due to roundoff
if (nonperiodic == 2 && triclinic) {
x2lamda(atom->nlocal);
pbc();
}
}
/* ----------------------------------------------------------------------
enforce PBC and modify box image flags for each atom
called every reneighboring and by other commands that change atoms
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
if fix deform, remap velocity of fix group atoms by box edge velocities
for triclinic, atoms must be in lamda coords (0-1) before pbc is called
image = 10 or 20 bits for each dimension depending on sizeof(imageint)
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::pbc()
{
int i;
imageint idim,otherdims;
double *lo,*hi,*period;
int nlocal = atom->nlocal;
double **x = atom->x;
double **v = atom->v;
int *mask = atom->mask;
imageint *image = atom->image;
// verify owned atoms have valid numerical coords
// may not if computed pairwise force between 2 atoms at same location
double *coord;
int n3 = 3*nlocal;
coord = &x[0][0]; // note: x is always initialized to at least one element.
int flag = 0;
for (i = 0; i < n3; i++)
if (!ISFINITE(*coord++)) flag = 1;
if (flag) error->one(FLERR,"Non-numeric atom coords - simulation unstable");
// setup for PBC checks
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
}
// apply PBC to each owned atom
for (i = 0; i < nlocal; i++) {
if (xperiodic) {
if (x[i][0] < lo[0]) {
x[i][0] += period[0];
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] += h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim--;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
if (x[i][0] >= hi[0]) {
x[i][0] -= period[0];
x[i][0] = MAX(x[i][0],lo[0]);
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] -= h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim++;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
}
if (yperiodic) {
if (x[i][1] < lo[1]) {
x[i][1] += period[1];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[5];
v[i][1] += h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
if (x[i][1] >= hi[1]) {
x[i][1] -= period[1];
x[i][1] = MAX(x[i][1],lo[1]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[5];
v[i][1] -= h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
}
if (zperiodic) {
if (x[i][2] < lo[2]) {
x[i][2] += period[2];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[4];
v[i][1] += h_rate[3];
v[i][2] += h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
if (x[i][2] >= hi[2]) {
x[i][2] -= period[2];
x[i][2] = MAX(x[i][2],lo[2]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[4];
v[i][1] -= h_rate[3];
v[i][2] -= h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
}
}
}
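/* ----------------------------------------------------------------------
   illustrative sketch (not compiled; variable names are assumptions) of how
   the packed image flags manipulated above can be decoded into per-dimension
   image counts, using the IMGMASK/IMGBITS/IMG2BITS/IMGMAX constants from
   lmptype.h:
     int ix = (image[i] & IMGMASK) - IMGMAX;
     int iy = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
     int iz = (image[i] >> IMG2BITS) - IMGMAX;
------------------------------------------------------------------------- */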
/* ----------------------------------------------------------------------
check that point is inside box boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside(double* x)
{
double *lo,*hi;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (x[0] < lo[0] || x[0] >= hi[0] ||
x[1] < lo[1] || x[1] >= hi[1] ||
x[2] < lo[2] || x[2] >= hi[2]) return 0;
else return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (lamda[0] < lo[0] || lamda[0] >= hi[0] ||
lamda[1] < lo[1] || lamda[1] >= hi[1] ||
lamda[2] < lo[2] || lamda[2] >= hi[2]) return 0;
else return 1;
}
}
/* ----------------------------------------------------------------------
check that point is inside nonperiodic boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside_nonperiodic(double* x)
{
double *lo,*hi;
double lamda[3];
if (xperiodic && yperiodic && zperiodic) return 1;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (!xperiodic && (x[0] < lo[0] || x[0] >= hi[0])) return 0;
if (!yperiodic && (x[1] < lo[1] || x[1] >= hi[1])) return 0;
if (!zperiodic && (x[2] < lo[2] || x[2] >= hi[2])) return 0;
return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (!xperiodic && (lamda[0] < lo[0] || lamda[0] >= hi[0])) return 0;
if (!yperiodic && (lamda[1] < lo[1] || lamda[1] >= hi[1])) return 0;
if (!zperiodic && (lamda[2] < lo[2] || lamda[2] >= hi[2])) return 0;
return 1;
}
}
/* ----------------------------------------------------------------------
warn if image flags of any bonded atoms are inconsistent
could be a problem when using replicate or fix rigid
------------------------------------------------------------------------- */
void Domain::image_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// communicate unwrapped position of owned atoms to ghost atoms
double **unwrap;
memory->create(unwrap,atom->nmax,3,"domain:unwrap");
double **x = atom->x;
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
unmap(x[i],image[i],unwrap[i]);
comm->forward_comm_array(3,unwrap);
// compute unwrapped extent of each bond
// flag if any bond component is longer than 1/2 of periodic box length
// flag if any bond component is longer than non-periodic box length
// which means image flags in that dimension were different
int molecular = atom->molecular;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
double delx,dely,delz;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
int flag = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in image check");
continue;
}
delx = unwrap[i][0] - unwrap[k][0];
dely = unwrap[i][1] - unwrap[k][1];
delz = unwrap[i][2] - unwrap[k][2];
if (xperiodic && delx > xprd_half) flag = 1;
if (xperiodic && dely > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && delz > zprd_half) flag = 1;
if (!xperiodic && delx > xprd) flag = 1;
if (!yperiodic && dely > yprd) flag = 1;
if (dimension == 3 && !zperiodic && delz > zprd) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Inconsistent image flags");
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in image check");
}
memory->destroy(unwrap);
}
/* ----------------------------------------------------------------------
warn if end atoms in any bonded interaction
are further apart than half a periodic box length
could cause problems when bonded neighbor list is built since
closest_image() could return wrong image
------------------------------------------------------------------------- */
void Domain::box_too_small_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// maxbondall = longest current bond length
// if periodic box dim is tiny (less than 2 * bond-length),
// minimum_image() itself may compute bad bond lengths
// in this case, image_check() should warn,
// assuming 2 atoms have consistent image flags
int molecular = atom->molecular;
double **x = atom->x;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
int nlocal = atom->nlocal;
double delx,dely,delz,rsq;
double maxbondme = 0.0;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in box size check");
continue;
}
delx = x[i][0] - x[k][0];
dely = x[i][1] - x[k][1];
delz = x[i][2] - x[k][2];
minimum_image(delx,dely,delz);
rsq = delx*delx + dely*dely + delz*delz;
maxbondme = MAX(maxbondme,rsq);
}
}
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in box size check");
}
double maxbondall;
MPI_Allreduce(&maxbondme,&maxbondall,1,MPI_DOUBLE,MPI_MAX,world);
maxbondall = sqrt(maxbondall);
// maxdelta = furthest apart 2 atoms in a bonded interaction can be
// include BONDSTRETCH factor to account for dynamics
double maxdelta = maxbondall * BONDSTRETCH;
if (atom->nangles) maxdelta = 2.0 * maxbondall * BONDSTRETCH;
if (atom->ndihedrals) maxdelta = 3.0 * maxbondall * BONDSTRETCH;
// warn if maxdelta > than half any periodic box length
// since atoms in the interaction could rotate into that dimension
int flag = 0;
if (xperiodic && maxdelta > xprd_half) flag = 1;
if (yperiodic && maxdelta > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && maxdelta > zprd_half) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,
"Bond/angle/dihedral extent > half of periodic box length");
}
/* ----------------------------------------------------------------------
check and warn if any proc's subbox is smaller than thresh
since may lead to lost atoms in comm->exchange()
current callers set thresh = neighbor skin
------------------------------------------------------------------------- */
void Domain::subbox_too_small_check(double thresh)
{
int flag = 0;
if (!triclinic) {
if (subhi[0]-sublo[0] < thresh || subhi[1]-sublo[1] < thresh) flag = 1;
if (dimension == 3 && subhi[2]-sublo[2] < thresh) flag = 1;
} else {
double delta = subhi_lamda[0] - sublo_lamda[0];
if (delta*prd[0] < thresh) flag = 1;
delta = subhi_lamda[1] - sublo_lamda[1];
if (delta*prd[1] < thresh) flag = 1;
if (dimension == 3) {
delta = subhi_lamda[2] - sublo_lamda[2];
if (delta*prd[2] < thresh) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Proc sub-domain size < neighbor skin, "
"could lead to lost atoms");
}
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be a problem for looking up atom IDs when cutoff > boxsize
+ this should not be used if atom has moved infinitely far outside box
+ b/c while could iterate forever
+ e.g. fix shake prediction of new position with highly overlapped atoms
+ use minimum_image_once() instead
------------------------------------------------------------------------- */
void Domain::minimum_image(double &dx, double &dy, double &dz)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) dy += yprd;
else dy -= yprd;
}
}
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) dz += zprd;
else dz -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
} else {
dz -= zprd;
dy -= yz;
dx -= xz;
}
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) {
dy += yprd;
dx += xy;
} else {
dy -= yprd;
dx -= xy;
}
}
}
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
}
}
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be a problem for looking up atom IDs when cutoff > boxsize
+ this should not be used if atom has moved infinitely far outside box
+ b/c while could iterate forever
+ e.g. fix shake prediction of new position with highly overlapped atoms
+ use minimum_image_once() instead
------------------------------------------------------------------------- */
void Domain::minimum_image(double *delta)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) delta[1] += yprd;
else delta[1] -= yprd;
}
}
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) delta[2] += zprd;
else delta[2] -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) {
delta[2] += zprd;
delta[1] += yz;
delta[0] += xz;
} else {
delta[2] -= zprd;
delta[1] -= yz;
delta[0] -= xz;
}
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) {
delta[1] += yprd;
delta[0] += xy;
} else {
delta[1] -= yprd;
delta[0] -= xy;
}
}
}
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
}
}
+/* ----------------------------------------------------------------------
+ minimum image convention in periodic dimensions
+ use 1/2 of box size as test
+ for triclinic, also add/subtract tilt factors in other dims as needed
+ only shift by one box length in each direction
+ this should not be used if multiple box shifts are required
+------------------------------------------------------------------------- */
+
+void Domain::minimum_image_once(double *delta)
+{
+ if (triclinic == 0) {
+ if (xperiodic) {
+ if (fabs(delta[0]) > xprd_half) {
+ if (delta[0] < 0.0) delta[0] += xprd;
+ else delta[0] -= xprd;
+ }
+ }
+ if (yperiodic) {
+ if (fabs(delta[1]) > yprd_half) {
+ if (delta[1] < 0.0) delta[1] += yprd;
+ else delta[1] -= yprd;
+ }
+ }
+ if (zperiodic) {
+ if (fabs(delta[2]) > zprd_half) {
+ if (delta[2] < 0.0) delta[2] += zprd;
+ else delta[2] -= zprd;
+ }
+ }
+
+ } else {
+ if (zperiodic) {
+ if (fabs(delta[2]) > zprd_half) {
+ if (delta[2] < 0.0) {
+ delta[2] += zprd;
+ delta[1] += yz;
+ delta[0] += xz;
+ } else {
+ delta[2] -= zprd;
+ delta[1] -= yz;
+ delta[0] -= xz;
+ }
+ }
+ }
+ if (yperiodic) {
+ if (fabs(delta[1]) > yprd_half) {
+ if (delta[1] < 0.0) {
+ delta[1] += yprd;
+ delta[0] += xy;
+ } else {
+ delta[1] -= yprd;
+ delta[0] -= xy;
+ }
+ }
+ }
+ if (xperiodic) {
+ if (fabs(delta[0]) > xprd_half) {
+ if (delta[0] < 0.0) delta[0] += xprd;
+ else delta[0] -= xprd;
+ }
+ }
+ }
+}
+
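/* ----------------------------------------------------------------------
   editor's note: illustrative usage sketch, not part of this diff.
   x1/x2 below are placeholder atom positions; a caller that may produce
   badly overlapped or even non-finite separations (e.g. the fix shake
   prediction mentioned above) applies at most one box shift per dim:
     double delta[3];
     delta[0] = x2[0] - x1[0];
     delta[1] = x2[1] - x1[1];
     delta[2] = x2[2] - x1[2];
     domain->minimum_image_once(delta);   // one shift max per dimension
   whereas minimum_image(delta) loops until the distance is inside half
   the box and so could iterate forever on such input
------------------------------------------------------------------------- */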
/* ----------------------------------------------------------------------
return local index of atom J or any of its images that is closest to atom I
if J is not a valid index like -1, just return it
------------------------------------------------------------------------- */
int Domain::closest_image(int i, int j)
{
if (j < 0) return j;
int *sametag = atom->sametag;
double **x = atom->x;
double *xi = x[i];
int closest = j;
double delx = xi[0] - x[j][0];
double dely = xi[1] - x[j][1];
double delz = xi[2] - x[j][2];
double rsqmin = delx*delx + dely*dely + delz*delz;
double rsq;
while (sametag[j] >= 0) {
j = sametag[j];
delx = xi[0] - x[j][0];
dely = xi[1] - x[j][1];
delz = xi[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < rsqmin) {
rsqmin = rsq;
closest = j;
}
}
return closest;
}
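/* ----------------------------------------------------------------------
   editor's note: illustrative aside, not part of this diff. as used in
   the loop above, atom->sametag acts as a singly linked list over
   local + ghost indices sharing an atom ID: sametag[j] is the next
   image of atom j known to this proc and a negative value ends the
   chain, so every available periodic image of J is tested against I
------------------------------------------------------------------------- */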
/* ----------------------------------------------------------------------
return local index of atom J or any of its images that is closest to pos
if J is not a valid index like -1, just return it
------------------------------------------------------------------------- */
int Domain::closest_image(double *pos, int j)
{
if (j < 0) return j;
int *sametag = atom->sametag;
double **x = atom->x;
int closest = j;
double delx = pos[0] - x[j][0];
double dely = pos[1] - x[j][1];
double delz = pos[2] - x[j][2];
double rsqmin = delx*delx + dely*dely + delz*delz;
double rsq;
while (sametag[j] >= 0) {
j = sametag[j];
delx = pos[0] - x[j][0];
dely = pos[1] - x[j][1];
delz = pos[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < rsqmin) {
rsqmin = rsq;
closest = j;
}
}
return closest;
}
/* ----------------------------------------------------------------------
find and return Xj image = periodic image of Xj that is closest to Xi
for triclinic, add/subtract tilt factors in other dims as needed
not currently used (Jan 2017):
used to be called by pair TIP4P styles but no longer,
due to use of other closest_image() method
------------------------------------------------------------------------- */
void Domain::closest_image(const double * const xi, const double * const xj,
double * const xjimage)
{
double dx = xj[0] - xi[0];
double dy = xj[1] - xi[1];
double dz = xj[2] - xi[2];
if (triclinic == 0) {
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) dy += yprd;
if (dy > yprd_half) dy -= yprd;
} else {
while (dy > 0.0) dy -= yprd;
if (dy < -yprd_half) dy += yprd;
}
}
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) dz += zprd;
if (dz > zprd_half) dz -= zprd;
} else {
while (dz > 0.0) dz -= zprd;
if (dz < -zprd_half) dz += zprd;
}
}
} else {
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
}
if (dz > zprd_half) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
} else {
while (dz > 0.0) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
if (dz < -zprd_half) {
dz += zprd;
dy += yz;
dx += xz;
}
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) {
dy += yprd;
dx += xy;
}
if (dy > yprd_half) {
dy -= yprd;
dx -= xy;
}
} else {
while (dy > 0.0) {
dy -= yprd;
dx -= xy;
}
if (dy < -yprd_half) {
dy += yprd;
dx += xy;
}
}
}
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
}
xjimage[0] = xi[0] + dx;
xjimage[1] = xi[1] + dy;
xjimage[2] = xi[2] + dz;
}
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
adjust 3 image flags encoded in image accordingly
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before doing remap
image = 10 bits for each dimension
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::remap(double *x, imageint &image)
{
double *lo,*hi,*period,*coord;
double lamda[3];
imageint idim,otherdims;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) {
coord[0] += period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim--;
idim &= IMGMASK;
image = otherdims | idim;
}
while (coord[0] >= hi[0]) {
coord[0] -= period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim++;
idim &= IMGMASK;
image = otherdims | idim;
}
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) {
coord[1] += period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
while (coord[1] >= hi[1]) {
coord[1] -= period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) {
coord[2] += period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
while (coord[2] >= hi[2]) {
coord[2] -= period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
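/* ----------------------------------------------------------------------
   editor's note: illustrative sketch of the image-flag packing used
   above, not part of this diff. assuming the default smallbig build
   (IMGBITS = 10, IMG2BITS = 20, IMGMASK = 1023, IMGMAX = 512), the
   three per-dimension image counts are decoded the same way unmap()
   does below:
     int xbox = (image & IMGMASK) - IMGMAX;
     int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
     int zbox = (image >> IMG2BITS) - IMGMAX;
   a freshly created atom stores IMGMAX in each 10-bit field, i.e.
   xbox = ybox = zbox = 0, and the while loops above increment or
   decrement exactly one field per box length crossed
------------------------------------------------------------------------- */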
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
no image flag calculation
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap(double *x)
{
double *lo,*hi,*period,*coord;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) coord[0] += period[0];
while (coord[0] >= hi[0]) coord[0] -= period[0];
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) coord[1] += period[1];
while (coord[1] >= hi[1]) coord[1] -= period[1];
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) coord[2] += period[2];
while (coord[2] >= hi[2]) coord[2] -= period[2];
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
/* ----------------------------------------------------------------------
remap xnew to be within half box length of xold
do it directly, not iteratively, in case it is far away
for triclinic, both points are converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap_near(double *xnew, double *xold)
{
int n;
double *coordnew,*coordold,*period,*half;
double lamdanew[3],lamdaold[3];
if (triclinic == 0) {
period = prd;
half = prd_half;
coordnew = xnew;
coordold = xold;
} else {
period = prd_lamda;
half = prd_half_lamda;
x2lamda(xnew,lamdanew);
coordnew = lamdanew;
x2lamda(xold,lamdaold);
coordold = lamdaold;
}
// iterative form
// if (xperiodic) {
// while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
// while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
// }
if (xperiodic) {
if (coordnew[0]-coordold[0] > period[0]) {
n = static_cast<int> ((coordnew[0]-coordold[0])/period[0]);
coordnew[0] -= n*period[0];
}
while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
if (coordold[0]-coordnew[0] > period[0]) {
n = static_cast<int> ((coordold[0]-coordnew[0])/period[0]);
coordnew[0] += n*period[0];
}
while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
}
if (yperiodic) {
if (coordnew[1]-coordold[1] > period[1]) {
n = static_cast<int> ((coordnew[1]-coordold[1])/period[1]);
coordnew[1] -= n*period[1];
}
while (coordnew[1]-coordold[1] > half[1]) coordnew[1] -= period[1];
if (coordold[1]-coordnew[1] > period[1]) {
n = static_cast<int> ((coordold[1]-coordnew[1])/period[1]);
coordnew[1] += n*period[1];
}
while (coordold[1]-coordnew[1] > half[1]) coordnew[1] += period[1];
}
if (zperiodic) {
if (coordnew[2]-coordold[2] > period[2]) {
n = static_cast<int> ((coordnew[2]-coordold[2])/period[2]);
coordnew[2] -= n*period[2];
}
while (coordnew[2]-coordold[2] > half[2]) coordnew[2] -= period[2];
if (coordold[2]-coordnew[2] > period[2]) {
n = static_cast<int> ((coordold[2]-coordnew[2])/period[2]);
coordnew[2] += n*period[2];
}
while (coordold[2]-coordnew[2] > half[2]) coordnew[2] += period[2];
}
if (triclinic) lamda2x(coordnew,xnew);
}
/* ----------------------------------------------------------------------
unmap the point via image flags
x overwritten with result, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(double *x, imageint image)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
x[0] += xbox*xprd;
x[1] += ybox*yprd;
x[2] += zbox*zprd;
} else {
x[0] += h[0]*xbox + h[5]*ybox + h[4]*zbox;
x[1] += h[1]*ybox + h[3]*zbox;
x[2] += h[2]*zbox;
}
}
/* ----------------------------------------------------------------------
unmap the point via image flags
result returned in y, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(const double *x, imageint image, double *y)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
y[0] = x[0] + xbox*xprd;
y[1] = x[1] + ybox*yprd;
y[2] = x[2] + zbox*zprd;
} else {
y[0] = x[0] + h[0]*xbox + h[5]*ybox + h[4]*zbox;
y[1] = x[1] + h[1]*ybox + h[3]*zbox;
y[2] = x[2] + h[2]*zbox;
}
}
/* ----------------------------------------------------------------------
adjust image flags due to triclinic box flip
flip operation is changing box vectors A,B,C to new A',B',C'
A' = A (A does not change)
B' = B + mA (B shifted by A)
C' = C + pB + nA (C shifted by B and/or A)
this requires the image flags change from (a,b,c) to (a',b',c')
so that x_unwrap for each atom is same before/after
x_unwrap_before = xlocal + aA + bB + cC
x_unwrap_after = xlocal + a'A' + b'B' + c'C'
this requires:
c' = c
b' = b - cp
a' = a - (b-cp)m - cn = a - b'm - cn
in other words, for xy flip, change in x flag depends on current y flag
this is b/c the xy flip dramatically changes which tiled image of
simulation box an unwrapped point maps to
------------------------------------------------------------------------- */
void Domain::image_flip(int m, int n, int p)
{
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
int xbox = (image[i] & IMGMASK) - IMGMAX;
int ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image[i] >> IMG2BITS) - IMGMAX;
ybox -= p*zbox;
xbox -= m*ybox + n*zbox;
image[i] = ((imageint) (xbox + IMGMAX) & IMGMASK) |
(((imageint) (ybox + IMGMAX) & IMGMASK) << IMGBITS) |
(((imageint) (zbox + IMGMAX) & IMGMASK) << IMG2BITS);
}
}
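/* ----------------------------------------------------------------------
   editor's note: illustrative worked example, not part of this diff.
   for an xy flip (m = 1, n = p = 0) applied to an atom with image
   flags (a,b,c) = (2,1,0), the loop above yields
     c' = c              = 0
     b' = b - c*p        = 1
     a' = a - b'*m - c*n = 2 - 1 = 1
   and indeed xlocal + 1*A' + 1*B' = xlocal + 1*A + 1*(B + A)
   = xlocal + 2*A + 1*B, the same unwrapped point as before the flip
------------------------------------------------------------------------- */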
/* ----------------------------------------------------------------------
return 1 if this proc owns atom with coords x, else return 0
x is returned remapped into periodic box
if image flag is passed, flag is updated via remap(x,image)
if image = NULL is passed, x is remapped via remap(x) with no image flag update
if shrinkexceed, atom can be outside shrinkwrap boundaries
called from create_atoms() in library.cpp
------------------------------------------------------------------------- */
int Domain::ownatom(int id, double *x, imageint *image, int shrinkexceed)
{
double lamda[3];
double *coord,*blo,*bhi,*slo,*shi;
if (image) remap(x,*image);
else remap(x);
if (triclinic) {
x2lamda(x,lamda);
coord = lamda;
} else coord = x;
// box and subbox bounds for orthogonal vs triclinic
if (triclinic == 0) {
blo = boxlo;
bhi = boxhi;
slo = sublo;
shi = subhi;
} else {
blo = boxlo_lamda;
bhi = boxhi_lamda;
slo = sublo_lamda;
shi = subhi_lamda;
}
if (coord[0] >= slo[0] && coord[0] < shi[0] &&
coord[1] >= slo[1] && coord[1] < shi[1] &&
coord[2] >= slo[2] && coord[2] < shi[2]) return 1;
// check if atom did not return 1 only b/c it was
// outside a shrink-wrapped boundary
if (shrinkexceed) {
int outside = 0;
if (coord[0] < blo[0] && boundary[0][0] > 1) outside = 1;
if (coord[0] >= bhi[0] && boundary[0][1] > 1) outside = 1;
if (coord[1] < blo[1] && boundary[1][0] > 1) outside = 1;
if (coord[1] >= bhi[1] && boundary[1][1] > 1) outside = 1;
if (coord[2] < blo[2] && boundary[2][0] > 1) outside = 1;
if (coord[2] >= bhi[2] && boundary[2][1] > 1) outside = 1;
if (!outside) return 0;
// newcoord = coords pushed back to be on shrink-wrapped boundary
// newcoord is a copy, so caller's x[] is not affected
double newcoord[3];
if (coord[0] < blo[0] && boundary[0][0] > 1) newcoord[0] = blo[0];
else if (coord[0] >= bhi[0] && boundary[0][1] > 1) newcoord[0] = bhi[0];
else newcoord[0] = coord[0];
if (coord[1] < blo[1] && boundary[1][0] > 1) newcoord[1] = blo[1];
else if (coord[1] >= bhi[1] && boundary[1][1] > 1) newcoord[1] = bhi[1];
else newcoord[1] = coord[1];
if (coord[2] < blo[2] && boundary[2][0] > 1) newcoord[2] = blo[2];
else if (coord[2] >= bhi[2] && boundary[2][1] > 1) newcoord[2] = bhi[2];
else newcoord[2] = coord[2];
// re-test for newcoord inside my sub-domain
// use <= test for upper-boundary since may have just put atom at boxhi
if (newcoord[0] >= slo[0] && newcoord[0] <= shi[0] &&
newcoord[1] >= slo[1] && newcoord[1] <= shi[1] &&
newcoord[2] >= slo[2] && newcoord[2] <= shi[2]) return 1;
}
return 0;
}
/* ----------------------------------------------------------------------
create a lattice
------------------------------------------------------------------------- */
void Domain::set_lattice(int narg, char **arg)
{
if (lattice) delete lattice;
lattice = new Lattice(lmp,narg,arg);
}
/* ----------------------------------------------------------------------
create a new region
------------------------------------------------------------------------- */
void Domain::add_region(int narg, char **arg)
{
if (narg < 2) error->all(FLERR,"Illegal region command");
if (strcmp(arg[1],"delete") == 0) {
delete_region(narg,arg);
return;
}
if (find_region(arg[0]) >= 0) error->all(FLERR,"Reuse of region ID");
// extend Region list if necessary
if (nregion == maxregion) {
maxregion += DELTAREGION;
regions = (Region **)
memory->srealloc(regions,maxregion*sizeof(Region *),"domain:regions");
}
// create the Region
if (lmp->suffix_enable) {
if (lmp->suffix) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
if (lmp->suffix2) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix2);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
}
if (strcmp(arg[1],"none") == 0) error->all(FLERR,"Unknown region style");
if (region_map->find(arg[1]) != region_map->end()) {
RegionCreator region_creator = (*region_map)[arg[1]];
regions[nregion] = region_creator(lmp, narg, arg);
}
else error->all(FLERR,"Unknown region style");
// initialize any region variables via init()
// in case region is used between runs, e.g. to print a variable
regions[nregion]->init();
nregion++;
}
/* ----------------------------------------------------------------------
one instance per region style in style_region.h
------------------------------------------------------------------------- */
template <typename T>
Region *Domain::region_creator(LAMMPS *lmp, int narg, char ** arg)
{
return new T(lmp, narg, arg);
}
/* ----------------------------------------------------------------------
delete a region
------------------------------------------------------------------------- */
void Domain::delete_region(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal region command");
int iregion = find_region(arg[0]);
if (iregion == -1) error->all(FLERR,"Delete region ID does not exist");
delete regions[iregion];
regions[iregion] = regions[nregion-1];
nregion--;
}
/* ----------------------------------------------------------------------
return region index if name matches existing region ID
return -1 if no such region
------------------------------------------------------------------------- */
int Domain::find_region(char *name)
{
for (int iregion = 0; iregion < nregion; iregion++)
if (strcmp(name,regions[iregion]->id) == 0) return iregion;
return -1;
}
/* ----------------------------------------------------------------------
(re)set boundary settings
flag = 0, called from the input script
flag = 1, called from change box command
------------------------------------------------------------------------- */
void Domain::set_boundary(int narg, char **arg, int flag)
{
if (narg != 3) error->all(FLERR,"Illegal boundary command");
char c;
for (int idim = 0; idim < 3; idim++)
for (int iside = 0; iside < 2; iside++) {
if (iside == 0) c = arg[idim][0];
else if (iside == 1 && strlen(arg[idim]) == 1) c = arg[idim][0];
else c = arg[idim][1];
if (c == 'p') boundary[idim][iside] = 0;
else if (c == 'f') boundary[idim][iside] = 1;
else if (c == 's') boundary[idim][iside] = 2;
else if (c == 'm') boundary[idim][iside] = 3;
else {
if (flag == 0) error->all(FLERR,"Illegal boundary command");
if (flag == 1) error->all(FLERR,"Illegal change_box command");
}
}
for (int idim = 0; idim < 3; idim++)
if ((boundary[idim][0] == 0 && boundary[idim][1]) ||
(boundary[idim][0] && boundary[idim][1] == 0))
error->all(FLERR,"Both sides of boundary must be periodic");
if (boundary[0][0] == 0) xperiodic = 1;
else xperiodic = 0;
if (boundary[1][0] == 0) yperiodic = 1;
else yperiodic = 0;
if (boundary[2][0] == 0) zperiodic = 1;
else zperiodic = 0;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
nonperiodic = 0;
if (xperiodic == 0 || yperiodic == 0 || zperiodic == 0) {
nonperiodic = 1;
if (boundary[0][0] >= 2 || boundary[0][1] >= 2 ||
boundary[1][0] >= 2 || boundary[1][1] >= 2 ||
boundary[2][0] >= 2 || boundary[2][1] >= 2) nonperiodic = 2;
}
}
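/* ----------------------------------------------------------------------
   editor's note: illustrative example of the parsing above, not part of
   this diff. a one-letter argument applies to both sides of a dim and a
   two-letter argument sets lo then hi, e.g.
     boundary p p fs
   gives periodic x and y (codes 0/0), a fixed z lo face (code 1) and a
   shrink-wrapped z hi face (code 2), so nonperiodic is finally set to 2
------------------------------------------------------------------------- */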
/* ----------------------------------------------------------------------
set domain attributes
------------------------------------------------------------------------- */
void Domain::set_box(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal box command");
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"tilt") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal box command");
if (strcmp(arg[iarg+1],"small") == 0) tiltsmall = 1;
else if (strcmp(arg[iarg+1],"large") == 0) tiltsmall = 0;
else error->all(FLERR,"Illegal box command");
iarg += 2;
} else error->all(FLERR,"Illegal box command");
}
}
/* ----------------------------------------------------------------------
print box info, orthogonal or triclinic
------------------------------------------------------------------------- */
void Domain::print_box(const char *str)
{
if (comm->me == 0) {
if (screen) {
if (triclinic == 0)
fprintf(screen,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(screen,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
if (logfile) {
if (triclinic == 0)
fprintf(logfile,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(logfile,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
}
}
/* ----------------------------------------------------------------------
format boundary string for output
assume str is 9 chars or more in length
------------------------------------------------------------------------- */
void Domain::boundary_string(char *str)
{
int m = 0;
for (int idim = 0; idim < 3; idim++) {
for (int iside = 0; iside < 2; iside++) {
if (boundary[idim][iside] == 0) str[m++] = 'p';
else if (boundary[idim][iside] == 1) str[m++] = 'f';
else if (boundary[idim][iside] == 2) str[m++] = 's';
else if (boundary[idim][iside] == 3) str[m++] = 'm';
}
str[m++] = ' ';
}
str[8] = '\0';
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for all N atoms
x = H lamda + x0;
------------------------------------------------------------------------- */
void Domain::lamda2x(int n)
{
double **x = atom->x;
for (int i = 0; i < n; i++) {
x[i][0] = h[0]*x[i][0] + h[5]*x[i][1] + h[4]*x[i][2] + boxlo[0];
x[i][1] = h[1]*x[i][1] + h[3]*x[i][2] + boxlo[1];
x[i][2] = h[2]*x[i][2] + boxlo[2];
}
}
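/* ----------------------------------------------------------------------
   editor's note: illustrative summary, not part of this diff. as used
   here, h[] holds the upper-triangular box matrix in Voigt order
   (h[0..5] = xprd, yprd, zprd, yz, xz, xy), so x = H lamda + boxlo is
     H = | h[0] h[5] h[4] |   | xprd  xy   xz  |
         |  0   h[1] h[3] | = |  0   yprd  yz  |
         |  0    0   h[2] |   |  0    0   zprd |
   which matches the per-component arithmetic in lamda2x()/x2lamda()
------------------------------------------------------------------------- */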
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for all N atoms
lamda = H^-1 (x - x0)
------------------------------------------------------------------------- */
void Domain::x2lamda(int n)
{
double delta[3];
double **x = atom->x;
for (int i = 0; i < n; i++) {
delta[0] = x[i][0] - boxlo[0];
delta[1] = x[i][1] - boxlo[1];
delta[2] = x[i][2] - boxlo[2];
x[i][0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
x[i][1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
x[i][2] = h_inv[2]*delta[2];
}
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for one atom
x = H lamda + x0;
lamda and x can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::lamda2x(double *lamda, double *x)
{
x[0] = h[0]*lamda[0] + h[5]*lamda[1] + h[4]*lamda[2] + boxlo[0];
x[1] = h[1]*lamda[1] + h[3]*lamda[2] + boxlo[1];
x[2] = h[2]*lamda[2] + boxlo[2];
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda)
{
double delta[3];
delta[0] = x[0] - boxlo[0];
delta[1] = x[1] - boxlo[1];
delta[2] = x[2] - boxlo[2];
lamda[0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
lamda[1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
lamda[2] = h_inv[2]*delta[2];
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
use my_boxlo & my_h_inv stored by caller for previous state of box
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda,
double *my_boxlo, double *my_h_inv)
{
double delta[3];
delta[0] = x[0] - my_boxlo[0];
delta[1] = x[1] - my_boxlo[1];
delta[2] = x[2] - my_boxlo[2];
lamda[0] = my_h_inv[0]*delta[0] + my_h_inv[5]*delta[1] + my_h_inv[4]*delta[2];
lamda[1] = my_h_inv[1]*delta[1] + my_h_inv[3]*delta[2];
lamda[2] = my_h_inv[2]*delta[2];
}
/* ----------------------------------------------------------------------
convert 8 lamda corner pts of lo/hi box to box coords
return bboxlo/hi = bounding box around 8 corner pts in box coords
------------------------------------------------------------------------- */
void Domain::bbox(double *lo, double *hi, double *bboxlo, double *bboxhi)
{
double x[3];
bboxlo[0] = bboxlo[1] = bboxlo[2] = BIG;
bboxhi[0] = bboxhi[1] = bboxhi[2] = -BIG;
x[0] = lo[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::box_corners()
{
lamda_box_corners(boxlo_lamda,boxhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::subbox_corners()
{
lamda_box_corners(sublo_lamda,subhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of any triclinic box with lo/hi in lamda coords
8 output corners are ordered with x changing fastest, then y, finally z
could be more efficient if just coded with xy,yz,xz explicitly
------------------------------------------------------------------------- */
void Domain::lamda_box_corners(double *lo, double *hi)
{
corners[0][0] = lo[0]; corners[0][1] = lo[1]; corners[0][2] = lo[2];
lamda2x(corners[0],corners[0]);
corners[1][0] = hi[0]; corners[1][1] = lo[1]; corners[1][2] = lo[2];
lamda2x(corners[1],corners[1]);
corners[2][0] = lo[0]; corners[2][1] = hi[1]; corners[2][2] = lo[2];
lamda2x(corners[2],corners[2]);
corners[3][0] = hi[0]; corners[3][1] = hi[1]; corners[3][2] = lo[2];
lamda2x(corners[3],corners[3]);
corners[4][0] = lo[0]; corners[4][1] = lo[1]; corners[4][2] = hi[2];
lamda2x(corners[4],corners[4]);
corners[5][0] = hi[0]; corners[5][1] = lo[1]; corners[5][2] = hi[2];
lamda2x(corners[5],corners[5]);
corners[6][0] = lo[0]; corners[6][1] = hi[1]; corners[6][2] = hi[2];
lamda2x(corners[6],corners[6]);
corners[7][0] = hi[0]; corners[7][1] = hi[1]; corners[7][2] = hi[2];
lamda2x(corners[7],corners[7]);
}
diff --git a/src/domain.h b/src/domain.h
index 22e319123..0f47a3c2c 100644
--- a/src/domain.h
+++ b/src/domain.h
@@ -1,281 +1,282 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_DOMAIN_H
#define LMP_DOMAIN_H
#include <math.h>
#include "pointers.h"
#include <map>
#include <string>
namespace LAMMPS_NS {
class Domain : protected Pointers {
public:
int box_exist; // 0 = not yet created, 1 = exists
int dimension; // 2 = 2d, 3 = 3d
int nonperiodic; // 0 = periodic in all 3 dims
// 1 = periodic or fixed in all 6
// 2 = shrink-wrap in any of 6
int xperiodic,yperiodic,zperiodic; // 0 = non-periodic, 1 = periodic
int periodicity[3]; // xyz periodicity as array
int boundary[3][2]; // settings for 6 boundaries
// 0 = periodic
// 1 = fixed non-periodic
// 2 = shrink-wrap non-periodic
// 3 = shrink-wrap non-per w/ min
int triclinic; // 0 = orthog box, 1 = triclinic
int tiltsmall; // 1 if limit tilt, else 0
// orthogonal box
double xprd,yprd,zprd; // global box dimensions
double xprd_half,yprd_half,zprd_half; // half dimensions
double prd[3]; // array form of dimensions
double prd_half[3]; // array form of half dimensions
// triclinic box
// xprd,xprd_half,prd,prd_half =
// same as if untilted
double prd_lamda[3]; // lamda box = (1,1,1)
double prd_half_lamda[3]; // lamda half box = (0.5,0.5,0.5)
double boxlo[3],boxhi[3]; // orthogonal box global bounds
// triclinic box
// boxlo/hi = same as if untilted
double boxlo_lamda[3],boxhi_lamda[3]; // lamda box = (0,1)
double boxlo_bound[3],boxhi_bound[3]; // bounding box of tilted domain
double corners[8][3]; // 8 corner points
// orthogonal box & triclinic box
double minxlo,minxhi; // minimum size of global box
double minylo,minyhi; // when shrink-wrapping
double minzlo,minzhi; // tri only possible for non-skew dims
// orthogonal box
double sublo[3],subhi[3]; // sub-box bounds on this proc
// triclinic box
// sublo/hi = undefined
double sublo_lamda[3],subhi_lamda[3]; // bounds of subbox in lamda
// triclinic box
double xy,xz,yz; // 3 tilt factors
double h[6],h_inv[6]; // shape matrix in Voigt notation
double h_rate[6],h_ratelo[3]; // rate of box size/shape change
int box_change; // 1 if any of next 3 flags are set, else 0
int box_change_size; // 1 if box size changes, 0 if not
int box_change_shape; // 1 if box shape changes, 0 if not
int box_change_domain; // 1 if proc sub-domains change, 0 if not
int deform_flag; // 1 if fix deform exist, else 0
int deform_vremap; // 1 if fix deform remaps v, else 0
int deform_groupbit; // atom group to perform v remap for
class Lattice *lattice; // user-defined lattice
int nregion; // # of defined Regions
int maxregion; // max # list can hold
class Region **regions; // list of defined Regions
int copymode;
typedef Region *(*RegionCreator)(LAMMPS *,int,char**);
typedef std::map<std::string,RegionCreator> RegionCreatorMap;
RegionCreatorMap *region_map;
Domain(class LAMMPS *);
virtual ~Domain();
virtual void init();
void set_initial_box(int expandflag=1);
virtual void set_global_box();
virtual void set_lamda_box();
virtual void set_local_box();
virtual void reset_box();
virtual void pbc();
void image_check();
void box_too_small_check();
void subbox_too_small_check(double);
void minimum_image(double &, double &, double &);
void minimum_image(double *);
+ void minimum_image_once(double *);
int closest_image(int, int);
int closest_image(double *, int);
void closest_image(const double * const, const double * const,
double * const);
void remap(double *, imageint &);
void remap(double *);
void remap_near(double *, double *);
void unmap(double *, imageint);
void unmap(const double *, imageint, double *);
void image_flip(int, int, int);
int ownatom(int, double *, imageint *, int);
void set_lattice(int, char **);
void add_region(int, char **);
void delete_region(int, char **);
int find_region(char *);
void set_boundary(int, char **, int);
void set_box(int, char **);
void print_box(const char *);
void boundary_string(char *);
virtual void lamda2x(int);
virtual void x2lamda(int);
virtual void lamda2x(double *, double *);
virtual void x2lamda(double *, double *);
int inside(double *);
int inside_nonperiodic(double *);
void x2lamda(double *, double *, double *, double *);
void bbox(double *, double *, double *, double *);
void box_corners();
void subbox_corners();
void lamda_box_corners(double *, double *);
// minimum image convention check
// return 1 if any distance > 1/2 of box size
// indicates a special neighbor is actually not in a bond,
// but is a far-away image that should be treated as an unbonded neighbor
// inline since called from neighbor build inner loop
inline int minimum_image_check(double dx, double dy, double dz) {
if (xperiodic && fabs(dx) > xprd_half) return 1;
if (yperiodic && fabs(dy) > yprd_half) return 1;
if (zperiodic && fabs(dz) > zprd_half) return 1;
return 0;
}
protected:
double small[3]; // fractions of box lengths
private:
template <typename T> static Region *region_creator(LAMMPS *,int,char**);
};
}
#endif
/* ERROR/WARNING messages:
E: Box bounds are invalid
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions.
E: Cannot skew triclinic box in z for 2d simulation
Self-explanatory.
E: Triclinic box skew is too large
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command.
W: Triclinic box skew is large
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result.
E: Illegal simulation box
The lower bound of the simulation box is greater than the upper bound.
E: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
W: Inconsistent image flags
The image flags for a pair of bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands.
W: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
E: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond/angle/dihedral extent > half of periodic box length
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch.
W: Proc sub-domain size < neighbor skin, could lead to lost atoms
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Reuse of region ID
A region ID cannot be used twice.
E: Unknown region style
The choice of region style is unknown.
E: Delete region ID does not exist
Self-explanatory.
E: Both sides of boundary must be periodic
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides.
*/
diff --git a/src/min.cpp b/src/min.cpp
index 79d7d6a8b..d308efb84 100644
--- a/src/min.cpp
+++ b/src/min.cpp
@@ -1,821 +1,828 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Aidan Thompson (SNL)
improved CG and backtrack ls, added quadratic ls
Sources: Numerical Recipes frprmn routine
"Conjugate Gradient Method Without the Agonizing Pain" by
JR Shewchuk, http://www-2.cs.cmu.edu/~jrs/jrspapers.html#cg
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "min.h"
#include "atom.h"
#include "atom_vec.h"
#include "domain.h"
#include "comm.h"
#include "update.h"
#include "modify.h"
#include "fix_minimize.h"
#include "compute.h"
#include "neighbor.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "output.h"
#include "thermo.h"
#include "timer.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
Min::Min(LAMMPS *lmp) : Pointers(lmp)
{
dmax = 0.1;
searchflag = 0;
linestyle = 1;
elist_global = elist_atom = NULL;
vlist_global = vlist_atom = NULL;
nextra_global = 0;
fextra = NULL;
nextra_atom = 0;
xextra_atom = fextra_atom = NULL;
extra_peratom = extra_nlen = NULL;
extra_max = NULL;
requestor = NULL;
external_force_clear = 0;
}
/* ---------------------------------------------------------------------- */
Min::~Min()
{
delete [] elist_global;
delete [] elist_atom;
delete [] vlist_global;
delete [] vlist_atom;
delete [] fextra;
memory->sfree(xextra_atom);
memory->sfree(fextra_atom);
memory->destroy(extra_peratom);
memory->destroy(extra_nlen);
memory->destroy(extra_max);
memory->sfree(requestor);
}
/* ---------------------------------------------------------------------- */
void Min::init()
{
// create fix needed for storing atom-based quantities
// will delete it at end of run
char **fixarg = new char*[3];
fixarg[0] = (char *) "MINIMIZE";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "MINIMIZE";
modify->add_fix(3,fixarg);
delete [] fixarg;
fix_minimize = (FixMinimize *) modify->fix[modify->nfix-1];
// clear out extra global and per-atom dof
// will receive requests for new per-atom dof during pair init()
// can then add vectors to fix_minimize in setup()
nextra_global = 0;
delete [] fextra;
fextra = NULL;
nextra_atom = 0;
memory->sfree(xextra_atom);
memory->sfree(fextra_atom);
memory->destroy(extra_peratom);
memory->destroy(extra_nlen);
memory->destroy(extra_max);
memory->sfree(requestor);
xextra_atom = fextra_atom = NULL;
extra_peratom = extra_nlen = NULL;
extra_max = NULL;
requestor = NULL;
// virial_style:
// 1 if computed explicitly by pair->compute via sum over pair interactions
// 2 if computed implicitly by pair->virial_compute via sum over ghost atoms
if (force->newton_pair) virial_style = 2;
else virial_style = 1;
// setup lists of computes for global and per-atom PE and pressure
ev_setup();
// detect if fix omp is present for clearing force arrays
int ifix = modify->find_fix("package_omp");
if (ifix >= 0) external_force_clear = 1;
// set flags for arrays to clear in force_clear()
torqueflag = extraflag = 0;
if (atom->torque_flag) torqueflag = 1;
if (atom->avec->forceclearflag) extraflag = 1;
// allow pair and Kspace compute() to be turned off via modify flags
if (force->pair && force->pair->compute_flag) pair_compute_flag = 1;
else pair_compute_flag = 0;
if (force->kspace && force->kspace->compute_flag) kspace_compute_flag = 1;
else kspace_compute_flag = 0;
// orthogonal vs triclinic simulation box
triclinic = domain->triclinic;
// reset reneighboring criteria if necessary
neigh_every = neighbor->every;
neigh_delay = neighbor->delay;
neigh_dist_check = neighbor->dist_check;
if (neigh_every != 1 || neigh_delay != 0 || neigh_dist_check != 1) {
if (comm->me == 0)
error->warning(FLERR,
"Resetting reneighboring criteria during minimization");
}
neighbor->every = 1;
neighbor->delay = 0;
neighbor->dist_check = 1;
niter = neval = 0;
}
/* ----------------------------------------------------------------------
setup before run
------------------------------------------------------------------------- */
void Min::setup(int flag)
{
if (comm->me == 0 && screen) {
fprintf(screen,"Setting up %s style minimization ...\n",
update->minimize_style);
if (flag) {
fprintf(screen," Unit style : %s\n", update->unit_style);
+ fprintf(screen," Current step : " BIGINT_FORMAT "\n",
+ update->ntimestep);
timer->print_timeout(screen);
}
}
update->setupflag = 1;
// setup extra global dof due to fixes
// cannot be done in init() b/c update init() is before modify init()
nextra_global = modify->min_dof();
- if (nextra_global) fextra = new double[nextra_global];
+ if (nextra_global) {
+ fextra = new double[nextra_global];
+ if (comm->me == 0 && screen)
+ fprintf(screen,"WARNING: Energy due to %d extra global DOFs will"
+ " be included in minimizer energies\n",nextra_global);
+ }
// compute for potential energy
int id = modify->find_compute("thermo_pe");
if (id < 0) error->all(FLERR,"Minimization could not find thermo_pe compute");
pe_compute = modify->compute[id];
// style-specific setup does two tasks
// setup extra global dof vectors
// setup extra per-atom dof vectors due to requests from Pair classes
// cannot be done in init() b/c update init() is before modify/pair init()
setup_style();
// ndoftotal = total dof for entire minimization problem
// dof for atoms, extra per-atom, extra global
bigint ndofme = 3 * static_cast<bigint>(atom->nlocal);
for (int m = 0; m < nextra_atom; m++)
ndofme += extra_peratom[m]*atom->nlocal;
MPI_Allreduce(&ndofme,&ndoftotal,1,MPI_LMP_BIGINT,MPI_SUM,world);
ndoftotal += nextra_global;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
atom->setup();
modify->setup_pre_exchange();
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
// remove these restrictions eventually
if (searchflag == 0) {
if (nextra_global)
error->all(FLERR,
"Cannot use a damped dynamics min style with fix box/relax");
if (nextra_atom)
error->all(FLERR,
"Cannot use a damped dynamics min style with per-atom DOF");
}
if (strcmp(update->minimize_style,"hftn") == 0) {
if (nextra_global)
error->all(FLERR, "Cannot use hftn min style with fix box/relax");
if (nextra_atom)
error->all(FLERR, "Cannot use hftn min style with per-atom DOF");
}
// atoms may have migrated in comm->exchange()
reset_vectors();
// compute all forces
force->setup();
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) force->pair->compute(eflag,vflag);
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) force->kspace->compute(eflag,vflag);
else force->kspace->compute_dummy(eflag,vflag);
}
modify->setup_pre_reverse(eflag,vflag);
if (force->newton) comm->reverse_comm();
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
modify->setup(vflag);
output->setup(flag);
update->setupflag = 0;
// stats for initial thermo output
ecurrent = pe_compute->compute_scalar();
if (nextra_global) ecurrent += modify->min_energy(fextra);
if (output->thermo->normflag) ecurrent /= atom->natoms;
einitial = ecurrent;
fnorm2_init = sqrt(fnorm_sqr());
fnorminf_init = fnorm_inf();
}
/* ----------------------------------------------------------------------
setup without output or one-time post-init setup
flag = 0 = just force calculation
flag = 1 = reneighbor and force calculation
------------------------------------------------------------------------- */
void Min::setup_minimal(int flag)
{
update->setupflag = 1;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
if (flag) {
modify->setup_pre_exchange();
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
}
// atoms may have migrated in comm->exchange()
reset_vectors();
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) force->pair->compute(eflag,vflag);
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) force->kspace->compute(eflag,vflag);
else force->kspace->compute_dummy(eflag,vflag);
}
modify->setup_pre_reverse(eflag,vflag);
if (force->newton) comm->reverse_comm();
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
modify->setup(vflag);
update->setupflag = 0;
// stats for Finish to print
ecurrent = pe_compute->compute_scalar();
if (nextra_global) ecurrent += modify->min_energy(fextra);
if (output->thermo->normflag) ecurrent /= atom->natoms;
einitial = ecurrent;
fnorm2_init = sqrt(fnorm_sqr());
fnorminf_init = fnorm_inf();
}
/* ----------------------------------------------------------------------
perform minimization, calling iterate() for N steps
------------------------------------------------------------------------- */
void Min::run(int n)
{
// minimizer iterations
stop_condition = iterate(n);
stopstr = stopstrings(stop_condition);
// if early exit from iterate loop:
// set update->nsteps to niter for Finish stats to print
// set output->next values to this timestep
// call energy_force() to ensure vflag is set when forces are computed
// output->write does final output for thermo, dump, restart files
// add ntimestep to all computes that store invocation times
// since we are hardwiring the call to thermo/dumps and computes may not be ready
if (stop_condition != MAXITER) {
update->nsteps = niter;
if (update->restrict_output == 0) {
for (int idump = 0; idump < output->ndump; idump++)
output->next_dump[idump] = update->ntimestep;
output->next_dump_any = update->ntimestep;
if (output->restart_flag) {
output->next_restart = update->ntimestep;
if (output->restart_every_single)
output->next_restart_single = update->ntimestep;
if (output->restart_every_double)
output->next_restart_double = update->ntimestep;
}
}
output->next_thermo = update->ntimestep;
modify->addstep_compute_all(update->ntimestep);
ecurrent = energy_force(0);
output->write(update->ntimestep);
}
}
/* ---------------------------------------------------------------------- */
void Min::cleanup()
{
modify->post_run();
// stats for Finish to print
efinal = ecurrent;
fnorm2_final = sqrt(fnorm_sqr());
fnorminf_final = fnorm_inf();
// reset reneighboring criteria
neighbor->every = neigh_every;
neighbor->delay = neigh_delay;
neighbor->dist_check = neigh_dist_check;
// delete fix at end of run, so its atom arrays won't persist
modify->delete_fix("MINIMIZE");
domain->box_too_small_check();
}
/* ----------------------------------------------------------------------
evaluate potential energy and forces
may migrate atoms due to reneighboring
return new energy, which should include nextra_global dof
return negative gradient stored in atom->f
return negative gradient for nextra_global dof in fextra
------------------------------------------------------------------------- */
double Min::energy_force(int resetflag)
{
// check for reneighboring
// always communicate since minimizer moved atoms
int nflag = neighbor->decide();
if (nflag == 0) {
timer->stamp();
comm->forward_comm();
timer->stamp(Timer::COMM);
} else {
if (modify->n_min_pre_exchange) {
timer->stamp();
modify->min_pre_exchange();
timer->stamp(Timer::MODIFY);
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
if (domain->box_change) {
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
}
timer->stamp();
comm->exchange();
if (atom->sortfreq > 0 &&
update->ntimestep >= atom->nextsort) atom->sort();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
timer->stamp(Timer::COMM);
if (modify->n_min_pre_neighbor) {
timer->stamp();
modify->min_pre_neighbor();
timer->stamp(Timer::MODIFY);
}
neighbor->build();
timer->stamp(Timer::NEIGH);
}
ev_set(update->ntimestep);
force_clear();
timer->stamp();
if (modify->n_min_pre_force) {
modify->min_pre_force(vflag);
timer->stamp(Timer::MODIFY);
}
if (pair_compute_flag) {
force->pair->compute(eflag,vflag);
timer->stamp(Timer::PAIR);
}
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
timer->stamp(Timer::BOND);
}
if (kspace_compute_flag) {
force->kspace->compute(eflag,vflag);
timer->stamp(Timer::KSPACE);
}
if (modify->n_min_pre_reverse) {
modify->min_pre_reverse(eflag,vflag);
timer->stamp(Timer::MODIFY);
}
if (force->newton) {
comm->reverse_comm();
timer->stamp(Timer::COMM);
}
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
// fixes that affect minimization
if (modify->n_min_post_force) {
timer->stamp();
modify->min_post_force(vflag);
timer->stamp(Timer::MODIFY);
}
// compute potential energy of system
// normalize if thermo PE does
double energy = pe_compute->compute_scalar();
if (nextra_global) energy += modify->min_energy(fextra);
if (output->thermo->normflag) energy /= atom->natoms;
// if reneighbored, atoms migrated
// if resetflag = 1, update x0 of atoms crossing PBC
// reset vectors used by lo-level minimizer
if (nflag) {
if (resetflag) fix_minimize->reset_coords();
reset_vectors();
}
return energy;
}
/* ----------------------------------------------------------------------
clear force on own & ghost atoms
clear other arrays as needed
------------------------------------------------------------------------- */
void Min::force_clear()
{
if (external_force_clear) return;
// clear global force array
// if either newton flag is set, also include ghosts
size_t nbytes = sizeof(double) * atom->nlocal;
if (force->newton) nbytes += sizeof(double) * atom->nghost;
if (nbytes) {
memset(&atom->f[0][0],0,3*nbytes);
if (torqueflag) memset(&atom->torque[0][0],0,3*nbytes);
if (extraflag) atom->avec->force_clear(0,nbytes);
}
}
/* ----------------------------------------------------------------------
pair style makes a request to add per-atom variables to minimization
requestor stores callback to pair class to invoke during min
to get current variable and forces on it and to update the variable
return flag that pair can use if it registers multiple variables
------------------------------------------------------------------------- */
int Min::request(Pair *pair, int peratom, double maxvalue)
{
int n = nextra_atom + 1;
xextra_atom = (double **) memory->srealloc(xextra_atom,n*sizeof(double *),
"min:xextra_atom");
fextra_atom = (double **) memory->srealloc(fextra_atom,n*sizeof(double *),
"min:fextra_atom");
memory->grow(extra_peratom,n,"min:extra_peratom");
memory->grow(extra_nlen,n,"min:extra_nlen");
memory->grow(extra_max,n,"min:extra_max");
requestor = (Pair **) memory->srealloc(requestor,n*sizeof(Pair *),
"min:requestor");
requestor[nextra_atom] = pair;
extra_peratom[nextra_atom] = peratom;
extra_max[nextra_atom] = maxvalue;
nextra_atom++;
return nextra_atom-1;
}
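/* ----------------------------------------------------------------------
   editor's note: illustrative sketch, not part of this diff. a pair
   style that minimizes one extra per-atom variable (e.g. a per-atom
   charge) would register it roughly as
     int index = update->minimize->request(this, 1, 0.1);
   where 1 is the per-atom length and 0.1 the maximum allowed change;
   the returned index is the one later handed back through the
   min_xf_get() calls seen elsewhere in this file
------------------------------------------------------------------------- */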
/* ---------------------------------------------------------------------- */
void Min::modify_params(int narg, char **arg)
{
if (narg == 0) error->all(FLERR,"Illegal min_modify command");
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"dmax") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal min_modify command");
dmax = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"line") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal min_modify command");
if (strcmp(arg[iarg+1],"backtrack") == 0) linestyle = 0;
else if (strcmp(arg[iarg+1],"quadratic") == 0) linestyle = 1;
else if (strcmp(arg[iarg+1],"forcezero") == 0) linestyle = 2;
else error->all(FLERR,"Illegal min_modify command");
iarg += 2;
} else error->all(FLERR,"Illegal min_modify command");
}
}
/* ----------------------------------------------------------------------
setup lists of computes for global and per-atom PE and pressure
------------------------------------------------------------------------- */
void Min::ev_setup()
{
delete [] elist_global;
delete [] elist_atom;
delete [] vlist_global;
delete [] vlist_atom;
elist_global = elist_atom = NULL;
vlist_global = vlist_atom = NULL;
nelist_global = nelist_atom = 0;
nvlist_global = nvlist_atom = 0;
for (int i = 0; i < modify->ncompute; i++) {
if (modify->compute[i]->peflag) nelist_global++;
if (modify->compute[i]->peatomflag) nelist_atom++;
if (modify->compute[i]->pressflag) nvlist_global++;
if (modify->compute[i]->pressatomflag) nvlist_atom++;
}
if (nelist_global) elist_global = new Compute*[nelist_global];
if (nelist_atom) elist_atom = new Compute*[nelist_atom];
if (nvlist_global) vlist_global = new Compute*[nvlist_global];
if (nvlist_atom) vlist_atom = new Compute*[nvlist_atom];
nelist_global = nelist_atom = 0;
nvlist_global = nvlist_atom = 0;
for (int i = 0; i < modify->ncompute; i++) {
if (modify->compute[i]->peflag)
elist_global[nelist_global++] = modify->compute[i];
if (modify->compute[i]->peatomflag)
elist_atom[nelist_atom++] = modify->compute[i];
if (modify->compute[i]->pressflag)
vlist_global[nvlist_global++] = modify->compute[i];
if (modify->compute[i]->pressatomflag)
vlist_atom[nvlist_atom++] = modify->compute[i];
}
}
/* ----------------------------------------------------------------------
set eflag,vflag for current iteration
invoke matchstep() on all timestep-dependent computes to clear their arrays
eflag/vflag based on computes that need info on this ntimestep
always set eflag_global = 1, since need energy every iteration
eflag = 0 = no energy computation
eflag = 1 = global energy only
eflag = 2 = per-atom energy only
eflag = 3 = both global and per-atom energy
vflag = 0 = no virial computation (pressure)
vflag = 1 = global virial with pair portion via sum of pairwise interactions
vflag = 2 = global virial with pair portion via F dot r including ghosts
vflag = 4 = per-atom virial only
vflag = 5 or 6 = both global and per-atom virial
------------------------------------------------------------------------- */
void Min::ev_set(bigint ntimestep)
{
int i,flag;
int eflag_global = 1;
for (i = 0; i < nelist_global; i++)
elist_global[i]->matchstep(ntimestep);
flag = 0;
int eflag_atom = 0;
for (i = 0; i < nelist_atom; i++)
if (elist_atom[i]->matchstep(ntimestep)) flag = 1;
if (flag) eflag_atom = 2;
if (eflag_global) update->eflag_global = update->ntimestep;
if (eflag_atom) update->eflag_atom = update->ntimestep;
eflag = eflag_global + eflag_atom;
flag = 0;
int vflag_global = 0;
for (i = 0; i < nvlist_global; i++)
if (vlist_global[i]->matchstep(ntimestep)) flag = 1;
if (flag) vflag_global = virial_style;
flag = 0;
int vflag_atom = 0;
for (i = 0; i < nvlist_atom; i++)
if (vlist_atom[i]->matchstep(ntimestep)) flag = 1;
if (flag) vflag_atom = 4;
if (vflag_global) update->vflag_global = update->ntimestep;
if (vflag_atom) update->vflag_atom = update->ntimestep;
vflag = vflag_global + vflag_atom;
}
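// worked example: if a per-atom PE compute and a global pressure compute
// both match this ntimestep, eflag = 1 + 2 = 3 (global + per-atom energy)
// and vflag = virial_style (1 or 2, global virial only)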
/* ----------------------------------------------------------------------
compute and return ||force||_2^2
------------------------------------------------------------------------- */
double Min::fnorm_sqr()
{
int i,n;
double *fatom;
double local_norm2_sqr = 0.0;
for (i = 0; i < nvec; i++) local_norm2_sqr += fvec[i]*fvec[i];
if (nextra_atom) {
for (int m = 0; m < nextra_atom; m++) {
fatom = fextra_atom[m];
n = extra_nlen[m];
for (i = 0; i < n; i++)
local_norm2_sqr += fatom[i]*fatom[i];
}
}
double norm2_sqr = 0.0;
MPI_Allreduce(&local_norm2_sqr,&norm2_sqr,1,MPI_DOUBLE,MPI_SUM,world);
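// fextra holds forces on global dof and is replicated on every proc,
// so its contribution is added after the Allreduce rather than being
// summed nprocs times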
if (nextra_global)
for (i = 0; i < nextra_global; i++)
norm2_sqr += fextra[i]*fextra[i];
return norm2_sqr;
}
/* ----------------------------------------------------------------------
compute and return ||force||_inf
------------------------------------------------------------------------- */
double Min::fnorm_inf()
{
int i,n;
double *fatom;
double local_norm_inf = 0.0;
for (i = 0; i < nvec; i++)
local_norm_inf = MAX(fabs(fvec[i]),local_norm_inf);
if (nextra_atom) {
for (int m = 0; m < nextra_atom; m++) {
fatom = fextra_atom[m];
n = extra_nlen[m];
for (i = 0; i < n; i++)
local_norm_inf = MAX(fabs(fatom[i]),local_norm_inf);
}
}
double norm_inf = 0.0;
MPI_Allreduce(&local_norm_inf,&norm_inf,1,MPI_DOUBLE,MPI_MAX,world);
if (nextra_global)
for (i = 0; i < nextra_global; i++)
norm_inf = MAX(fabs(fextra[i]),norm_inf);
return norm_inf;
}
/* ----------------------------------------------------------------------
possible stop conditions
------------------------------------------------------------------------- */
char *Min::stopstrings(int n)
{
const char *strings[] = {"max iterations",
"max force evaluations",
"energy tolerance",
"force tolerance",
"search direction is not downhill",
"linesearch alpha is zero",
"forces are zero",
"quadratic factors are zero",
"trust region too small",
"HFTN minimizer error",
"walltime limit reached"};
return (char *) strings[n];
}
diff --git a/src/min.h b/src/min.h
index 464018e82..021198bc0 100644
--- a/src/min.h
+++ b/src/min.h
@@ -1,147 +1,153 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_MIN_H
#define LMP_MIN_H
#include "pointers.h"
namespace LAMMPS_NS {
class Min : protected Pointers {
public:
double einitial,efinal,eprevious;
double fnorm2_init,fnorminf_init,fnorm2_final,fnorminf_final;
double alpha_final;
int niter,neval;
int stop_condition;
char *stopstr;
int searchflag; // 0 if damped dynamics, 1 if sub-cycles on local search
Min(class LAMMPS *);
virtual ~Min();
virtual void init();
void setup(int flag=1);
void setup_minimal(int);
void run(int);
void cleanup();
int request(class Pair *, int, double);
virtual bigint memory_usage() {return 0;}
void modify_params(int, char **);
double fnorm_sqr();
double fnorm_inf();
virtual void init_style() {}
virtual void setup_style() = 0;
virtual void reset_vectors() = 0;
virtual int iterate(int) = 0;
// possible return values of iterate() method
enum{MAXITER,MAXEVAL,ETOL,FTOL,DOWNHILL,ZEROALPHA,ZEROFORCE,
ZEROQUAD,TRSMALL,INTERROR,TIMEOUT};
protected:
int eflag,vflag; // flags for energy/virial computation
int virial_style; // compute virial explicitly or implicitly
int external_force_clear; // clear forces locally or externally
double dmax; // max dist to move any atom in one step
int linestyle; // 0 = backtrack, 1 = quadratic, 2 = forcezero
int nelist_global,nelist_atom; // # of PE,virial computes to check
int nvlist_global,nvlist_atom;
class Compute **elist_global; // lists of PE,virial Computes
class Compute **elist_atom;
class Compute **vlist_global;
class Compute **vlist_atom;
int triclinic; // 0 if domain is orthog, 1 if triclinic
int pairflag;
int torqueflag,extraflag;
int pair_compute_flag; // 0 if pair->compute is skipped
int kspace_compute_flag; // 0 if kspace->compute is skipped
int narray; // # of arrays stored by fix_minimize
class FixMinimize *fix_minimize; // fix that stores auxiliary data
class Compute *pe_compute; // compute for potential energy
double ecurrent; // current potential energy
bigint ndoftotal; // total dof for entire problem
int nvec; // local atomic dof = length of xvec
double *xvec; // variables for atomic dof, as 1d vector
double *fvec; // force vector for atomic dof, as 1d vector
int nextra_global; // # of extra global dof due to fixes
double *fextra; // force vector for extra global dof
// xextra is stored by fix
int nextra_atom; // # of extra per-atom variables
double **xextra_atom; // ptr to the variable
double **fextra_atom; // ptr to the force on the variable
int *extra_peratom; // # of values in variable, e.g. 3 in x
int *extra_nlen; // total local length of variable, e.g. 3*nlocal
double *extra_max; // max allowed change per iter for atom's var
class Pair **requestor; // Pair that stores/manipulates the variable
int neigh_every,neigh_delay,neigh_dist_check; // neighboring params
double energy_force(int);
void force_clear();
double compute_force_norm_sqr();
double compute_force_norm_inf();
void ev_setup();
void ev_set(bigint);
char *stopstrings(int);
};
}
#endif
/* ERROR/WARNING messages:
W: Resetting reneighboring criteria during minimization
Minimization requires that neigh_modify settings be delay = 0, every =
1, check = yes. Since these settings were not in place, LAMMPS
changed them and will restore them to their original values after the
minimization.
+W: Energy due to X extra global DOFs will be included in minimizer energies
+
+When using fixes like box/relax, the potential energy used by the minimizer
+is augmented by an additional energy provided by the fix. Thus the printed
+converged energy may be different from the total potential energy.
+
E: Minimization could not find thermo_pe compute
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command.
E: Cannot use a damped dynamics min style with fix box/relax
This is a current restriction in LAMMPS. Use another minimizer
style.
E: Cannot use a damped dynamics min style with per-atom DOF
This is a current restriction in LAMMPS. Use another minimizer
style.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
*/
diff --git a/src/neighbor.cpp b/src/neighbor.cpp
index 4cd99b41d..1d12ef578 100644
--- a/src/neighbor.cpp
+++ b/src/neighbor.cpp
@@ -1,2420 +1,2420 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic and multi-neigh) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "style_nbin.h"
#include "style_nstencil.h"
#include "style_npair.h"
#include "style_ntopo.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "group.h"
#include "modify.h"
#include "fix.h"
#include "compute.h"
#include "update.h"
#include "respa.h"
#include "output.h"
#include "citeme.h"
#include "memory.h"
#include "error.h"
#include <map>
using namespace LAMMPS_NS;
using namespace NeighConst;
#define RQDELTA 1
#define EXDELTA 1
#define BIG 1.0e20
enum{NSQ,BIN,MULTI}; // also in NBin, NeighList, NStencil
enum{NONE,ALL,PARTIAL,TEMPLATE};
static const char cite_neigh_multi[] =
"neighbor multi command:\n\n"
"@Article{Intveld08,\n"
" author = {P.{\\,}J.~in{\\,}'t~Veld and S.{\\,}J.~Plimpton"
" and G.{\\,}S.~Grest},\n"
" title = {Accurate and Efficient Methods for Modeling Colloidal\n"
" Mixtures in an Explicit Solvent using Molecular Dynamics},\n"
" journal = {Comp.~Phys.~Comm.},\n"
" year = 2008,\n"
" volume = 179,\n"
" pages = {320--329}\n"
"}\n\n";
//#define NEIGH_LIST_DEBUG 1
/* ---------------------------------------------------------------------- */
Neighbor::Neighbor(LAMMPS *lmp) : Pointers(lmp),
pairclass(NULL), pairnames(NULL), pairmasks(NULL)
{
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
firsttime = 1;
style = BIN;
every = 1;
delay = 10;
dist_check = 1;
pgsize = 100000;
oneatom = 2000;
binsizeflag = 0;
build_once = 0;
cluster_check = 0;
ago = -1;
cutneighmax = 0.0;
cutneighsq = NULL;
cutneighghostsq = NULL;
cuttype = NULL;
cuttypesq = NULL;
fixchecklist = NULL;
// pairwise neighbor lists and associated data structs
nlist = 0;
lists = NULL;
nbin = 0;
neigh_bin = NULL;
nstencil = 0;
neigh_stencil = NULL;
neigh_pair = NULL;
nstencil_perpetual = 0;
slist = NULL;
npair_perpetual = 0;
plist = NULL;
nrequest = maxrequest = 0;
requests = NULL;
old_nrequest = 0;
old_requests = NULL;
old_style = style;
old_triclinic = 0;
old_pgsize = pgsize;
old_oneatom = oneatom;
zeroes = NULL;
binclass = NULL;
binnames = NULL;
binmasks = NULL;
stencilclass = NULL;
stencilnames = NULL;
stencilmasks = NULL;
// topology lists
bondwhich = anglewhich = dihedralwhich = improperwhich = NONE;
neigh_bond = NULL;
neigh_angle = NULL;
neigh_dihedral = NULL;
neigh_improper = NULL;
// coords at last neighboring
maxhold = 0;
xhold = NULL;
lastcall = -1;
last_setup_bins = -1;
// pair exclusion list info
includegroup = 0;
nex_type = maxex_type = 0;
ex1_type = ex2_type = NULL;
ex_type = NULL;
nex_group = maxex_group = 0;
ex1_group = ex2_group = ex1_bit = ex2_bit = NULL;
nex_mol = maxex_mol = 0;
ex_mol_group = ex_mol_bit = ex_mol_intra = NULL;
// Kokkos setting
copymode = 0;
}
/* ---------------------------------------------------------------------- */
Neighbor::~Neighbor()
{
if (copymode) return;
memory->destroy(cutneighsq);
memory->destroy(cutneighghostsq);
delete [] cuttype;
delete [] cuttypesq;
delete [] fixchecklist;
for (int i = 0; i < nlist; i++) delete lists[i];
for (int i = 0; i < nbin; i++) delete neigh_bin[i];
for (int i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (int i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
delete [] slist;
delete [] plist;
for (int i = 0; i < nlist; i++)
if (requests[i]) delete requests[i];
memory->sfree(requests);
for (int i = 0; i < old_nrequest; i++)
if (old_requests[i]) delete old_requests[i];
memory->sfree(old_requests);
delete [] zeroes;
delete [] binclass;
delete [] binnames;
delete [] binmasks;
delete [] stencilclass;
delete [] stencilnames;
delete [] stencilmasks;
delete [] pairclass;
delete [] pairnames;
delete [] pairmasks;
delete neigh_bond;
delete neigh_angle;
delete neigh_dihedral;
delete neigh_improper;
memory->destroy(xhold);
memory->destroy(ex1_type);
memory->destroy(ex2_type);
memory->destroy(ex_type);
memory->destroy(ex1_group);
memory->destroy(ex2_group);
delete [] ex1_bit;
delete [] ex2_bit;
memory->destroy(ex_mol_group);
delete [] ex_mol_bit;
memory->destroy(ex_mol_intra);
}
/* ---------------------------------------------------------------------- */
void Neighbor::init()
{
int i,j,n;
ncalls = ndanger = 0;
dimension = domain->dimension;
triclinic = domain->triclinic;
newton_pair = force->newton_pair;
// error check
if (delay > 0 && (delay % every) != 0)
error->all(FLERR,"Neighbor delay must be 0 or multiple of every setting");
if (pgsize < 10*oneatom)
error->all(FLERR,"Neighbor page size must be >= 10x the one atom setting");
// ------------------------------------------------------------------
// settings
// bbox lo/hi ptrs = bounding box of entire domain, stored by Domain
if (triclinic == 0) {
bboxlo = domain->boxlo;
bboxhi = domain->boxhi;
} else {
bboxlo = domain->boxlo_bound;
bboxhi = domain->boxhi_bound;
}
// set neighbor cutoffs (force cutoff + skin)
// trigger determines when atoms migrate and neighbor lists are rebuilt
// needs to be non-zero for migration distance check
// even if pair = NULL and no neighbor lists are used
// cutneigh = force cutoff + skin if cutforce > 0, else cutneigh = 0
// cutneighghost = pair cutghost if it requests it, else same as cutneigh
triggersq = 0.25*skin*skin;
boxcheck = 0;
if (domain->box_change && (domain->xperiodic || domain->yperiodic ||
(dimension == 3 && domain->zperiodic)))
boxcheck = 1;
n = atom->ntypes;
if (cutneighsq == NULL) {
if (lmp->kokkos) init_cutneighsq_kokkos(n);
else memory->create(cutneighsq,n+1,n+1,"neigh:cutneighsq");
memory->create(cutneighghostsq,n+1,n+1,"neigh:cutneighghostsq");
cuttype = new double[n+1];
cuttypesq = new double[n+1];
}
double cutoff,delta,cut;
cutneighmin = BIG;
cutneighmax = 0.0;
for (i = 1; i <= n; i++) {
cuttype[i] = cuttypesq[i] = 0.0;
for (j = 1; j <= n; j++) {
if (force->pair) cutoff = sqrt(force->pair->cutsq[i][j]);
else cutoff = 0.0;
if (cutoff > 0.0) delta = skin;
else delta = 0.0;
cut = cutoff + delta;
cutneighsq[i][j] = cut*cut;
cuttype[i] = MAX(cuttype[i],cut);
cuttypesq[i] = MAX(cuttypesq[i],cut*cut);
cutneighmin = MIN(cutneighmin,cut);
cutneighmax = MAX(cutneighmax,cut);
if (force->pair && force->pair->ghostneigh) {
cut = force->pair->cutghost[i][j] + skin;
cutneighghostsq[i][j] = cut*cut;
} else cutneighghostsq[i][j] = cut*cut;
}
}
cutneighmaxsq = cutneighmax * cutneighmax;
// rRESPA cutoffs
int respa = 0;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
}
if (respa) {
double *cut_respa = ((Respa *) update->integrate)->cutoff;
cut_inner_sq = (cut_respa[1] + skin) * (cut_respa[1] + skin);
cut_middle_sq = (cut_respa[3] + skin) * (cut_respa[3] + skin);
cut_middle_inside_sq = (cut_respa[0] - skin) * (cut_respa[0] - skin);
if (cut_respa[0]-skin < 0) cut_middle_inside_sq = 0.0;
}
// fixchecklist = other classes that can induce reneighboring in decide()
restart_check = 0;
if (output->restart_flag) restart_check = 1;
delete [] fixchecklist;
fixchecklist = NULL;
fixchecklist = new int[modify->nfix];
fix_check = 0;
for (i = 0; i < modify->nfix; i++)
if (modify->fix[i]->force_reneighbor)
fixchecklist[fix_check++] = i;
must_check = 0;
if (restart_check || fix_check) must_check = 1;
// set special_flag for 1-2, 1-3, 1-4 neighbors
// flag[0] is not used, flag[1] = 1-2, flag[2] = 1-3, flag[3] = 1-4
// flag = 0 if both LJ/Coulomb special values are 0.0
// flag = 1 if both LJ/Coulomb special values are 1.0
// flag = 2 otherwise or if KSpace solver is enabled
// pairwise portion of KSpace solver uses all 1-2,1-3,1-4 neighbors
// or selected Coulomb-approximation pair styles require it
if (force->special_lj[1] == 0.0 && force->special_coul[1] == 0.0)
special_flag[1] = 0;
else if (force->special_lj[1] == 1.0 && force->special_coul[1] == 1.0)
special_flag[1] = 1;
else special_flag[1] = 2;
if (force->special_lj[2] == 0.0 && force->special_coul[2] == 0.0)
special_flag[2] = 0;
else if (force->special_lj[2] == 1.0 && force->special_coul[2] == 1.0)
special_flag[2] = 1;
else special_flag[2] = 2;
if (force->special_lj[3] == 0.0 && force->special_coul[3] == 0.0)
special_flag[3] = 0;
else if (force->special_lj[3] == 1.0 && force->special_coul[3] == 1.0)
special_flag[3] = 1;
else special_flag[3] = 2;
if (force->kspace || force->pair_match("coul/wolf",0) ||
force->pair_match("coul/dsf",0) || force->pair_match("thole",0))
special_flag[1] = special_flag[2] = special_flag[3] = 2;
// maxwt = max multiplicative factor on atom indices stored in neigh list
maxwt = 0;
if (special_flag[1] == 2) maxwt = 2;
if (special_flag[2] == 2) maxwt = 3;
if (special_flag[3] == 2) maxwt = 4;
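// e.g. "special_bonds lj/coul 0.0 0.0 0.5" gives special_flag[1] = 0,
// special_flag[2] = 0, special_flag[3] = 2, and thus maxwt = 4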
// ------------------------------------------------------------------
// xhold array
// free if not needed for this run
if (dist_check == 0) {
memory->destroy(xhold);
maxhold = 0;
xhold = NULL;
}
// first time allocation
if (dist_check) {
if (maxhold == 0) {
maxhold = atom->nmax;
memory->create(xhold,maxhold,3,"neigh:xhold");
}
}
// ------------------------------------------------------------------
// exclusion lists
// depend on type, group, molecule settings from neigh_modify
// warn if exclusions used with KSpace solver
n = atom->ntypes;
if (nex_type == 0 && nex_group == 0 && nex_mol == 0) exclude = 0;
else exclude = 1;
if (nex_type) {
if (lmp->kokkos)
init_ex_type_kokkos(n);
else {
memory->destroy(ex_type);
memory->create(ex_type,n+1,n+1,"neigh:ex_type");
}
for (i = 1; i <= n; i++)
for (j = 1; j <= n; j++)
ex_type[i][j] = 0;
for (i = 0; i < nex_type; i++) {
if (ex1_type[i] <= 0 || ex1_type[i] > n ||
ex2_type[i] <= 0 || ex2_type[i] > n)
error->all(FLERR,"Invalid atom type in neighbor exclusion list");
ex_type[ex1_type[i]][ex2_type[i]] = 1;
ex_type[ex2_type[i]][ex1_type[i]] = 1;
}
}
if (nex_group) {
if (lmp->kokkos)
init_ex_bit_kokkos();
else {
delete [] ex1_bit;
delete [] ex2_bit;
ex1_bit = new int[nex_group];
ex2_bit = new int[nex_group];
}
for (i = 0; i < nex_group; i++) {
ex1_bit[i] = group->bitmask[ex1_group[i]];
ex2_bit[i] = group->bitmask[ex2_group[i]];
}
}
if (nex_mol) {
if (lmp->kokkos)
init_ex_mol_bit_kokkos();
else {
delete [] ex_mol_bit;
ex_mol_bit = new int[nex_mol];
}
for (i = 0; i < nex_mol; i++)
ex_mol_bit[i] = group->bitmask[ex_mol_group[i]];
}
if (exclude && force->kspace && me == 0)
error->warning(FLERR,"Neighbor exclusions used with KSpace solver "
"may give inconsistent Coulombic energies");
// ------------------------------------------------------------------
// create pairwise lists
// one-time call to init_styles() to scan style files and setup
// init_pair() creates auxiliary classes: NBin, NStencil, NPair
if (firsttime) init_styles();
firsttime = 0;
int same = init_pair();
// invoke copy_neighbor_info() in Bin,Stencil,Pair classes
// copied once per run in case any cutoff, exclusion, special info changed
for (i = 0; i < nbin; i++) neigh_bin[i]->copy_neighbor_info();
for (i = 0; i < nstencil; i++) neigh_stencil[i]->copy_neighbor_info();
for (i = 0; i < nlist; i++)
if (neigh_pair[i]) neigh_pair[i]->copy_neighbor_info();
if (!same && comm->me == 0) print_pairwise_info();
// can now delete requests so next run can make new ones
// print_pairwise_info() made use of requests
// set of NeighLists now stores all needed info
for (int i = 0; i < nrequest; i++) {
delete requests[i];
requests[i] = NULL;
}
nrequest = 0;
// ------------------------------------------------------------------
// create topology lists
// instantiated topo styles can change from run to run
init_topology();
}
/* ----------------------------------------------------------------------
create and initialize lists of Nbin, Nstencil, NPair classes
lists have info on all classes in 3 style*.h files
cannot do this in constructor, b/c too early to instantiate classes
------------------------------------------------------------------------- */
void Neighbor::init_styles()
{
// extract info from NBin classes listed in style_nbin.h
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) nbclass++;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
binclass = new BinCreator[nbclass];
binnames = new char*[nbclass];
binmasks = new int[nbclass];
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) \
binnames[nbclass] = (char *) #key; \
binclass[nbclass] = &bin_creator<Class>; \
binmasks[nbclass++] = bitmasks;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
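// the two passes above work by redefining the NBinStyle macro: the first
// include of style_nbin.h expands each entry to a bare counter increment
// (to size the arrays), the second to the registration statements that
// record the key string, a factory function, and the bitmask; the
// NStencil and NPair sections below follow the same pattern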
// extract info from NStencil classes listed in style_nstencil.h
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) nsclass++;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
stencilclass = new StencilCreator[nsclass];
stencilnames = new char*[nsclass];
stencilmasks = new int[nsclass];
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) \
stencilnames[nsclass] = (char *) #key; \
stencilclass[nsclass] = &stencil_creator<Class>; \
stencilmasks[nsclass++] = bitmasks;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
// extract info from NPair classes listed in style_npair.h
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) npclass++;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
pairclass = new PairCreator[npclass];
pairnames = new char*[npclass];
pairmasks = new int[npclass];
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) \
pairnames[npclass] = (char *) #key; \
pairclass[npclass] = &pair_creator<Class>; \
pairmasks[npclass++] = bitmasks;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
}
/* ----------------------------------------------------------------------
create and initialize NPair classes
------------------------------------------------------------------------- */
int Neighbor::init_pair()
{
int i,j,k,m;
// test if pairwise lists need to be re-created
// no need to re-create if:
// neigh style, triclinic, pgsize, oneatom have not changed
// current requests = old requests
// so just return:
// delete requests so next run can make new ones
// current set of NeighLists already stores all needed info
// requests are compared via identical() before:
// any requests are morphed using logic below
// any requests are added below, e.g. as parents of pair hybrid skip lists
// copy them via requests_new2old() BEFORE any changes made to requests
// necessary b/c morphs can change requestor settings (see comment below)
int same = 1;
if (style != old_style) same = 0;
if (triclinic != old_triclinic) same = 0;
if (pgsize != old_pgsize) same = 0;
if (oneatom != old_oneatom) same = 0;
if (nrequest != old_nrequest) same = 0;
else
for (i = 0; i < nrequest; i++)
if (requests[i]->identical(old_requests[i]) == 0) same = 0;
#ifdef NEIGH_LIST_DEBUG
if (comm->me == 0) printf("SAME flag %d\n",same);
#endif
if (same) return same;
requests_new2old();
// delete old lists since creating new ones
for (i = 0; i < nlist; i++) delete lists[i];
for (i = 0; i < nbin; i++) delete neigh_bin[i];
for (i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
// morph requests in various ways
// purpose is to avoid duplicate or inefficient builds
// may add new requests if a needed request to derive from does not exist
// methods:
// (1) other = point history and rRESPA lists at their partner lists
// (2) skip = create any new non-skip lists needed by pair hybrid skip lists
// (3) granular = adjust parent and skip lists for granular onesided usage
// (4) h/f = pair up any matching half/full lists
// (5) copy = convert as many lists as possible to copy lists
// order of morph methods matters:
// (1) before (2), b/c (2) needs to know history partner pairings
// (2) after (1), b/c (2) may also need to create new history lists
// (3) after (2), b/c it adjusts lists created by (2)
// (4) after (2) and (3),
// b/c (2) may create new full lists, (3) may change them
// (5) last, after all lists are finalized, so all possible copies found
int nrequest_original = nrequest;
morph_other();
morph_skip();
morph_granular(); // this method can change flags set by requestor
morph_halffull();
morph_copy();
// create new lists, one per request including added requests
// wait to allocate initial pages until copy lists are detected
- // NOTE: can I allocation now, instead of down below?
+ // NOTE: can I allocate now, instead of down below?
nlist = nrequest;
lists = new NeighList*[nrequest];
neigh_bin = new NBin*[nrequest];
neigh_stencil = new NStencil*[nrequest];
neigh_pair = new NPair*[nrequest];
// allocate new lists
// pass list ptr back to requestor (except for Command class)
// only for original requests, not ones added by Neighbor class
for (i = 0; i < nrequest; i++) {
if (requests[i]->kokkos_host || requests[i]->kokkos_device)
create_kokkos_list(i);
else lists[i] = new NeighList(lmp);
lists[i]->index = i;
if (requests[i]->pair && i < nrequest_original) {
Pair *pair = (Pair *) requests[i]->requestor;
pair->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->fix && i < nrequest_original) {
Fix *fix = (Fix *) requests[i]->requestor;
fix->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->compute && i < nrequest_original) {
Compute *compute = (Compute *) requests[i]->requestor;
compute->init_list(requests[i]->id,lists[i]);
}
}
// invoke post_constructor() for all lists
// copies info from requests to lists, sets ptrs to related lists
for (i = 0; i < nrequest; i++)
lists[i]->post_constructor(requests[i]);
// assign Bin,Stencil,Pair style to each list
int flag;
for (i = 0; i < nrequest; i++) {
flag = choose_bin(requests[i]);
lists[i]->bin_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor bin option does not exist");
flag = choose_stencil(requests[i]);
lists[i]->stencil_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor stencil method does not exist");
flag = choose_pair(requests[i]);
lists[i]->pair_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor pair method does not exist");
}
// instantiate unique Bin,Stencil classes in neigh_bin & neigh_stencil vecs
// unique = only one of its style, or request unique flag set (custom cutoff)
nbin = 0;
for (i = 0; i < nrequest; i++) {
requests[i]->index_bin = -1;
flag = lists[i]->bin_method;
if (flag == 0) continue;
for (j = 0; j < nbin; j++)
if (neigh_bin[j]->istyle == flag) break;
if (j < nbin && !requests[i]->unique) {
requests[i]->index_bin = j;
continue;
}
BinCreator bin_creator = binclass[flag-1];
neigh_bin[nbin] = bin_creator(lmp);
neigh_bin[nbin]->post_constructor(requests[i]);
neigh_bin[nbin]->istyle = flag;
requests[i]->index_bin = nbin;
nbin++;
}
nstencil = 0;
for (i = 0; i < nrequest; i++) {
requests[i]->index_stencil = -1;
flag = lists[i]->stencil_method;
if (flag == 0) continue;
for (j = 0; j < nstencil; j++)
if (neigh_stencil[j]->istyle == flag) break;
if (j < nstencil && !requests[i]->unique) {
requests[i]->index_stencil = j;
continue;
}
StencilCreator stencil_creator = stencilclass[flag-1];
neigh_stencil[nstencil] = stencil_creator(lmp);
neigh_stencil[nstencil]->post_constructor(requests[i]);
neigh_stencil[nstencil]->istyle = flag;
if (lists[i]->bin_method > 0) {
neigh_stencil[nstencil]->nb = neigh_bin[requests[i]->index_bin];
if (neigh_stencil[nstencil]->nb == NULL)
error->all(FLERR,"Could not assign bin method to neighbor stencil");
}
requests[i]->index_stencil = nstencil;
nstencil++;
}
// instantiate one Pair class per list in neigh_pair vec
for (i = 0; i < nrequest; i++) {
requests[i]->index_pair = -1;
flag = lists[i]->pair_method;
if (flag == 0) {
neigh_pair[i] = NULL;
continue;
}
PairCreator pair_creator = pairclass[flag-1];
neigh_pair[i] = pair_creator(lmp);
neigh_pair[i]->post_constructor(requests[i]);
neigh_pair[i]->istyle = flag;
if (lists[i]->bin_method > 0) {
neigh_pair[i]->nb = neigh_bin[requests[i]->index_bin];
if (neigh_pair[i]->nb == NULL)
error->all(FLERR,"Could not assign bin method to neighbor pair");
}
if (lists[i]->stencil_method > 0) {
neigh_pair[i]->ns = neigh_stencil[requests[i]->index_stencil];
if (neigh_pair[i]->ns == NULL)
error->all(FLERR,"Could not assign stencil method to neighbor pair");
}
requests[i]->index_pair = i;
}
// allocate initial pages for each list, except if copy flag set
// allocate dnum vector of zeroes if set
int dnummax = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->copy) continue;
lists[i]->setup_pages(pgsize,oneatom);
dnummax = MAX(dnummax,lists[i]->dnum);
}
if (dnummax) {
delete [] zeroes;
zeroes = new double[dnummax];
for (i = 0; i < dnummax; i++) zeroes[i] = 0.0;
}
// first-time allocation of per-atom data for lists that are built and store
// lists that are not built: granhistory, respa inner/middle (no neigh_pair)
// lists that do not store: copy
// use atom->nmax for both grow() args
// i.e. grow first time to expanded size to avoid future reallocs
// also Kokkos list initialization
int maxatom = atom->nmax;
for (i = 0; i < nlist; i++)
if (neigh_pair[i] && !lists[i]->copy) lists[i]->grow(maxatom,maxatom);
// plist = indices of perpetual NPair classes
// perpetual = non-occasional, re-built at every reneighboring
// slist = indices of perpetual NStencil classes
// perpetual = used by any perpetual NPair class
delete [] slist;
delete [] plist;
nstencil_perpetual = npair_perpetual = 0;
slist = new int[nstencil];
plist = new int[nlist];
for (i = 0; i < nlist; i++) {
if (lists[i]->occasional == 0 && lists[i]->pair_method)
plist[npair_perpetual++] = i;
}
for (i = 0; i < nstencil; i++) {
flag = 0;
for (j = 0; j < npair_perpetual; j++)
if (lists[plist[j]]->stencil_method == neigh_stencil[i]->istyle)
flag = 1;
if (flag) slist[nstencil_perpetual++] = i;
}
// reorder plist vector if necessary
// relevant for lists that are derived from a parent list:
// half-full,copy,skip
// the child index must appear in plist after the parent index
// swap two indices within plist when dependency is mis-ordered
// start double loop check again whenever a swap is made
// done when entire double loop test results in no swaps
NeighList *ptr;
int done = 0;
while (!done) {
done = 1;
for (i = 0; i < npair_perpetual; i++) {
for (k = 0; k < 3; k++) {
ptr = NULL;
if (k == 0) ptr = lists[plist[i]]->listcopy;
if (k == 1) ptr = lists[plist[i]]->listskip;
if (k == 2) ptr = lists[plist[i]]->listfull;
if (ptr == NULL) continue;
for (m = 0; m < nrequest; m++)
if (ptr == lists[m]) break;
for (j = 0; j < npair_perpetual; j++)
if (m == plist[j]) break;
if (j < i) continue;
int tmp = plist[i]; // swap I,J indices
plist[i] = plist[j];
plist[j] = tmp;
done = 0;
break;
}
if (!done) break;
}
}
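// e.g. if plist = {3,5} and list 3 is a copy of list 5, one swap gives
// plist = {5,3}, so the parent list is built before its copy at each
// reneighboring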
// debug output
#ifdef NEIGH_LIST_DEBUG
for (i = 0; i < nrequest; i++) lists[i]->print_attributes();
#endif
return same;
}
/* ----------------------------------------------------------------------
scan NeighRequests to set additional flags
only for history, respaouter, custom cutoff lists
------------------------------------------------------------------------- */
void Neighbor::morph_other()
{
NeighRequest *irq;
for (int i = 0; i < nrequest; i++) {
irq = requests[i];
// if history, point this list and partner list at each other
if (irq->history) {
irq->historylist = i-1;
requests[i-1]->history_partner = 1;
requests[i-1]->historylist = i;
}
// if respaouter, point all associated rRESPA lists at each other
if (irq->respaouter) {
if (requests[i-1]->respainner) {
irq->respainnerlist = i-1;
requests[i-1]->respaouterlist = i;
} else {
irq->respamiddlelist = i-1;
requests[i-1]->respaouterlist = i;
requests[i-1]->respainnerlist = i-1;
irq->respainnerlist = i-2;
requests[i-2]->respaouterlist = i;
requests[i-2]->respamiddlelist = i-1;
}
}
// if cut flag set by requestor, set unique flag
// this forces Pair,Stencil,Bin styles to be instantiated separately
if (irq->cut) irq->unique = 1;
}
}
/* ----------------------------------------------------------------------
scan NeighRequests to process all skip lists
look for a matching non-skip list
if one exists, point at it via skiplist
else make new parent via copy_request() and point at it
------------------------------------------------------------------------- */
void Neighbor::morph_skip()
{
int i,j,inewton,jnewton;
NeighRequest *irq,*jrq,*nrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only processing skip lists
if (!irq->skip) continue;
// these lists are created other ways, no need for skipping
// halffull list and its full parent may both skip,
// but are checked to ensure matching skip info
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
if (irq->halffull) continue;
if (irq->copy) continue;
// check all other lists
for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// can only skip from a perpetual non-skip list
if (jrq->occasional) continue;
if (jrq->skip) continue;
// both lists must be half, or both full
if (irq->half != jrq->half) continue;
if (irq->full != jrq->full) continue;
// both lists must be newton on, or both newton off
// IJ newton = 1 for newton on, 2 for newton off
inewton = irq->newton;
if (inewton == 0) inewton = force->newton_pair ? 1 : 2;
jnewton = jrq->newton;
if (jnewton == 0) jnewton = force->newton_pair ? 1 : 2;
if (inewton != jnewton) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check dnum b/c only set for history
// NOTE: need check for 2 Kokkos flags?
if (irq->ghost != jrq->ghost) continue;
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->omp != jrq->omp) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// 2 lists are a match
break;
}
// if matching list exists, point to it
// else create a new identical list except non-skip
// for new list, set neigh = 1, skip = 0, no skip vec/array,
// copy unique flag (since copy_request() will not do it)
// note: parents of skip lists do not have associated history list
// b/c child skip lists store their own history info
if (j < nrequest) irq->skiplist = j;
else {
int newrequest = request(this,-1);
irq->skiplist = newrequest;
nrq = requests[newrequest];
nrq->copy_request(irq,0);
nrq->pair = nrq->fix = nrq->compute = nrq->command = 0;
nrq->neigh = 1;
nrq->skip = 0;
if (irq->unique) nrq->unique = 1;
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests just added by morph_skip for hybrid granular
adjust newton/onesided parent settings if children require onesided skipping
also set children off2on flag if parent becomes a newton off list
this is needed because line/gran and tri/gran pair styles
require onesided neigh lists and system newton on,
but parent list must be newton off to enable the onesided skipping
------------------------------------------------------------------------- */
void Neighbor::morph_granular()
{
int i,j;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only examine NeighRequests added by morph_skip()
// only those with size attribute for granular systems
if (!irq->neigh) continue;
if (!irq->size) continue;
// check children of this list
int onesided = -1;
for (j = 0; j < nrequest; j++) {
jrq = requests[j];
// only consider JRQ pair, size lists that skip from Irq list
if (!jrq->pair) continue;
if (!jrq->size) continue;
if (!jrq->skip || jrq->skiplist != i) continue;
// onesided = -1 if no children
// onesided = 0/1 = child granonesided value if same for all children
// onesided = 2 if children have different granonesided values
if (onesided < 0) onesided = jrq->granonesided;
else if (onesided != jrq->granonesided) onesided = 2;
if (onesided == 2) break;
}
// if onesided = 2, parent has children with both granonesided = 0/1
// force parent newton off (newton = 2) to enable onesided skip by child
// set parent granonesided = 0, so it stores all neighs in usual manner
// set off2on = 1 for all children, since they expect newton on lists
// this is b/c granonesided only set by line/gran and tri/gran which
// both require system newton on
if (onesided == 2) {
irq->newton = 2;
irq->granonesided = 0;
for (j = 0; j < nrequest; j++) {
jrq = requests[j];
// only consider JRQ pair, size lists that skip from Irq list
if (!jrq->pair) continue;
if (!jrq->size) continue;
if (!jrq->skip || jrq->skiplist != i) continue;
jrq->off2on = 1;
}
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests for possible half lists to derive from full lists
if 2 requests match, set half list to derive from full list
------------------------------------------------------------------------- */
void Neighbor::morph_halffull()
{
int i,j;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only processing half lists
if (!irq->half) continue;
// Kokkos doesn't yet support half from full
if (irq->kokkos_host) continue;
if (irq->kokkos_device) continue;
// these lists are created other ways, no need for halffull
// do want to process skip lists
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
if (irq->copy) continue;
// check all other lists
for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// can only derive from a perpetual full list
// newton setting of derived list does not matter
if (jrq->occasional) continue;
if (!jrq->full) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check dnum b/c only set for history
if (irq->ghost != jrq->ghost) continue;
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->omp != jrq->omp) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// skip flag must be same
// if both are skip lists, skip info must match
if (irq->skip != jrq->skip) continue;
if (irq->skip && irq->same_skip(jrq) == 0) continue;
// 2 lists are a match
break;
}
// if matching list exists, point to it
if (j < nrequest) {
irq->halffull = 1;
irq->halffulllist = j;
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests for possible copies
if 2 requests match, turn one into a copy of the other
------------------------------------------------------------------------- */
void Neighbor::morph_copy()
{
int i,j,inewton,jnewton;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// this list is already a copy list due to another morph method
if (irq->copy) continue;
// these lists are created other ways, no need to copy
// skip lists are eligible to become a copy list
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
// check all other lists
- for (j = 0; j < i; j++) {
+ for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// other list is already copied from this one
if (jrq->copy && jrq->copylist == i) continue;
// parent list must be perpetual
// copied list can be perpetual or occasional
if (jrq->occasional) continue;
// both lists must be half, or both full
if (irq->half != jrq->half) continue;
if (irq->full != jrq->full) continue;
// both lists must be newton on, or both newton off
// IJ newton = 1 for newton on, 2 for newton off
inewton = irq->newton;
if (inewton == 0) inewton = force->newton_pair ? 1 : 2;
jnewton = jrq->newton;
if (jnewton == 0) jnewton = force->newton_pair ? 1 : 2;
if (inewton != jnewton) continue;
// ok for non-ghost list to copy from ghost list, but not vice versa
if (irq->ghost && !jrq->ghost) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check omp b/c it stores same pairs
// no need to check dnum b/c only set for history
// NOTE: need check for 2 Kokkos flags?
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// skip flag must be same
// if both are skip lists, skip info must match
if (irq->skip != jrq->skip) continue;
if (irq->skip && irq->same_skip(jrq) == 0) continue;
// 2 lists are a match
break;
}
// turn list I into a copy of list J
// do not copy a list from another copy list, but from its parent list
- if (j < i) {
+ if (j < nrequest) {
irq->copy = 1;
if (jrq->copy) irq->copylist = jrq->copylist;
else irq->copylist = j;
}
}
}
/* ----------------------------------------------------------------------
create and initialize NTopo classes
------------------------------------------------------------------------- */
void Neighbor::init_topology()
{
int i,m;
if (!atom->molecular) return;
// set flags that determine which topology neighbor classes to use
// these settings could change from run to run, depending on fixes defined
// bonds, etc. can only be broken for atom->molecular = 1, not 2
// SHAKE sets bonds and angles negative
// gcmc sets all bonds, angles, etc negative
// bond_quartic sets bonds to 0
// delete_bonds sets all interactions negative
int bond_off = 0;
int angle_off = 0;
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"shake") == 0)
|| (strcmp(modify->fix[i]->style,"rattle") == 0))
bond_off = angle_off = 1;
if (force->bond && force->bond_match("quartic")) bond_off = 1;
if (atom->avec->bonds_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (bond_off) break;
for (m = 0; m < atom->num_bond[i]; m++)
if (atom->bond_type[i][m] <= 0) bond_off = 1;
}
}
if (atom->avec->angles_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (angle_off) break;
for (m = 0; m < atom->num_angle[i]; m++)
if (atom->angle_type[i][m] <= 0) angle_off = 1;
}
}
int dihedral_off = 0;
if (atom->avec->dihedrals_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (dihedral_off) break;
for (m = 0; m < atom->num_dihedral[i]; m++)
if (atom->dihedral_type[i][m] <= 0) dihedral_off = 1;
}
}
int improper_off = 0;
if (atom->avec->impropers_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (improper_off) break;
for (m = 0; m < atom->num_improper[i]; m++)
if (atom->improper_type[i][m] <= 0) improper_off = 1;
}
}
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"gcmc") == 0))
bond_off = angle_off = dihedral_off = improper_off = 1;
// sync on/off settings across all procs
int onoff = bond_off;
MPI_Allreduce(&onoff,&bond_off,1,MPI_INT,MPI_MAX,world);
onoff = angle_off;
MPI_Allreduce(&onoff,&angle_off,1,MPI_INT,MPI_MAX,world);
onoff = dihedral_off;
MPI_Allreduce(&onoff,&dihedral_off,1,MPI_INT,MPI_MAX,world);
onoff = improper_off;
MPI_Allreduce(&onoff,&improper_off,1,MPI_INT,MPI_MAX,world);
// instantiate NTopo classes
if (atom->avec->bonds_allow) {
int old_bondwhich = bondwhich;
if (atom->molecular == 2) bondwhich = TEMPLATE;
else if (bond_off) bondwhich = PARTIAL;
else bondwhich = ALL;
if (!neigh_bond || bondwhich != old_bondwhich) {
delete neigh_bond;
if (bondwhich == ALL)
neigh_bond = new NTopoBondAll(lmp);
else if (bondwhich == PARTIAL)
neigh_bond = new NTopoBondPartial(lmp);
else if (bondwhich == TEMPLATE)
neigh_bond = new NTopoBondTemplate(lmp);
}
}
if (atom->avec->angles_allow) {
int old_anglewhich = anglewhich;
if (atom->molecular == 2) anglewhich = TEMPLATE;
else if (angle_off) anglewhich = PARTIAL;
else anglewhich = ALL;
if (!neigh_angle || anglewhich != old_anglewhich) {
delete neigh_angle;
if (anglewhich == ALL)
neigh_angle = new NTopoAngleAll(lmp);
else if (anglewhich == PARTIAL)
neigh_angle = new NTopoAnglePartial(lmp);
else if (anglewhich == TEMPLATE)
neigh_angle = new NTopoAngleTemplate(lmp);
}
}
if (atom->avec->dihedrals_allow) {
int old_dihedralwhich = dihedralwhich;
if (atom->molecular == 2) dihedralwhich = TEMPLATE;
else if (dihedral_off) dihedralwhich = PARTIAL;
else dihedralwhich = ALL;
if (!neigh_dihedral || dihedralwhich != old_dihedralwhich) {
delete neigh_dihedral;
if (dihedralwhich == ALL)
neigh_dihedral = new NTopoDihedralAll(lmp);
else if (dihedralwhich == PARTIAL)
neigh_dihedral = new NTopoDihedralPartial(lmp);
else if (dihedralwhich == TEMPLATE)
neigh_dihedral = new NTopoDihedralTemplate(lmp);
}
}
if (atom->avec->impropers_allow) {
int old_improperwhich = improperwhich;
if (atom->molecular == 2) improperwhich = TEMPLATE;
else if (improper_off) improperwhich = PARTIAL;
else improperwhich = ALL;
if (!neigh_improper || improperwhich != old_improperwhich) {
delete neigh_improper;
if (improperwhich == ALL)
neigh_improper = new NTopoImproperAll(lmp);
else if (improperwhich == PARTIAL)
neigh_improper = new NTopoImproperPartial(lmp);
else if (improperwhich == TEMPLATE)
neigh_improper = new NTopoImproperTemplate(lmp);
}
}
}
/* ----------------------------------------------------------------------
output summary of pairwise neighbor list info
only called by proc 0
------------------------------------------------------------------------- */
void Neighbor::print_pairwise_info()
{
int i,m;
char str[128];
NeighRequest *rq;
FILE *out;
const double cutghost = MAX(cutneighmax,comm->cutghostuser);
double binsize, bbox[3];
bbox[0] = bboxhi[0]-bboxlo[0];
bbox[1] = bboxhi[1]-bboxlo[1];
bbox[2] = bboxhi[2]-bboxlo[2];
if (binsizeflag) binsize = binsize_user;
else if (style == BIN) binsize = 0.5*cutneighmax;
else binsize = 0.5*cutneighmin;
if (binsize == 0.0) binsize = bbox[0];
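// e.g. with cutneighmax = 12.0, no user binsize, and style BIN,
// binsize = 6.0, so a cubic 60x60x60 box is reported as bins = 10 10 10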
int nperpetual = 0;
int noccasional = 0;
int nextra = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->pair_method == 0) nextra++;
else if (lists[i]->occasional) noccasional++;
else nperpetual++;
}
for (m = 0; m < 2; m++) {
if (m == 0) out = screen;
else out = logfile;
if (out) {
fprintf(out,"Neighbor list info ...\n");
fprintf(out," update every %d steps, delay %d steps, check %s\n",
every,delay,dist_check ? "yes" : "no");
fprintf(out," max neighbors/atom: %d, page size: %d\n",
oneatom, pgsize);
fprintf(out," master list distance cutoff = %g\n",cutneighmax);
fprintf(out," ghost atom cutoff = %g\n",cutghost);
if (style != NSQ)
fprintf(out," binsize = %g, bins = %g %g %g\n",binsize,
ceil(bbox[0]/binsize), ceil(bbox[1]/binsize),
ceil(bbox[2]/binsize));
fprintf(out," %d neighbor lists, "
"perpetual/occasional/extra = %d %d %d\n",
nlist,nperpetual,noccasional,nextra);
for (i = 0; i < nlist; i++) {
rq = requests[i];
if (rq->pair) {
char *pname = force->pair_match_ptr((Pair *) rq->requestor);
sprintf(str," (%d) pair %s",i+1,pname);
} else if (rq->fix) {
sprintf(str," (%d) fix %s",i+1,((Fix *) rq->requestor)->style);
} else if (rq->compute) {
sprintf(str," (%d) compute %s",i+1,
((Compute *) rq->requestor)->style);
} else if (rq->command) {
sprintf(str," (%d) command %s",i+1,rq->command_style);
} else if (rq->neigh) {
sprintf(str," (%d) neighbor class addition",i+1);
}
fprintf(out,"%s",str);
if (rq->occasional) fprintf(out,", occasional");
else fprintf(out,", perpetual");
// order these to get single output of most relevant
if (rq->history)
fprintf(out,", history for (%d)",rq->historylist+1);
else if (rq->copy)
fprintf(out,", copy from (%d)",rq->copylist+1);
else if (rq->halffull)
fprintf(out,", half/full from (%d)",rq->halffulllist+1);
else if (rq->skip)
fprintf(out,", skip from (%d)",rq->skiplist+1);
fprintf(out,"\n");
// list of neigh list attributes
fprintf(out," attributes: ");
if (rq->half) fprintf(out,"half");
else if (rq->full) fprintf(out,"full");
if (rq->newton == 0) {
if (force->newton_pair) fprintf(out,", newton on");
else fprintf(out,", newton off");
} else if (rq->newton == 1) fprintf(out,", newton on");
else if (rq->newton == 2) fprintf(out,", newton off");
if (rq->ghost) fprintf(out,", ghost");
if (rq->size) fprintf(out,", size");
if (rq->history) fprintf(out,", history");
if (rq->granonesided) fprintf(out,", onesided");
if (rq->respainner) fprintf(out,", respa inner");
if (rq->respamiddle) fprintf(out,", respa middle");
if (rq->respaouter) fprintf(out,", respa outer");
if (rq->bond) fprintf(out,", bond");
if (rq->omp) fprintf(out,", omp");
if (rq->intel) fprintf(out,", intel");
if (rq->kokkos_device) fprintf(out,", kokkos_device");
if (rq->kokkos_host) fprintf(out,", kokkos_host");
if (rq->ssa) fprintf(out,", ssa");
if (rq->cut) fprintf(out,", cut %g",rq->cutoff);
if (rq->off2on) fprintf(out,", off2on");
fprintf(out,"\n");
fprintf(out," ");
if (lists[i]->pair_method == 0) fprintf(out,"pair build: none\n");
else fprintf(out,"pair build: %s\n",pairnames[lists[i]->pair_method-1]);
fprintf(out," ");
if (lists[i]->stencil_method == 0) fprintf(out,"stencil: none\n");
else fprintf(out,"stencil: %s\n",
stencilnames[lists[i]->stencil_method-1]);
fprintf(out," ");
if (lists[i]->bin_method == 0) fprintf(out,"bin: none\n");
else fprintf(out,"bin: %s\n",binnames[lists[i]->bin_method-1]);
}
/*
fprintf(out," %d stencil methods\n",nstencil);
for (i = 0; i < nstencil; i++)
fprintf(out," (%d) %s\n",
i+1,stencilnames[neigh_stencil[i]->istyle-1]);
fprintf(out," %d bin methods\n",nbin);
for (i = 0; i < nbin; i++)
fprintf(out," (%d) %s\n",i+1,binnames[neigh_bin[i]->istyle-1]);
*/
}
}
}
/* ----------------------------------------------------------------------
make copy of current requests and Neighbor params
used to compare to when next run occurs
------------------------------------------------------------------------- */
void Neighbor::requests_new2old()
{
for (int i = 0; i < old_nrequest; i++) delete old_requests[i];
memory->sfree(old_requests);
old_nrequest = nrequest;
old_requests = (NeighRequest **)
memory->smalloc(old_nrequest*sizeof(NeighRequest *),
"neighbor:old_requests");
for (int i = 0; i < old_nrequest; i++) {
old_requests[i] = new NeighRequest(lmp);
old_requests[i]->copy_request(requests[i],1);
}
old_style = style;
old_triclinic = triclinic;
old_pgsize = pgsize;
old_oneatom = oneatom;
}
/* ----------------------------------------------------------------------
assign NBin class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known Nbin classes
return index+1 of match in list of masks
return 0 for no binning
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_bin(NeighRequest *rq)
{
// no binning needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->halffull) return 0;
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// use request settings to match exactly one NBin class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < nbclass; i++) {
mask = binmasks[i];
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->intel != !(mask & NB_INTEL)) continue;
if (!rq->ssa != !(mask & NB_SSA)) continue;
if (!rq->kokkos_device != !(mask & NB_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NB_KOKKOS_HOST)) continue;
return i+1;
}
// error return if matched none
return -1;
}
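// e.g. a plain request (intel = ssa = kokkos_host = kokkos_device = 0)
// matches only a mask with none of those bits set (typically the
// standard NBin class); a Kokkos device request matches only a mask
// with NB_KOKKOS_DEVICE set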
/* ----------------------------------------------------------------------
assign NStencil class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NStencil classes
return index+1 of match in list of masks
return 0 for no stencil creation
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_stencil(NeighRequest *rq)
{
// no stencil creation needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->halffull) return 0;
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// convert newton request to newtflag = on or off
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
//printf("STENCIL RQ FLAGS: hff %d %d n %d g %d s %d newtflag %d\n",
// rq->half,rq->full,rq->newton,rq->ghost,rq->ssa,
// newtflag);
// use request and system settings to match exactly one NStencil class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < nsclass; i++) {
mask = stencilmasks[i];
//printf("III %d: half %d full %d newton %d newtoff %d ghost %d ssa %d\n",
// i,mask & NS_HALF,mask & NS_FULL,mask & NS_NEWTON,
// mask & NS_NEWTOFF,mask & NS_GHOST,mask & NS_SSA);
// exactly one of half or full is set and must match
if (rq->half) {
if (!(mask & NS_HALF)) continue;
} else if (rq->full) {
if (!(mask & NS_FULL)) continue;
}
// newtflag is on or off and must match
if (newtflag) {
if (!(mask & NS_NEWTON)) continue;
} else if (!newtflag) {
if (!(mask & NS_NEWTOFF)) continue;
}
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->ghost != !(mask & NS_GHOST)) continue;
if (!rq->ssa != !(mask & NS_SSA)) continue;
// neighbor style is BIN or MULTI and must match
if (style == BIN) {
if (!(mask & NS_BIN)) continue;
} else if (style == MULTI) {
if (!(mask & NS_MULTI)) continue;
}
// dimension is 2 or 3 and must match
if (dimension == 2) {
if (!(mask & NS_2D)) continue;
} else if (dimension == 3) {
if (!(mask & NS_3D)) continue;
}
// domain triclinic flag is on or off and must match
if (triclinic) {
if (!(mask & NS_TRI)) continue;
} else if (!triclinic) {
if (!(mask & NS_ORTHO)) continue;
}
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
assign NPair class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NPair classes
return index+1 of match in list of masks
return 0 for no list build
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_pair(NeighRequest *rq)
{
// no neighbor list build performed
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// error check for includegroup with ghost neighbor request
if (includegroup && rq->ghost)
error->all(FLERR,"Neighbor include group not allowed with ghost neighbors");
// convert newton request to newtflag = on or off
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
int molecular = atom->molecular;
//printf("PAIR RQ FLAGS: hf %d %d n %d g %d sz %d gos %d r %d b %d o %d i %d "
// "kk %d %d ss %d dn %d sk %d cp %d hf %d oo %d\n",
// rq->half,rq->full,rq->newton,rq->ghost,rq->size,
// rq->granonesided,rq->respaouter,rq->bond,rq->omp,rq->intel,
// rq->kokkos_host,rq->kokkos_device,rq->ssa,rq->dnum,
// rq->skip,rq->copy,rq->halffull,rq->off2on);
// use request and system settings to match exactly one NPair class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < npclass; i++) {
mask = pairmasks[i];
//printf(" PAIR NAMES i %d %d name %s mask %d\n",i,nrequest,
// pairnames[i],pairmasks[i]);
// if copy request, no further checks needed, just return or continue
// Kokkos device/host flags must also match in order to copy
if (rq->copy) {
if (!(mask & NP_COPY)) continue;
if (!rq->kokkos_device != !(mask & NP_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NP_KOKKOS_HOST)) continue;
return i+1;
}
// exactly one of half or full is set and must match
if (rq->half) {
if (!(mask & NP_HALF)) continue;
} else if (rq->full) {
if (!(mask & NP_FULL)) continue;
}
// newtflag is on or off and must match
if (newtflag) {
if (!(mask & NP_NEWTON)) continue;
} else if (!newtflag) {
if (!(mask & NP_NEWTOFF)) continue;
}
// if molecular on, do not match ATOMONLY (b/c a MOLONLY Npair exists)
// if molecular off, do not match MOLONLY (b/c an ATOMONLY Npair exists)
if (molecular) {
if (mask & NP_ATOMONLY) continue;
} else if (!molecular) {
if (mask & NP_MOLONLY) continue;
}
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->ghost != !(mask & NP_GHOST)) continue;
if (!rq->size != !(mask & NP_SIZE)) continue;
if (!rq->respaouter != !(mask & NP_RESPA)) continue;
if (!rq->granonesided != !(mask & NP_ONESIDE)) continue;
if (!rq->bond != !(mask & NP_BOND)) continue;
if (!rq->omp != !(mask & NP_OMP)) continue;
if (!rq->intel != !(mask & NP_INTEL)) continue;
if (!rq->kokkos_device != !(mask & NP_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NP_KOKKOS_HOST)) continue;
if (!rq->ssa != !(mask & NP_SSA)) continue;
if (!rq->skip != !(mask & NP_SKIP)) continue;
if (!rq->halffull != !(mask & NP_HALF_FULL)) continue;
if (!rq->off2on != !(mask & NP_OFF2ON)) continue;
// neighbor style is one of NSQ,BIN,MULTI and must match
if (style == NSQ) {
if (!(mask & NP_NSQ)) continue;
} else if (style == BIN) {
if (!(mask & NP_BIN)) continue;
} else if (style == MULTI) {
if (!(mask & NP_MULTI)) continue;
}
// domain triclinic flag is on or off and must match
if (triclinic) {
if (!(mask & NP_TRI)) continue;
} else if (!triclinic) {
if (!(mask & NP_ORTHO)) continue;
}
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
called by other classes to request a pairwise neighbor list
------------------------------------------------------------------------- */
int Neighbor::request(void *requestor, int instance)
{
if (nrequest == maxrequest) {
maxrequest += RQDELTA;
requests = (NeighRequest **)
memory->srealloc(requests,maxrequest*sizeof(NeighRequest *),
"neighbor:requests");
}
requests[nrequest] = new NeighRequest(lmp);
requests[nrequest]->index = nrequest;
requests[nrequest]->requestor = requestor;
requests[nrequest]->requestor_instance = instance;
nrequest++;
return nrequest-1;
}
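/* ----------------------------------------------------------------------
   usage sketch (illustration only, not part of this file): a pair style
   typically calls request() from its init_style() and then adjusts the
   returned NeighRequest, e.g. to ask for a full instead of a half list:

     int irequest = neighbor->request(this,instance_me);
     neighbor->requests[irequest]->half = 0;
     neighbor->requests[irequest]->full = 1;
------------------------------------------------------------------------- */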
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_bin.h
------------------------------------------------------------------------- */
template <typename T>
NBin *Neighbor::bin_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_stencil.h
------------------------------------------------------------------------- */
template <typename T>
NStencil *Neighbor::stencil_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_pair.h
------------------------------------------------------------------------- */
template <typename T>
NPair *Neighbor::pair_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
setup neighbor binning and neighbor stencils
called before run and every reneighbor if box size/shape changes
only operates on perpetual lists
build_one() operates on occasional lists
------------------------------------------------------------------------- */
void Neighbor::setup_bins()
{
// invoke setup_bins() for all NBin
// actual binning is performed in build()
for (int i = 0; i < nbin; i++)
neigh_bin[i]->setup_bins(style);
// invoke create_setup() and create() for all perpetual NStencil
// same ops performed for occasional lists in build_one()
for (int i = 0; i < nstencil_perpetual; i++) {
neigh_stencil[slist[i]]->create_setup();
neigh_stencil[slist[i]]->create();
}
last_setup_bins = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
int Neighbor::decide()
{
if (must_check) {
bigint n = update->ntimestep;
if (restart_check && n == output->next_restart) return 1;
for (int i = 0; i < fix_check; i++)
if (n == modify->fix[fixchecklist[i]]->next_reneighbor) return 1;
}
ago++;
if (ago >= delay && ago % every == 0) {
if (build_once) return 0;
if (dist_check == 0) return 1;
return check_distance();
} else return 0;
}
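/* ----------------------------------------------------------------------
   illustrative note (not part of LAMMPS): with "neigh_modify every 10
   delay 20 check yes", decide() returns 0 until at least 20 steps have
   passed since the last build and the step count since that build is a
   multiple of 10; only then does check_distance() get a chance to
   trigger a rebuild
------------------------------------------------------------------------- */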
/* ----------------------------------------------------------------------
if any atom moved trigger distance (half of neighbor skin) return 1
shrink trigger distance if box size has changed
conservative shrink procedure:
compute distance each of 8 corners of box has moved since last reneighbor
reduce skin distance by sum of 2 largest of the 8 values
new trigger = 1/2 of reduced skin distance
for orthogonal box, only need 2 lo/hi corners
for triclinic, need all 8 corners since deformations can displace all 8
------------------------------------------------------------------------- */
int Neighbor::check_distance()
{
double delx,dely,delz,rsq;
double delta,deltasq,delta1,delta2;
if (boxcheck) {
if (triclinic == 0) {
delx = bboxlo[0] - boxlo_hold[0];
dely = bboxlo[1] - boxlo_hold[1];
delz = bboxlo[2] - boxlo_hold[2];
delta1 = sqrt(delx*delx + dely*dely + delz*delz);
delx = bboxhi[0] - boxhi_hold[0];
dely = bboxhi[1] - boxhi_hold[1];
delz = bboxhi[2] - boxhi_hold[2];
delta2 = sqrt(delx*delx + dely*dely + delz*delz);
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
} else {
domain->box_corners();
delta1 = delta2 = 0.0;
for (int i = 0; i < 8; i++) {
delx = corners[i][0] - corners_hold[i][0];
dely = corners[i][1] - corners_hold[i][1];
delz = corners[i][2] - corners_hold[i][2];
delta = sqrt(delx*delx + dely*dely + delz*delz);
if (delta > delta1) delta1 = delta;
else if (delta > delta2) delta2 = delta;
}
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
}
} else deltasq = triggersq;
double **x = atom->x;
int nlocal = atom->nlocal;
if (includegroup) nlocal = atom->nfirst;
int flag = 0;
for (int i = 0; i < nlocal; i++) {
delx = x[i][0] - xhold[i][0];
dely = x[i][1] - xhold[i][1];
delz = x[i][2] - xhold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq > deltasq) flag = 1;
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && ago == MAX(every,delay)) ndanger++;
return flagall;
}
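/* ----------------------------------------------------------------------
   worked example (illustration only): with skin = 2.0 the nominal trigger
   distance is 1.0; if since the last reneighbor the two tracked box
   corners have moved 0.3 and 0.2, the skin is reduced by their sum to 1.5
   and the trigger becomes 0.5 * 1.5 = 0.75, so any atom displaced by more
   than 0.75 forces a rebuild
------------------------------------------------------------------------- */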
/* ----------------------------------------------------------------------
build perpetual neighbor lists
called at setup and every few timesteps during run or minimization
topology lists also built if topoflag = 1 (Kokkos calls with topoflag=0)
------------------------------------------------------------------------- */
void Neighbor::build(int topoflag)
{
int i,m;
ago = 0;
ncalls++;
lastcall = update->ntimestep;
int nlocal = atom->nlocal;
int nall = nlocal + atom->nghost;
// check that using special bond flags will not overflow neigh lists
if (nall > NEIGHMASK)
error->one(FLERR,"Too many local+ghost atoms for neighbor list");
// store current atom positions and box size if needed
if (dist_check) {
double **x = atom->x;
if (includegroup) nlocal = atom->nfirst;
if (atom->nmax > maxhold) {
maxhold = atom->nmax;
memory->destroy(xhold);
memory->create(xhold,maxhold,3,"neigh:xhold");
}
for (i = 0; i < nlocal; i++) {
xhold[i][0] = x[i][0];
xhold[i][1] = x[i][1];
xhold[i][2] = x[i][2];
}
if (boxcheck) {
if (triclinic == 0) {
boxlo_hold[0] = bboxlo[0];
boxlo_hold[1] = bboxlo[1];
boxlo_hold[2] = bboxlo[2];
boxhi_hold[0] = bboxhi[0];
boxhi_hold[1] = bboxhi[1];
boxhi_hold[2] = bboxhi[2];
} else {
domain->box_corners();
corners = domain->corners;
for (i = 0; i < 8; i++) {
corners_hold[i][0] = corners[i][0];
corners_hold[i][1] = corners[i][1];
corners_hold[i][2] = corners[i][2];
}
}
}
}
// bin atoms for all NBin instances
// not just NBin associated with perpetual lists
// b/c cannot wait to bin occasional lists in build_one() call
// if binning were deferred until then, atoms may have moved outside of proc domain & bin extent,
// leading to errors or even a crash
if (style != NSQ) {
for (int i = 0; i < nbin; i++) {
neigh_bin[i]->bin_atoms_setup(nall);
neigh_bin[i]->bin_atoms();
}
}
// build pairwise lists for all perpetual NPair/NeighList
// grow() with nlocal/nall args so that only realloc if have to
for (i = 0; i < npair_perpetual; i++) {
m = plist[i];
if (!lists[m]->copy) lists[m]->grow(nlocal,nall);
neigh_pair[m]->build_setup();
neigh_pair[m]->build(lists[m]);
}
// build topology lists for bonds/angles/etc
if (atom->molecular && topoflag) build_topology();
}
/* ----------------------------------------------------------------------
build topology neighbor lists: bond, angle, dihedral, improper
copy their list info back to Neighbor for access by bond/angle/etc classes
------------------------------------------------------------------------- */
void Neighbor::build_topology()
{
if (force->bond) {
neigh_bond->build();
nbondlist = neigh_bond->nbondlist;
bondlist = neigh_bond->bondlist;
}
if (force->angle) {
neigh_angle->build();
nanglelist = neigh_angle->nanglelist;
anglelist = neigh_angle->anglelist;
}
if (force->dihedral) {
neigh_dihedral->build();
ndihedrallist = neigh_dihedral->ndihedrallist;
dihedrallist = neigh_dihedral->dihedrallist;
}
if (force->improper) {
neigh_improper->build();
nimproperlist = neigh_improper->nimproperlist;
improperlist = neigh_improper->improperlist;
}
}
/* ----------------------------------------------------------------------
build a single occasional pairwise neighbor list indexed by I
called by other classes
------------------------------------------------------------------------- */
void Neighbor::build_one(class NeighList *mylist, int preflag)
{
// check if list structure is initialized
if (mylist == NULL)
error->all(FLERR,"Trying to build an occasional neighbor list "
"before initialization completed");
// build_one() should never be invoked on a perpetual list
if (!mylist->occasional)
error->all(FLERR,"Neighbor build one invoked on perpetual list");
// no need to build if already built since last re-neighbor
// preflag is set by fix bond/create and fix bond/swap
// b/c they invoke build_one() on same step neigh list is re-built,
// but before re-build, so need to use ">" instead of ">="
NPair *np = neigh_pair[mylist->index];
if (preflag) {
if (np->last_build > lastcall) return;
} else {
if (np->last_build >= lastcall) return;
}
// if this is copy list and parent is occasional list,
// or this is halffull and parent is occasional list,
// insure parent is current
if (mylist->listcopy && mylist->listcopy->occasional)
build_one(mylist->listcopy,preflag);
if (mylist->listfull && mylist->listfull->occasional)
build_one(mylist->listfull,preflag);
// create stencil if hasn't been created since last setup_bins() call
NStencil *ns = np->ns;
if (ns && ns->last_stencil < last_setup_bins) {
ns->create_setup();
ns->create();
}
// build the list
np->build_setup();
np->build(mylist);
}
/* ----------------------------------------------------------------------
set neighbor style and skin distance
------------------------------------------------------------------------- */
void Neighbor::set(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal neighbor command");
skin = force->numeric(FLERR,arg[0]);
if (skin < 0.0) error->all(FLERR,"Illegal neighbor command");
if (strcmp(arg[1],"nsq") == 0) style = NSQ;
else if (strcmp(arg[1],"bin") == 0) style = BIN;
else if (strcmp(arg[1],"multi") == 0) style = MULTI;
else error->all(FLERR,"Illegal neighbor command");
if (style == MULTI && lmp->citeme) lmp->citeme->add(cite_neigh_multi);
}
/* ----------------------------------------------------------------------
reset timestamps in all NBin, NStencil, NPair classes
so that neighbor lists will rebuild properly with timestep change
ditto for lastcall and last_setup_bins
------------------------------------------------------------------------- */
void Neighbor::reset_timestep(bigint ntimestep)
{
for (int i = 0; i < nbin; i++)
neigh_bin[i]->last_bin = -1;
for (int i = 0; i < nstencil; i++)
neigh_stencil[i]->last_stencil = -1;
for (int i = 0; i < nlist; i++) {
if (!neigh_pair[i]) continue;
neigh_pair[i]->last_build = -1;
}
lastcall = -1;
last_setup_bins = -1;
}
/* ----------------------------------------------------------------------
modify parameters of the pair-wise neighbor build
------------------------------------------------------------------------- */
void Neighbor::modify_params(int narg, char **arg)
{
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"every") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
every = force->inumeric(FLERR,arg[iarg+1]);
if (every <= 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"delay") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
delay = force->inumeric(FLERR,arg[iarg+1]);
if (delay < 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"check") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) dist_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) dist_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"once") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) build_once = 1;
else if (strcmp(arg[iarg+1],"no") == 0) build_once = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"page") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_pgsize = pgsize;
pgsize = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"one") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_oneatom = oneatom;
oneatom = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"binsize") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
binsize_user = force->numeric(FLERR,arg[iarg+1]);
if (binsize_user <= 0.0) binsizeflag = 0;
else binsizeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"cluster") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) cluster_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) cluster_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"include") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
includegroup = group->find(arg[iarg+1]);
if (includegroup < 0)
error->all(FLERR,"Invalid group ID in neigh_modify command");
if (includegroup && (atom->firstgroupname == NULL ||
strcmp(arg[iarg+1],atom->firstgroupname) != 0))
error->all(FLERR,
"Neigh_modify include group != atom_modify first group");
iarg += 2;
} else if (strcmp(arg[iarg],"exclude") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"type") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_type == maxex_type) {
maxex_type += EXDELTA;
memory->grow(ex1_type,maxex_type,"neigh:ex1_type");
memory->grow(ex2_type,maxex_type,"neigh:ex2_type");
}
ex1_type[nex_type] = force->inumeric(FLERR,arg[iarg+2]);
ex2_type[nex_type] = force->inumeric(FLERR,arg[iarg+3]);
nex_type++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"group") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_group == maxex_group) {
maxex_group += EXDELTA;
memory->grow(ex1_group,maxex_group,"neigh:ex1_group");
memory->grow(ex2_group,maxex_group,"neigh:ex2_group");
}
ex1_group[nex_group] = group->find(arg[iarg+2]);
ex2_group[nex_group] = group->find(arg[iarg+3]);
if (ex1_group[nex_group] == -1 || ex2_group[nex_group] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
nex_group++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"molecule/inter") == 0 ||
strcmp(arg[iarg+1],"molecule/intra") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (atom->molecule_flag == 0)
error->all(FLERR,"Neigh_modify exclude molecule "
"requires atom attribute molecule");
if (nex_mol == maxex_mol) {
maxex_mol += EXDELTA;
memory->grow(ex_mol_group,maxex_mol,"neigh:ex_mol_group");
if (lmp->kokkos)
grow_ex_mol_intra_kokkos();
else
memory->grow(ex_mol_intra,maxex_mol,"neigh:ex_mol_intra");
}
ex_mol_group[nex_mol] = group->find(arg[iarg+2]);
if (ex_mol_group[nex_mol] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
if (strcmp(arg[iarg+1],"molecule/intra") == 0)
ex_mol_intra[nex_mol] = 1;
else
ex_mol_intra[nex_mol] = 0;
nex_mol++;
iarg += 3;
} else if (strcmp(arg[iarg+1],"none") == 0) {
nex_type = nex_group = nex_mol = 0;
iarg += 2;
} else error->all(FLERR,"Illegal neigh_modify command");
} else error->all(FLERR,"Illegal neigh_modify command");
}
}
/* ----------------------------------------------------------------------
remove the first group-group exclusion matching group1, group2
------------------------------------------------------------------------- */
void Neighbor::exclusion_group_group_delete(int group1, int group2)
{
int m, mlast;
for (m = 0; m < nex_group; m++)
if (ex1_group[m] == group1 && ex2_group[m] == group2 )
break;
mlast = m;
if (mlast == nex_group)
error->all(FLERR,"Unable to find group-group exclusion");
for (m = mlast+1; m < nex_group; m++) {
ex1_group[m-1] = ex1_group[m];
ex2_group[m-1] = ex2_group[m];
ex1_bit[m-1] = ex1_bit[m];
ex2_bit[m-1] = ex2_bit[m];
}
nex_group--;
}
/* ----------------------------------------------------------------------
return the value of exclude - used to check compatibility with GPU
------------------------------------------------------------------------- */
int Neighbor::exclude_setting()
{
return exclude;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint Neighbor::memory_usage()
{
bigint bytes = 0;
bytes += memory->usage(xhold,maxhold,3);
for (int i = 0; i < nlist; i++)
if (lists[i]) bytes += lists[i]->memory_usage();
for (int i = 0; i < nstencil; i++)
bytes += neigh_stencil[i]->memory_usage();
for (int i = 0; i < nbin; i++)
bytes += neigh_bin[i]->memory_usage();
if (neigh_bond) bytes += neigh_bond->memory_usage();
if (neigh_angle) bytes += neigh_angle->memory_usage();
if (neigh_dihedral) bytes += neigh_dihedral->memory_usage();
if (neigh_improper) bytes += neigh_improper->memory_usage();
return bytes;
}
diff --git a/src/pair.h b/src/pair.h
index 3f66c6095..dd859e5f2 100644
--- a/src/pair.h
+++ b/src/pair.h
@@ -1,362 +1,362 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_PAIR_H
#define LMP_PAIR_H
#include "pointers.h"
#include "accelerator_kokkos.h"
namespace LAMMPS_NS {
class Pair : protected Pointers {
friend class AngleSDK;
friend class AngleSDKOMP;
friend class BondQuartic;
friend class BondQuarticOMP;
friend class DihedralCharmm;
friend class DihedralCharmmOMP;
friend class FixGPU;
friend class FixOMP;
friend class ThrOMP;
friend class Info;
public:
static int instance_total; // # of Pair classes ever instantiated
double eng_vdwl,eng_coul; // accumulated energies
double virial[6]; // accumulated virial
double *eatom,**vatom; // accumulated per-atom energy/virial
double cutforce; // max cutoff for all atom pairs
double **cutsq; // cutoff sq for each atom pair
int **setflag; // 0/1 = whether each i,j has been set
int comm_forward; // size of forward communication (0 if none)
int comm_reverse; // size of reverse communication (0 if none)
int comm_reverse_off; // size of reverse comm even if newton off
int single_enable; // 1 if single() routine exists
int restartinfo; // 1 if pair style writes restart info
int respa_enable; // 1 if inner/middle/outer rRESPA routines
int one_coeff; // 1 if allows only one coeff * * call
int manybody_flag; // 1 if a manybody potential
int no_virial_fdotr_compute; // 1 if does not invoke virial_fdotr_compute()
int writedata; // 1 if writes coeffs to data file
int ghostneigh; // 1 if pair style needs neighbors of ghosts
double **cutghost; // cutoff for each ghost pair
int ewaldflag; // 1 if compatible with Ewald solver
int pppmflag; // 1 if compatible with PPPM solver
int msmflag; // 1 if compatible with MSM solver
int dispersionflag; // 1 if compatible with LJ/dispersion solver
int tip4pflag; // 1 if compatible with TIP4P solver
int dipoleflag; // 1 if compatible with dipole solver
int reinitflag; // 1 if compatible with fix adapt and alike
int tail_flag; // pair_modify flag for LJ tail correction
double etail,ptail; // energy/pressure tail corrections
double etail_ij,ptail_ij;
int evflag; // energy,virial settings
int eflag_either,eflag_global,eflag_atom;
int vflag_either,vflag_global,vflag_atom;
int ncoultablebits; // size of Coulomb table, accessed by KSpace
int ndisptablebits; // size of dispersion table
double tabinnersq;
double tabinnerdispsq;
double *rtable,*drtable,*ftable,*dftable,*ctable,*dctable;
double *etable,*detable,*ptable,*dptable,*vtable,*dvtable;
double *rdisptable, *drdisptable, *fdisptable, *dfdisptable;
double *edisptable, *dedisptable;
int ncoulshiftbits,ncoulmask;
int ndispshiftbits, ndispmask;
int nextra; // # of extra quantities pair style calculates
double *pvector; // vector of extra pair quantities
int single_extra; // number of extra single values calculated
double *svector; // vector of extra single quantities
class NeighList *list; // standard neighbor list used by most pairs
class NeighList *listhalf; // half list used by some pairs
class NeighList *listfull; // full list used by some pairs
class NeighList *listhistory; // neighbor history list used by some pairs
class NeighList *listinner; // rRESPA lists used by some pairs
class NeighList *listmiddle;
class NeighList *listouter;
int allocated; // 0/1 = whether arrays are allocated
// public so external driver can check
int compute_flag; // 0 if skip compute()
// KOKKOS host/device flag and data masks
ExecutionSpace execution_space;
unsigned int datamask_read,datamask_modify;
Pair(class LAMMPS *);
virtual ~Pair();
// top-level Pair methods
void init();
virtual void reinit();
virtual void setup() {}
double mix_energy(double, double, double, double);
double mix_distance(double, double);
void write_file(int, char **);
void init_bitmap(double, double, int, int &, int &, int &, int &);
virtual void modify_params(int, char **);
void compute_dummy(int, int);
// need to be public, so can be called by pair_style reaxc
void v_tally(int, double *, double *);
void ev_tally(int, int, int, int, double, double, double,
double, double, double);
void ev_tally3(int, int, int, double, double,
double *, double *, double *, double *);
void v_tally3(int, int, int, double *, double *, double *, double *);
void v_tally4(int, int, int, int, double *, double *, double *,
double *, double *, double *);
void ev_tally_xyz(int, int, int, int, double, double,
double, double, double, double, double, double);
// general child-class methods
virtual void compute(int, int) = 0;
virtual void compute_inner() {}
virtual void compute_middle() {}
virtual void compute_outer(int, int) {}
virtual double single(int, int, int, int,
double, double, double,
double& fforce) {
fforce = 0.0;
return 0.0;
}
virtual void settings(int, char **) = 0;
virtual void coeff(int, char **) = 0;
virtual void init_style();
virtual void init_list(int, class NeighList *);
virtual double init_one(int, int) {return 0.0;}
virtual void init_tables(double, double *);
virtual void init_tables_disp(double);
virtual void free_tables();
virtual void free_disp_tables();
virtual void write_restart(FILE *) {}
virtual void read_restart(FILE *) {}
virtual void write_restart_settings(FILE *) {}
virtual void read_restart_settings(FILE *) {}
virtual void write_data(FILE *) {}
virtual void write_data_all(FILE *) {}
virtual int pack_forward_comm(int, int *, double *, int, int *) {return 0;}
virtual void unpack_forward_comm(int, int, double *) {}
virtual int pack_forward_comm_kokkos(int, DAT::tdual_int_2d,
int, DAT::tdual_xfloat_1d &,
int, int *) {return 0;};
virtual void unpack_forward_comm_kokkos(int, int, DAT::tdual_xfloat_1d &) {}
virtual int pack_reverse_comm(int, int, double *) {return 0;}
virtual void unpack_reverse_comm(int, int *, double *) {}
virtual double memory_usage();
void set_copymode(int value) {copymode = value;}
// specific child-class methods for certain Pair styles
virtual void *extract(const char *, int &) {return NULL;}
virtual void swap_eam(double *, double **) {}
virtual void reset_dt() {}
virtual void min_xf_pointers(int, double **, double **) {}
virtual void min_xf_get(int) {}
virtual void min_x_set(int) {}
// management of callbacks to be run from ev_tally()
protected:
int num_tally_compute;
class Compute **list_tally_compute;
public:
- void add_tally_callback(class Compute *);
- void del_tally_callback(class Compute *);
+ virtual void add_tally_callback(class Compute *);
+ virtual void del_tally_callback(class Compute *);
protected:
int instance_me; // which Pair class instantiation I am
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER}; // mixing options
int special_lj[4]; // copied from force->special_lj for Kokkos
int suffix_flag; // suffix compatibility flag
// pair_modify settings
int offset_flag,mix_flag; // flags for offset and mixing
double tabinner; // inner cutoff for Coulomb table
double tabinner_disp; // inner cutoff for dispersion table
// custom data type for accessing Coulomb tables
typedef union {int i; float f;} union_int_float_t;
int vflag_fdotr;
int maxeatom,maxvatom;
int copymode; // if set, do not deallocate during destruction
// required when classes are used as functors by Kokkos
virtual void ev_setup(int, int, int alloc = 1);
void ev_unset();
void ev_tally_full(int, double, double, double, double, double, double);
void ev_tally_xyz_full(int, double, double,
double, double, double, double, double, double);
void ev_tally4(int, int, int, int, double,
double *, double *, double *, double *, double *, double *);
void ev_tally_tip4p(int, int *, double *, double, double);
void v_tally2(int, int, double, double *);
void v_tally_tensor(int, int, int, int,
double, double, double, double, double, double);
void virial_fdotr_compute();
// union data struct for packing 32-bit and 64-bit ints into double bufs
// see atom_vec.h for documentation
union ubuf {
double d;
int64_t i;
ubuf(double arg) : d(arg) {}
ubuf(int64_t arg) : i(arg) {}
ubuf(int arg) : i(arg) {}
};
inline int sbmask(int j) {
return j >> SBBITS & 3;
}
};
}
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Too many total bits for bitmapped lookup table
Table size specified via pair_modify command is too large. Note that
a value of N generates a 2^N size table.
E: Cannot have both pair_modify shift and tail set to yes
These 2 options are contradictory.
E: Cannot use pair tail corrections with 2d simulations
The correction factors are only currently defined for 3d systems.
W: Using pair tail corrections with nonperiodic system
This is probably a bogus thing to do, since tail corrections are
computed by integrating the density of a periodic system out to
infinity.
W: Using pair tail corrections with pair_modify compute no
The tail corrections will thus not be computed.
W: Using pair potential shift with pair_modify compute no
The shift effects will thus not be computed.
W: Using a manybody potential with bonds/angles/dihedrals and special_bond exclusions
This is likely not what you want to do. The exclusion settings will
eliminate neighbors in the neighbor list, which the manybody potential
needs to calculate its terms correctly.
E: All pair coeffs are not set
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation.
E: Fix adapt interface to this pair style not supported
New coding for the pair style would need to be done.
E: Pair style requires a KSpace style
No kspace style is defined.
E: Cannot yet use compute tally with Kokkos
This feature is not yet supported.
E: Pair style does not support pair_write
The pair style does not have a single() function, so it can
not be invoked by pair write.
E: Invalid atom types in pair_write command
Atom types must range from 1 to Ntypes inclusive.
E: Invalid style in pair_write command
Self-explanatory. Check the input script.
E: Invalid cutoffs in pair_write command
Inner cutoff must be larger than 0.0 and less than outer cutoff.
E: Cannot open pair_write file
The specified output file for pair energies and forces cannot be
opened. Check that the path and name are correct.
E: Bitmapped lookup tables require int/float be same size
Cannot use pair tables on this machine, because of word sizes. Use
the pair_modify command with table 0 instead.
W: Table inner cutoff >= outer cutoff
You specified an inner cutoff for a Coulombic table that is longer
than the global cutoff. Probably not what you wanted.
E: Too many exponent bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
E: Too many mantissa bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
E: Too few bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
*/
diff --git a/src/pair_hybrid.cpp b/src/pair_hybrid.cpp
index 03e55006f..fa79f1cf9 100644
--- a/src/pair_hybrid.cpp
+++ b/src/pair_hybrid.cpp
@@ -1,933 +1,968 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "pair_hybrid.h"
#include "atom.h"
#include "force.h"
#include "pair.h"
#include "neighbor.h"
#include "neigh_request.h"
#include "update.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "respa.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
PairHybrid::PairHybrid(LAMMPS *lmp) : Pair(lmp),
styles(NULL), keywords(NULL), multiple(NULL), nmap(NULL),
- map(NULL), special_lj(NULL), special_coul(NULL)
+ map(NULL), special_lj(NULL), special_coul(NULL), compute_tally(NULL)
{
nstyles = 0;
outerflag = 0;
respaflag = 0;
if (lmp->kokkos)
error->all(FLERR,"Cannot yet use pair hybrid with Kokkos");
}
/* ---------------------------------------------------------------------- */
PairHybrid::~PairHybrid()
{
if (nstyles) {
for (int m = 0; m < nstyles; m++) {
delete styles[m];
delete [] keywords[m];
if (special_lj[m]) delete [] special_lj[m];
if (special_coul[m]) delete [] special_coul[m];
}
}
delete [] styles;
delete [] keywords;
delete [] multiple;
delete [] special_lj;
delete [] special_coul;
+ delete [] compute_tally;
delete [] svector;
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
memory->destroy(nmap);
memory->destroy(map);
}
}
/* ----------------------------------------------------------------------
call each sub-style's compute() or compute_outer() function
accumulate sub-style global/peratom energy/virial in hybrid
for global vflag = 1:
each sub-style computes own virial[6]
sum sub-style virial[6] to hybrid's virial[6]
for global vflag = 2:
call sub-style with adjusted vflag to prevent it calling
virial_fdotr_compute()
hybrid calls virial_fdotr_compute() on final accumulated f
------------------------------------------------------------------------- */
void PairHybrid::compute(int eflag, int vflag)
{
int i,j,m,n;
// if no_virial_fdotr_compute is set and global component of
// incoming vflag = 2, then
// reset vflag as if global component were 1
// necessary since one or more sub-styles cannot compute virial as F dot r
if (no_virial_fdotr_compute && vflag % 4 == 2) vflag = 1 + vflag/4 * 4;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = eflag_global = vflag_global =
eflag_atom = vflag_atom = 0;
// check if global component of incoming vflag = 2
// if so, reset vflag passed to substyle as if it were 0
// necessary so substyle will not invoke virial_fdotr_compute()
int vflag_substyle;
if (vflag % 4 == 2) vflag_substyle = vflag/4 * 4;
else vflag_substyle = vflag;
double *saved_special = save_special();
// check if we are running with r-RESPA using the hybrid keyword
Respa *respa = NULL;
respaflag = 0;
if (strstr(update->integrate_style,"respa")) {
respa = (Respa *) update->integrate;
if (respa->nhybrid_styles > 0) respaflag = 1;
}
for (m = 0; m < nstyles; m++) {
set_special(m);
if (!respaflag || (respaflag && respa->hybrid_compute[m])) {
// invoke compute() unless compute flag is turned off or
// outerflag is set and sub-style has a compute_outer() method
if (styles[m]->compute_flag == 0) continue;
if (outerflag && styles[m]->respa_enable)
styles[m]->compute_outer(eflag,vflag_substyle);
else styles[m]->compute(eflag,vflag_substyle);
}
restore_special(saved_special);
// jump to next sub-style if r-RESPA does not want global accumulated data
if (respaflag && !respa->tally_global) continue;
if (eflag_global) {
eng_vdwl += styles[m]->eng_vdwl;
eng_coul += styles[m]->eng_coul;
}
if (vflag_global) {
for (n = 0; n < 6; n++) virial[n] += styles[m]->virial[n];
}
if (eflag_atom) {
n = atom->nlocal;
if (force->newton_pair) n += atom->nghost;
double *eatom_substyle = styles[m]->eatom;
for (i = 0; i < n; i++) eatom[i] += eatom_substyle[i];
}
if (vflag_atom) {
n = atom->nlocal;
if (force->newton_pair) n += atom->nghost;
double **vatom_substyle = styles[m]->vatom;
for (i = 0; i < n; i++)
for (j = 0; j < 6; j++)
vatom[i][j] += vatom_substyle[i][j];
}
}
delete [] saved_special;
if (vflag_fdotr) virial_fdotr_compute();
}
+
+/* ---------------------------------------------------------------------- */
+
+void PairHybrid::add_tally_callback(Compute *ptr)
+{
+ for (int m = 0; m < nstyles; m++)
+ if (compute_tally[m]) styles[m]->add_tally_callback(ptr);
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairHybrid::del_tally_callback(Compute *ptr)
+{
+ for (int m = 0; m < nstyles; m++)
+ if (compute_tally[m]) styles[m]->del_tally_callback(ptr);
+}
+
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_inner()
{
for (int m = 0; m < nstyles; m++)
if (styles[m]->respa_enable) styles[m]->compute_inner();
}
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_middle()
{
for (int m = 0; m < nstyles; m++)
if (styles[m]->respa_enable) styles[m]->compute_middle();
}
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_outer(int eflag, int vflag)
{
outerflag = 1;
compute(eflag,vflag);
outerflag = 0;
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairHybrid::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cutghost,n+1,n+1,"pair:cutghost");
memory->create(nmap,n+1,n+1,"pair:nmap");
memory->create(map,n+1,n+1,nstyles,"pair:map");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
nmap[i][j] = 0;
}
/* ----------------------------------------------------------------------
create one pair style for each arg in list
------------------------------------------------------------------------- */
void PairHybrid::settings(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal pair_style command");
// delete old lists, since cannot just change settings
if (nstyles) {
for (int m = 0; m < nstyles; m++) delete styles[m];
delete [] styles;
for (int m = 0; m < nstyles; m++) delete [] keywords[m];
delete [] keywords;
}
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
memory->destroy(nmap);
memory->destroy(map);
}
allocated = 0;
// allocate list of sub-styles as big as possibly needed if no extra args
styles = new Pair*[narg];
keywords = new char*[narg];
multiple = new int[narg];
special_lj = new double*[narg];
special_coul = new double*[narg];
+ compute_tally = new int[narg];
+
// allocate each sub-style
// allocate uses suffix, but don't store suffix version in keywords,
// else syntax in coeff() will not match
// call settings() with set of args that are not pair style names
// use force->pair_map to determine which args these are
int iarg,jarg,dummy;
iarg = 0;
nstyles = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"hybrid") == 0)
error->all(FLERR,"Pair style hybrid cannot have hybrid as an argument");
if (strcmp(arg[iarg],"none") == 0)
error->all(FLERR,"Pair style hybrid cannot have none as an argument");
styles[nstyles] = force->new_pair(arg[iarg],1,dummy);
force->store_style(keywords[nstyles],arg[iarg],0);
special_lj[nstyles] = special_coul[nstyles] = NULL;
+ compute_tally[nstyles] = 1;
jarg = iarg + 1;
while (jarg < narg && !force->pair_map->count(arg[jarg])) jarg++;
styles[nstyles]->settings(jarg-iarg-1,&arg[iarg+1]);
iarg = jarg;
nstyles++;
}
// multiple[i] = 1 to M if sub-style used multiple times, else 0
for (int i = 0; i < nstyles; i++) {
int count = 0;
for (int j = 0; j < nstyles; j++) {
if (strcmp(keywords[j],keywords[i]) == 0) count++;
if (j == i) multiple[i] = count;
}
if (count == 1) multiple[i] = 0;
}
// set pair flags from sub-style flags
flags();
}
/* ----------------------------------------------------------------------
set top-level pair flags from sub-style flags
------------------------------------------------------------------------- */
void PairHybrid::flags()
{
int m;
// set comm_forward, comm_reverse, comm_reverse_off to max of any sub-style
for (m = 0; m < nstyles; m++) {
if (styles[m]) comm_forward = MAX(comm_forward,styles[m]->comm_forward);
if (styles[m]) comm_reverse = MAX(comm_reverse,styles[m]->comm_reverse);
if (styles[m]) comm_reverse_off = MAX(comm_reverse_off,
styles[m]->comm_reverse_off);
}
// single_enable = 1 if any sub-style is set
// respa_enable = 1 if any sub-style is set
// manybody_flag = 1 if any sub-style is set
// no_virial_fdotr_compute = 1 if any sub-style is set
// ghostneigh = 1 if any sub-style is set
// ewaldflag, pppmflag, msmflag, dipoleflag, dispersionflag, tip4pflag = 1
// if any sub-style is set
// compute_flag = 1 if any sub-style is set
single_enable = 0;
compute_flag = 0;
for (m = 0; m < nstyles; m++) {
if (styles[m]->single_enable) single_enable = 1;
if (styles[m]->respa_enable) respa_enable = 1;
if (styles[m]->manybody_flag) manybody_flag = 1;
if (styles[m]->no_virial_fdotr_compute) no_virial_fdotr_compute = 1;
if (styles[m]->ghostneigh) ghostneigh = 1;
if (styles[m]->ewaldflag) ewaldflag = 1;
if (styles[m]->pppmflag) pppmflag = 1;
if (styles[m]->msmflag) msmflag = 1;
if (styles[m]->dipoleflag) dipoleflag = 1;
if (styles[m]->dispersionflag) dispersionflag = 1;
if (styles[m]->tip4pflag) tip4pflag = 1;
if (styles[m]->compute_flag) compute_flag = 1;
}
// single_extra = min of all sub-style single_extra
// allocate svector
single_extra = styles[0]->single_extra;
for (m = 1; m < nstyles; m++)
single_extra = MIN(single_extra,styles[m]->single_extra);
if (single_extra) {
delete [] svector;
svector = new double[single_extra];
}
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairHybrid::coeff(int narg, char **arg)
{
if (narg < 3) error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
// 3rd arg = pair sub-style name
// 4th arg = pair sub-style index if name used multiple times
// allow for "none" as valid sub-style name
int multflag;
int m;
for (m = 0; m < nstyles; m++) {
multflag = 0;
if (strcmp(arg[2],keywords[m]) == 0) {
if (multiple[m]) {
multflag = 1;
if (narg < 4) error->all(FLERR,"Incorrect args for pair coefficients");
if (!isdigit(arg[3][0]))
error->all(FLERR,"Incorrect args for pair coefficients");
int index = force->inumeric(FLERR,arg[3]);
if (index == multiple[m]) break;
else continue;
} else break;
}
}
int none = 0;
if (m == nstyles) {
if (strcmp(arg[2],"none") == 0) none = 1;
else error->all(FLERR,"Pair coeff for hybrid has invalid style");
}
// move 1st/2nd args to 2nd/3rd args
// if multflag: move 1st/2nd args to 3rd/4th args
// just copy ptrs, since arg[] points into original input line
arg[2+multflag] = arg[1];
arg[1+multflag] = arg[0];
// invoke sub-style coeff() starting with 1st remaining arg
if (!none) styles[m]->coeff(narg-1-multflag,&arg[1+multflag]);
// if sub-style only allows one pair coeff call (with * * and type mapping)
// then unset setflag/map assigned to that style before setting it below
// in case pair coeff for this sub-style is being called for 2nd time
if (!none && styles[m]->one_coeff)
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
if (nmap[i][j] && map[i][j][0] == m) {
setflag[i][j] = 0;
nmap[i][j] = 0;
}
// set setflag and which type pairs map to which sub-style
// if sub-style is none: set hybrid setflag, wipe out map
// else: set hybrid setflag & map only if substyle setflag is set
// previous mappings are wiped out
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
if (none) {
setflag[i][j] = 1;
nmap[i][j] = 0;
count++;
} else if (styles[m]->setflag[i][j]) {
setflag[i][j] = 1;
nmap[i][j] = 1;
map[i][j][0] = m;
count++;
}
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairHybrid::init_style()
{
int i,m,itype,jtype,used,istyle,skip;
// error if a sub-style is not used
int ntypes = atom->ntypes;
for (istyle = 0; istyle < nstyles; istyle++) {
used = 0;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = itype; jtype <= ntypes; jtype++)
for (m = 0; m < nmap[itype][jtype]; m++)
if (map[itype][jtype][m] == istyle) used = 1;
if (used == 0) error->all(FLERR,"Pair hybrid sub-style is not used");
}
// check if special_lj/special_coul overrides are compatible
for (istyle = 0; istyle < nstyles; istyle++) {
if (special_lj[istyle]) {
for (i = 1; i < 4; ++i) {
if (((force->special_lj[i] == 0.0) || (force->special_lj[i] == 1.0))
&& (force->special_lj[i] != special_lj[istyle][i]))
error->all(FLERR,"Pair_modify special setting for pair hybrid "
"incompatible with global special_bonds setting");
}
}
if (special_coul[istyle]) {
for (i = 1; i < 4; ++i) {
if (((force->special_coul[i] == 0.0)
|| (force->special_coul[i] == 1.0))
&& (force->special_coul[i] != special_coul[istyle][i]))
error->all(FLERR,"Pair_modify special setting for pair hybrid "
"incompatible with global special_bonds setting");
}
}
}
// each sub-style makes its neighbor list request(s)
for (istyle = 0; istyle < nstyles; istyle++) styles[istyle]->init_style();
// create skip lists inside each pair neigh request
// any kind of list can have its skip flag set in this loop
for (i = 0; i < neighbor->nrequest; i++) {
if (!neighbor->requests[i]->pair) continue;
// istyle = associated sub-style for the request
for (istyle = 0; istyle < nstyles; istyle++)
if (styles[istyle] == neighbor->requests[i]->requestor) break;
// allocate iskip and ijskip
// initialize so as to skip all pair types
// set ijskip = 0 if type pair matches any entry in sub-style map
// set ijskip = 0 if mixing will assign type pair to this sub-style
// will occur if type pair is currently unassigned
// and both I,I and J,J are assigned to single sub-style
// and sub-style for both I,I and J,J match istyle
// set iskip = 1 only if all ijskip for itype are 1
int *iskip = new int[ntypes+1];
int **ijskip;
memory->create(ijskip,ntypes+1,ntypes+1,"pair_hybrid:ijskip");
for (itype = 1; itype <= ntypes; itype++)
for (jtype = 1; jtype <= ntypes; jtype++)
ijskip[itype][jtype] = 1;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = itype; jtype <= ntypes; jtype++) {
for (m = 0; m < nmap[itype][jtype]; m++)
if (map[itype][jtype][m] == istyle)
ijskip[itype][jtype] = ijskip[jtype][itype] = 0;
if (nmap[itype][jtype] == 0 &&
nmap[itype][itype] == 1 && map[itype][itype][0] == istyle &&
nmap[jtype][jtype] == 1 && map[jtype][jtype][0] == istyle)
ijskip[itype][jtype] = ijskip[jtype][itype] = 0;
}
for (itype = 1; itype <= ntypes; itype++) {
iskip[itype] = 1;
for (jtype = 1; jtype <= ntypes; jtype++)
if (ijskip[itype][jtype] == 0) iskip[itype] = 0;
}
// if any skipping occurs
// set request->skip and copy iskip and ijskip into request
// else delete iskip and ijskip
// no skipping if pair style assigned to all type pairs
skip = 0;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = 1; jtype <= ntypes; jtype++)
if (ijskip[itype][jtype] == 1) skip = 1;
if (skip) {
neighbor->requests[i]->skip = 1;
neighbor->requests[i]->iskip = iskip;
neighbor->requests[i]->ijskip = ijskip;
} else {
delete [] iskip;
memory->destroy(ijskip);
}
}
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairHybrid::init_one(int i, int j)
{
// if I,J is not set explicitly:
// perform mixing only if I,I sub-style = J,J sub-style
// also require I,I and J,J are both assigned to single sub-style
if (setflag[i][j] == 0) {
if (nmap[i][i] != 1 || nmap[j][j] != 1 || map[i][i][0] != map[j][j][0])
error->one(FLERR,"All pair coeffs are not set");
nmap[i][j] = 1;
map[i][j][0] = map[i][i][0];
}
// call init/mixing for all sub-styles of I,J
// set cutsq in sub-style just as Pair::init() does via call to init_one()
// set cutghost for I,J and J,I just as sub-style does
// sum tail corrections for I,J
// return max cutoff of all sub-styles assigned to I,J
// if no sub-styles assigned to I,J (pair_coeff none), cutmax = 0.0 returned
double cutmax = 0.0;
cutghost[i][j] = cutghost[j][i] = 0.0;
if (tail_flag) etail_ij = ptail_ij = 0.0;
nmap[j][i] = nmap[i][j];
for (int k = 0; k < nmap[i][j]; k++) {
map[j][i][k] = map[i][j][k];
double cut = styles[map[i][j][k]]->init_one(i,j);
styles[map[i][j][k]]->cutsq[i][j] =
styles[map[i][j][k]]->cutsq[j][i] = cut*cut;
if (styles[map[i][j][k]]->ghostneigh)
cutghost[i][j] = cutghost[j][i] =
MAX(cutghost[i][j],styles[map[i][j][k]]->cutghost[i][j]);
if (tail_flag) {
etail_ij += styles[map[i][j][k]]->etail_ij;
ptail_ij += styles[map[i][j][k]]->ptail_ij;
}
cutmax = MAX(cutmax,cut);
}
return cutmax;
}
/* ----------------------------------------------------------------------
invoke setup for each sub-style
------------------------------------------------------------------------- */
void PairHybrid::setup()
{
for (int m = 0; m < nstyles; m++) styles[m]->setup();
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairHybrid::write_restart(FILE *fp)
{
fwrite(&nstyles,sizeof(int),1,fp);
// each sub-style writes its settings, but no coeff info
int n;
for (int m = 0; m < nstyles; m++) {
n = strlen(keywords[m]) + 1;
fwrite(&n,sizeof(int),1,fp);
fwrite(keywords[m],sizeof(char),n,fp);
styles[m]->write_restart_settings(fp);
// write out per style special settings, if present
n = (special_lj[m] == NULL) ? 0 : 1;
fwrite(&n,sizeof(int),1,fp);
if (n) fwrite(special_lj[m],sizeof(double),4,fp);
n = (special_coul[m] == NULL) ? 0 : 1;
fwrite(&n,sizeof(int),1,fp);
if (n) fwrite(special_coul[m],sizeof(double),4,fp);
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairHybrid::read_restart(FILE *fp)
{
int me = comm->me;
if (me == 0) fread(&nstyles,sizeof(int),1,fp);
MPI_Bcast(&nstyles,1,MPI_INT,0,world);
// allocate list of sub-styles
styles = new Pair*[nstyles];
keywords = new char*[nstyles];
multiple = new int[nstyles];
special_lj = new double*[nstyles];
special_coul = new double*[nstyles];
// each sub-style is created via new_pair()
// each reads its settings, but no coeff info
int n,dummy;
for (int m = 0; m < nstyles; m++) {
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
keywords[m] = new char[n];
if (me == 0) fread(keywords[m],sizeof(char),n,fp);
MPI_Bcast(keywords[m],n,MPI_CHAR,0,world);
styles[m] = force->new_pair(keywords[m],0,dummy);
styles[m]->read_restart_settings(fp);
// read back per style special settings, if present
special_lj[m] = special_coul[m] = NULL;
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
if (n > 0 ) {
special_lj[m] = new double[4];
if (me == 0) fread(special_lj[m],sizeof(double),4,fp);
MPI_Bcast(special_lj[m],4,MPI_DOUBLE,0,world);
}
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
if (n > 0 ) {
special_coul[m] = new double[4];
if (me == 0) fread(special_coul[m],sizeof(double),4,fp);
MPI_Bcast(special_coul[m],4,MPI_DOUBLE,0,world);
}
}
// multiple[i] = 1 to M if sub-style used multiple times, else 0
for (int i = 0; i < nstyles; i++) {
int count = 0;
for (int j = 0; j < nstyles; j++) {
if (strcmp(keywords[j],keywords[i]) == 0) count++;
if (j == i) multiple[i] = count;
}
if (count == 1) multiple[i] = 0;
}
// set pair flags from sub-style flags
flags();
}
/* ----------------------------------------------------------------------
call sub-style to compute single interaction
error if sub-style does not support single() call
since overlay could have multiple sub-styles, sum results explicitly
------------------------------------------------------------------------- */
double PairHybrid::single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_lj,
double &fforce)
{
if (nmap[itype][jtype] == 0)
error->one(FLERR,"Invoked pair single on pair style none");
double fone;
fforce = 0.0;
double esum = 0.0;
for (int m = 0; m < nmap[itype][jtype]; m++) {
if (rsq < styles[map[itype][jtype][m]]->cutsq[itype][jtype]) {
if (styles[map[itype][jtype][m]]->single_enable == 0)
error->one(FLERR,"Pair hybrid sub-style does not support single call");
if ((special_lj[map[itype][jtype][m]] != NULL) ||
(special_coul[map[itype][jtype][m]] != NULL))
error->one(FLERR,"Pair hybrid single calls do not support"
" per sub-style special bond values");
esum += styles[map[itype][jtype][m]]->
single(i,j,itype,jtype,rsq,factor_coul,factor_lj,fone);
fforce += fone;
// copy substyle extra values into hybrid's svector
// use index n so the outer sub-style loop variable m is not clobbered
if (single_extra && styles[map[itype][jtype][m]]->single_extra)
for (int n = 0; n < single_extra; n++)
svector[n] = styles[map[itype][jtype][m]]->svector[n];
}
}
return esum;
}
/* ----------------------------------------------------------------------
modify parameters of the pair style and its sub-styles
------------------------------------------------------------------------- */
void PairHybrid::modify_params(int narg, char **arg)
{
if (narg == 0) error->all(FLERR,"Illegal pair_modify command");
// if 1st keyword is pair, apply other keywords to one sub-style
if (strcmp(arg[0],"pair") == 0) {
if (narg < 2) error->all(FLERR,"Illegal pair_modify command");
int m;
for (m = 0; m < nstyles; m++)
if (strcmp(arg[1],keywords[m]) == 0) break;
if (m == nstyles) error->all(FLERR,"Unknown pair_modify hybrid sub-style");
int iarg = 2;
if (multiple[m]) {
if (narg < 3) error->all(FLERR,"Illegal pair_modify command");
int multiflag = force->inumeric(FLERR,arg[2]);
for (m = 0; m < nstyles; m++)
if (strcmp(arg[1],keywords[m]) == 0 && multiflag == multiple[m]) break;
if (m == nstyles)
error->all(FLERR,"Unknown pair_modify hybrid sub-style");
iarg = 3;
}
// if 2nd keyword (after pair) is special:
// invoke modify_special() for the sub-style
if (iarg < narg && strcmp(arg[iarg],"special") == 0) {
if (narg < iarg+5)
error->all(FLERR,"Illegal pair_modify special command");
modify_special(m,narg-iarg,&arg[iarg+1]);
iarg += 5;
}
+ // if 2nd keyword (after pair) is compute/tally:
+ // set flag to register USER-TALLY computes accordingly
+
+ if (iarg < narg && strcmp(arg[iarg],"compute/tally") == 0) {
+ if (narg < iarg+2)
+ error->all(FLERR,"Illegal pair_modify compute/tally command");
+ if (strcmp(arg[iarg+1],"yes") == 0) {
+ compute_tally[m] = 1;
+ } else if (strcmp(arg[iarg+1],"no") == 0) {
+ compute_tally[m] = 0;
+ } else error->all(FLERR,"Illegal pair_modify compute/tally command");
+ iarg += 2;
+ }
+
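// illustrative input-script usage of the keywords handled above (a sketch,
// not part of this patch; subject to the usual special_bonds restrictions):
//   pair_modify pair lj/cut compute/tally no
//   pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.5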
// apply the remaining keywords to both the hybrid pair style itself and
// the selected sub-style, excluding the already-consumed "pair", "special",
// and "compute/tally" keywords.
// applying them to the hybrid itself matters for keywords like "tail" or "compute"
if (narg-iarg > 0) {
Pair::modify_params(narg-iarg,&arg[iarg]);
styles[m]->modify_params(narg-iarg,&arg[iarg]);
}
// apply all keywords to pair hybrid itself and every sub-style
} else {
Pair::modify_params(narg,arg);
for (int m = 0; m < nstyles; m++) styles[m]->modify_params(narg,arg);
}
}
/* ----------------------------------------------------------------------
store a local per pair style override for special_lj and special_coul
------------------------------------------------------------------------- */
void PairHybrid::modify_special(int m, int narg, char **arg)
{
double special[4];
int i;
special[0] = 1.0;
special[1] = force->numeric(FLERR,arg[1]);
special[2] = force->numeric(FLERR,arg[2]);
special[3] = force->numeric(FLERR,arg[3]);
if (strcmp(arg[0],"lj/coul") == 0) {
if (!special_lj[m]) special_lj[m] = new double[4];
if (!special_coul[m]) special_coul[m] = new double[4];
for (i = 0; i < 4; ++i)
special_lj[m][i] = special_coul[m][i] = special[i];
} else if (strcmp(arg[0],"lj") == 0) {
if (!special_lj[m]) special_lj[m] = new double[4];
for (i = 0; i < 4; ++i)
special_lj[m][i] = special[i];
} else if (strcmp(arg[0],"coul") == 0) {
if (!special_coul[m]) special_coul[m] = new double[4];
for (i = 0; i < 4; ++i)
special_coul[m][i] = special[i];
} else error->all(FLERR,"Illegal pair_modify special command");
}
/* ----------------------------------------------------------------------
override global special bonds settings with per substyle values
------------------------------------------------------------------------- */
void PairHybrid::set_special(int m)
{
int i;
if (special_lj[m])
for (i = 0; i < 4; ++i) force->special_lj[i] = special_lj[m][i];
if (special_coul[m])
for (i = 0; i < 4; ++i) force->special_coul[i] = special_coul[m][i];
}
/* ----------------------------------------------------------------------
store global special settings
------------------------------------------------------------------------- */
double * PairHybrid::save_special()
{
double *saved = new double[8];
for (int i = 0; i < 4; ++i) {
saved[i] = force->special_lj[i];
saved[i+4] = force->special_coul[i];
}
return saved;
}
/* ----------------------------------------------------------------------
restore global special settings from saved data
------------------------------------------------------------------------- */
void PairHybrid::restore_special(double *saved)
{
for (int i = 0; i < 4; ++i) {
force->special_lj[i] = saved[i];
force->special_coul[i] = saved[i+4];
}
}
/* ----------------------------------------------------------------------
extract a ptr to a particular quantity stored by pair
pass request thru to sub-styles
return first non-NULL result except for cut_coul request
for cut_coul, insure all non-NULL results are equal since required by Kspace
------------------------------------------------------------------------- */
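/* example (illustrative, not part of the original source): with
   "pair_style hybrid/overlay lj/cut/coul/long 10.0 coul/long 10.0"
   both sub-styles report cut_coul = 10.0, so a KSpace solver can use it;
   mismatched Coulomb cutoffs trigger the error below */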
void *PairHybrid::extract(const char *str, int &dim)
{
void *cutptr = NULL;
void *ptr;
double cutvalue = 0.0;
for (int m = 0; m < nstyles; m++) {
ptr = styles[m]->extract(str,dim);
if (ptr && strcmp(str,"cut_coul") == 0) {
double *p_newvalue = (double *) ptr;
double newvalue = *p_newvalue;
if (cutptr && newvalue != cutvalue)
error->all(FLERR,
"Coulomb cutoffs of pair hybrid sub-styles do not match");
cutptr = ptr;
cutvalue = newvalue;
} else if (ptr) return ptr;
}
if (strcmp(str,"cut_coul") == 0) return cutptr;
return NULL;
}
/* ---------------------------------------------------------------------- */
void PairHybrid::reset_dt()
{
for (int m = 0; m < nstyles; m++) styles[m]->reset_dt();
}
/* ----------------------------------------------------------------------
check if itype,jtype maps to sub-style
------------------------------------------------------------------------- */
int PairHybrid::check_ijtype(int itype, int jtype, char *substyle)
{
for (int m = 0; m < nmap[itype][jtype]; m++)
if (strcmp(keywords[map[itype][jtype][m]],substyle) == 0) return 1;
return 0;
}
/* ----------------------------------------------------------------------
memory usage of each sub-style
------------------------------------------------------------------------- */
double PairHybrid::memory_usage()
{
double bytes = maxeatom * sizeof(double);
bytes += maxvatom*6 * sizeof(double);
for (int m = 0; m < nstyles; m++) bytes += styles[m]->memory_usage();
return bytes;
}
diff --git a/src/pair_hybrid.h b/src/pair_hybrid.h
index e3de3b022..b8b9af5f4 100644
--- a/src/pair_hybrid.h
+++ b/src/pair_hybrid.h
@@ -1,154 +1,158 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(hybrid,PairHybrid)
#else
#ifndef LMP_PAIR_HYBRID_H
#define LMP_PAIR_HYBRID_H
#include <stdio.h>
#include "pair.h"
namespace LAMMPS_NS {
class PairHybrid : public Pair {
friend class FixGPU;
friend class FixIntel;
friend class FixOMP;
friend class Force;
friend class Respa;
friend class Info;
public:
PairHybrid(class LAMMPS *);
virtual ~PairHybrid();
void compute(int, int);
void settings(int, char **);
virtual void coeff(int, char **);
void init_style();
double init_one(int, int);
void setup();
void write_restart(FILE *);
void read_restart(FILE *);
double single(int, int, int, int, double, double, double, double &);
void modify_params(int narg, char **arg);
double memory_usage();
void compute_inner();
void compute_middle();
void compute_outer(int, int);
void *extract(const char *, int &);
void reset_dt();
int check_ijtype(int, int, char *);
+ virtual void add_tally_callback(class Compute *);
+ virtual void del_tally_callback(class Compute *);
+
protected:
int nstyles; // # of sub-styles
Pair **styles; // list of Pair style classes
char **keywords; // style name of each Pair style
int *multiple; // 0 if style used once, else Mth instance
int outerflag; // toggle compute() when invoked by outer()
int respaflag; // 1 if different substyles are assigned to
// different r-RESPA levels
int **nmap; // # of sub-styles itype,jtype points to
int ***map; // list of sub-styles itype,jtype points to
double **special_lj; // list of per style LJ exclusion factors
double **special_coul; // list of per style Coulomb exclusion factors
+ int *compute_tally; // list of on/off flags for tally computes
void allocate();
void flags();
void modify_special(int, int, char**);
double *save_special();
void set_special(int);
void restore_special(double *);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Cannot yet use pair hybrid with Kokkos
This feature is not yet supported.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Pair style hybrid cannot have hybrid as an argument
Self-explanatory.
E: Pair style hybrid cannot have none as an argument
Self-explanatory.
E: Incorrect args for pair coefficients
Self-explanatory. Check the input script or data file.
E: Pair coeff for hybrid has invalid style
Style in pair coeff must have been listed in pair_style command.
E: Pair hybrid sub-style is not used
No pair_coeff command used a sub-style specified in the pair_style
command.
E: Pair_modify special setting for pair hybrid incompatible with global special_bonds setting
Cannot override a setting of 0.0 or 1.0 or change a setting between
0.0 and 1.0.
E: All pair coeffs are not set
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation.
E: Invoked pair single on pair style none
A command (e.g. a dump) attempted to invoke the single() function on a
pair style none, which is illegal. You are probably attempting to
compute per-atom quantities with an undefined pair style.
E: Pair hybrid sub-style does not support single call
You are attempting to invoke a single() call on a pair style
that doesn't support it.
E: Pair hybrid single calls do not support per sub-style special bond values
Self-explanatory.
E: Unknown pair_modify hybrid sub-style
The choice of sub-style is unknown.
E: Coulomb cutoffs of pair hybrid sub-styles do not match
If using a Kspace solver, all Coulomb cutoffs of long pair styles must
be the same.
*/
diff --git a/src/version.h b/src/version.h
index e6ffb22dc..7ae7ec487 100644
--- a/src/version.h
+++ b/src/version.h
@@ -1 +1 @@
-#define LAMMPS_VERSION "11 Apr 2017"
+#define LAMMPS_VERSION "4 May 2017"
diff --git a/tools/msi2lmp/README b/tools/msi2lmp/README
index a20f6e893..db9b1aca5 100644
--- a/tools/msi2lmp/README
+++ b/tools/msi2lmp/README
@@ -1,227 +1,240 @@
-Axel Kohlmeyer is the current maintainer of the msi2lmp tool.
-Please send any inquiries about msi2lmp to the lammps-users mailing list.
-06 Oct 2016 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Improved whitespace handling in parsing topology and force field
-files to avoid bogus warnings about type name truncation.
-
-24 Oct 2015 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added check to make certain that force field files
-are consistent with the notation of non-bonded parameters
-that the msi2lmp code expects. For Class 1 and OPLS-AA
-the A-B notation with geometric mixing is expected and for
-Class 2 the r-eps notation with sixthpower mixing.
-
-11 Sep 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Refactored ReadMdfFile.c so it more consistently honors
-the MAX_NAME and MAX_STRING string length defines and
-potentially handles inputs with long names better.
-
-27 May 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added TopoTools style type hints as comments to all Mass, PairCoeff,
-BondCoeff, AngleCoeff, DihedralCoeff, ImproperCoeff entries.
-This should make it easier to identify force field entries with
-the structure and force field map in the data file later.
-
-06 Mar 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Fixed a bug in handling of triclinic cells, where the matrices to
-convert to and from fractional coordinates were incorrectly built.
-
-26 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Implemented writing out force field style hints in generated data
-files for improved consistency checking when reading those files.
-Also added writing out CGCMM style comments to identify atom types.
+ msi2lmp.exe
-08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+This code has several known limitations listed below under "LIMITATIONS"
+(and possibly some unknown ones, too) and is no longer under active
+development. Only the occasional bugfix is applied.
-Fixed a memory access violation with Class 2 force fields.
-Free all allocated memory to better detection of memory errors.
-Print out version number and data with all print levels > 0.
-Added valgrind checks to the regression tests
+Please send any inquiries about msi2lmp to the lammps-users
+mailing list and not to individual people.
-08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+------------------------------------------------------------------------
-Fixed a memory access violation with Class 2 force fields.
-Free all allocated memory to better detection of memory errors.
-Print out version number and data with all print levels > 0.
-Added valgrind checks to the regression tests
+OVERVIEW
-02 Aug 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added rudimentary support for OPLS-AA based on
-input provided by jeff greathouse.
+This is the third version of a program that generates a LAMMPS data file
+based on the information in MSI .car (atom coordinates), .mdf (molecular
+topology) and .frc (forcefield) files. The .car and .mdf files are
+specific to a molecular system while the .frc file is specific to a
+forcefield version. The only coherency needed between .frc and
+.car/.mdf files is the atom types.
-18 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added support for writing out image flags
-Improved accuracy of atom masses
-Added flag for shifting the entire system
-Fixed some minor logic bugs and prepared
-for supporting other force fields and morse style bonds.
-
-12 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Fixed the bug that caused improper coefficients to be wrong
-Cleaned up the handling of box parameters and center the box
-by default around the system/molecule. Added a flag to make
-this step optional and center the box around the origin instead.
-Added a regression test script with examples.
-
-1 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+The first version was written by Steve Lustig at Dupont, but required
+using Discover to derive internal coordinates and forcefield parameters.
-Cleanup and improved port to windows.
-Removed some more static string limits.
-Added print level 3 for additional output.
-Make code stop at missing force field parameters
-and added -i flag to override this.
-Safer argument checking.
-Provide short versions for all flags.
+The second version was written by Michael Peachey while an intern in the
+Cray Chemistry Applications Group managed by John Carpenter. This
+version derived internal coordinates from the mdf file and looked up
+parameters in the frc file, thus eliminating the need for Discover.
-23 Sep 2011
+The third version was written by John Carpenter to optimize the
+performance of the program for large molecular systems (the original
+code for deriving atom numbers was quadratic in time) and to make the
+program fully dynamic. The second version used fixed dimension arrays
+for the internal coordinates.
-added support for triclinic boxes
-see msi2lmp/TriclinicModification.pdf doc for details
+The third version was revised in Fall 2011 by Stephanie Teich-McGoldrick
+to add support for non-orthogonal cells.
------------------------------
+The next revision was started in Summer/Fall 2013 by Axel Kohlmeyer to
+improve portability to Windows compilers, clean up command line parsing
+and improve compatibility with the then current LAMMPS versions. This
+revision removes compatibility with the obsolete LAMMPS version written
+in Fortran 90.
- msi2lmp V3.6 4/10/2005
+INSTALLATION & USAGE
- This program uses the .car and .mdf files from MSI/Biosyms's INSIGHT
+This program uses the .car and .mdf files from MSI/Biosym's INSIGHT
program to produce a LAMMPS data file.
1. Building msi2lmp
Use the Makefile in the src directory. It is
currently set up for gcc. You will have to modify
it to use a different compiler.
2. Testing the program
There are several pairs of input test files in the format generated
by Materials Studio or compatible programs (one .car and one .mdf
file each) in the test directory. There is also a LAMMPS input to
run a minimization for each and write out the resulting system as
a data file. With the runtests.sh script all of those inputs are
converted via msi2lmp, then the minimization with LAMMPS is run
and the generated data files are compared with the corresponding
files in the reference folder. This script assumes you are on a
unix/linux system and that you have compiled a serial LAMMPS executable
called lmp_serial with make serial. The tests are grouped by the
force fields they use.
3. To run the program
The program is started by supplying information at the command prompt
according to the usage described below.
USAGE: msi2lmp.exe <ROOTNAME> {-print #} {-class #} {-frc FRC_FILE}
{-ignore} {-nocenter} {-shift # # #}
-- msi2lmp.exe is the name of the executable
-- <ROOTNAME> is the base name of the .car and .mdf files
-- -2001
Output lammps files for LAMMPS version 2001 (F90 version)
Default is to write output for the C++ version of LAMMPS
-- -print (or -p)
# is the print level 0 - silent except for error messages
1 - minimal (default)
2 - verbose (usual for developing and
checking new data files for consistency)
3 - even more verbose (additional debug info)
-- -ignore (or -i) ignore errors about missing force field parameters
and treat them as warnings instead.
-- -nocenter (or -n) do not recenter the simulation box around the
geometrical center of the provided geometry but
rather around the origin
-- -oldstyle (or -o) write out a data file without style hints
(to be compatible with older LAMMPS versions)
-- -shift (or -s) translate the entire system (box and coordinates)
by a vector (default: 0.0 0.0 0.0)
-- -class (or -c)
# is the class of forcefield to use (I or 1 = Class I e.g., CVFF)
(O or 0 = OPLS-AA)
(II or 2 = Class II e.g., CFFx)
default is -class I
-- -frc (or -f) specifies name of the forcefield file (e.g., cff91)
If the file name includes a directory component (or drive letter
on Windows), then the name is used as is. Otherwise, the program
looks for the forcefield file in $MSI2LMP_LIBRARY (or %MSI2LMP_LIBRARY%
on Windows). If $MSI2LMP_LIBRARY is not set, ../frc_files is used
(for testing). If the file name does not end in .frc, then .frc
is appended to the name.
For example, -frc cvff (assumes cvff.frc is in $MSI2LMP_LIBRARY
or ../frc_files)
-frc cff/cff91 (assumes cff91.frc is in cff)
-frc /usr/local/forcefields/cff95
(assumes cff95.frc is in /usr/local/forcefields/)
By default, the program uses $MSI2LMP_LIBRARY/cvff.frc or
../frc_files/cvff.frc depending on whether MSI2LMP_LIBRARY is set.
-- the LAMMPS data file is written to <ROOTNAME>.data;
protocol and error information is written to the screen.
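Example (illustrative only; "mysystem" is a placeholder for your own <ROOTNAME>):

  msi2lmp.exe mysystem -class I -frc cvff -print 2

This reads mysystem.car and mysystem.mdf from the current directory, looks
up cvff.frc via $MSI2LMP_LIBRARY (or ../frc_files), and writes mysystem.data.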
-****************************************************************
-*
-* msi2lmp
-*
-* This is the third version of a program that generates a LAMMPS
-* data file based on the information in MSI .car (atom
-* coordinates), .mdf (molecular topology) and .frc (forcefield)
-* files. The .car and .mdf files are specific to a molecular
-* system while the .frc file is specific to a forcefield version.
-* The only coherency needed between .frc and .car/.mdf files are
-* the atom types.
-*
-* The first version was written by Steve Lustig at Dupont, but
-* required using Discover to derive internal coordinates and
-* forcefield parameters
-*
-* The second version was written by Michael Peachey while an
-* intern in the Cray Chemistry Applications Group managed
-* by John Carpenter. This version derived internal coordinates
-* from the mdf file and looked up parameters in the frc file
-* thus eliminating the need for Discover.
-*
-* The third version was written by John Carpenter to optimize
-* the performance of the program for large molecular systems
-* (the original code for deriving atom numbers was quadratic in time)
-* and to make the program fully dynamic. The second version used
-* fixed dimension arrays for the internal coordinates.
-*
-* The current maintainer is only reluctantly doing so because John Mayo no longer
-* needs this code.
-*
-* V3.2 corresponds to adding code to MakeLists.c to gracefully deal with
-* systems that may only be molecules of 1 to 3 atoms. In V3.1, the values
-* for number_of_dihedrals, etc. could be unpredictable in these systems.
-*
-* V3.3 was generated in response to a strange error reading a MDF file generated by
-* Accelys' Materials Studio GUI. Simply rewriting the input part of ReadMdfFile.c
-* seems to have fixed the problem.
-*
-* V3.4 and V3.5 are minor upgrades to fix bugs associated mostly with .car and .mdf files
-* written by Accelys' Materials Studio GUI.
-*
-* V3.6 outputs to LAMMPS 2005 (C++ version).
-*
-* Contact: Kelly L. Anderson, kelly.anderson@cantab.net
-*
-* April 2005
+------------------------------------------------------------------------
+
+LIMITATIONS
+
+msi2lmp has the following known limitations:
+
+- there is no support for selecting Morse bonds over harmonic bonds
+- there is no support for auto-equivalences to supplement fully
+ parameterized interactions with heuristic ones
+- there is no support for bond increments
+
+------------------------------------------------------------------------
+
+CHANGELOG
+
+06 Oct 2016 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Improved whitespace handling in parsing topology and force field
+files to avoid bogus warnings about type name truncation.
+
+24 Oct 2015 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added check to make certain that force field files are consistent with
+the notation of non-bonded parameters that the msi2lmp code expects.
+For Class 1 and OPLS-AA the A-B notation with geometric mixing is
+expected and for Class 2 the r-eps notation with sixthpower mixing.
+
+11 Sep 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Refactored ReadMdfFile.c so it more consistently honors the MAX_NAME
+and MAX_STRING string length defines and potentially handles inputs
+with long names better.
+
+27 May 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added TopoTools style type hints as comments to all Mass, PairCoeff,
+BondCoeff, AngleCoeff, DihedralCoeff, ImproperCoeff entries.
+This should make it easier to identify force field entries with
+the structure and force field map in the data file later.
+
+06 Mar 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed a bug in handling of triclinic cells, where the matrices to
+convert to and from fractional coordinates were incorrectly built.
+
+26 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Implemented writing out force field style hints in generated data
+files for improved consistency checking when reading those files.
+Also added writing out CGCMM style comments to identify atom types.
+
+08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed a memory access violation with Class 2 force fields. Free all
+allocated memory to allow better detection of memory errors. Print out
+version number and data with all print levels > 0. Added valgrind
+checks to the regression tests.
+
+02 Aug 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added rudimentary support for OPLS-AA based on input provided
+by Jeff Greathouse.
+
+18 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added support for writing out image flags. Improved accuracy of atom
+masses. Added flag for shifting the entire system. Fixed some minor
+logic bugs and prepared for supporting other force fields and morse
+style bonds.
+
+12 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed the bug that caused improper coefficients to be wrong. Cleaned up
+the handling of box parameters and centered the box by default around the
+system/molecule. Added a flag to make this step optional and center the
+box around the origin instead. Added a regression test script with
+examples.
+
+1 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Cleaned up and improved the port to Windows. Removed some more static
+string limits. Added print level 3 for additional output. Made the code
+stop at missing force field parameters and added the -i flag to override
+this. Safer argument checking. Provided short versions for all flags.
+
+23 Sep 2011
+
+Added support for triclinic boxes.
+
+V3.6 outputs to LAMMPS 2005 (C++ version).
+
+Contact: Kelly L. Anderson, kelly.anderson@cantab.net
+
+V3.4 and V3.5 are minor upgrades to fix bugs associated mostly with .car
+and .mdf files written by Accelrys' Materials Studio GUI. (April 2005)
+
+V3.3 was generated in response to a strange error reading an MDF file
+generated by Accelrys' Materials Studio GUI. Simply rewriting the input
+part of ReadMdfFile.c seems to have fixed the problem.
+
+V3.2 corresponds to adding code to MakeLists.c to gracefully deal with
+systems that may only be molecules of 1 to 3 atoms. In V3.1, the values
+for number_of_dihedrals, etc. could be unpredictable in these systems.
+
+-----------------------------
+
+ msi2lmp v3.9.8 6/10/2016
+
diff --git a/tools/msi2lmp/src/GetParameters.c b/tools/msi2lmp/src/GetParameters.c
index e183c529e..192b4d296 100644
--- a/tools/msi2lmp/src/GetParameters.c
+++ b/tools/msi2lmp/src/GetParameters.c
@@ -1,1310 +1,1310 @@
#include "msi2lmp.h"
#include "Forcefield.h"
#include <string.h>
#include <stdlib.h>
#include <math.h>
static int find_improper_body_data(char [][5],struct FrcFieldItem,int *);
static void rearrange_improper(int,int);
static int find_trigonal_body_data(char [][5],struct FrcFieldItem);
static int find_angleangle_data(char [][5],struct FrcFieldItem,int[]);
static int find_match(int, char [][5],struct FrcFieldItem,int *);
static int match_types(int,int,char [][5],char [][5],int *);
static double get_r0(int,int);
static double get_t0(int,int,int);
static int quo_cp();
static void get_equivs(int,char [][5],char[][5]);
static int find_equiv_type(char[]);
/**********************************************************************/
/* */
/* GetParameters is a long routine for searching the forcefield */
/* parameters (read in by ReadFrcFile) for parameters corresponding */
/* to the different internal coordinate types derived by MakeLists */
/* */
/**********************************************************************/
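/* general lookup pattern used throughout this routine (summary added for
   clarity, inferred from the code below rather than separate documentation):
     1. build the potential type names of the internal coordinate
     2. find_match() against those exact types (wildcards allowed on retry)
     3. on failure, translate the types via get_equivs() and search again
     4. if still unmatched, print a message and condexit(), which aborts
        unless missing parameters were downgraded to warnings with -i */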
void GetParameters()
{
int i,j,k,backwards,cp_type,rearrange;
int kloc[3],multiplicity;
char potential_types[4][5];
char equiv_types[4][5];
double rab,rbc,rcd,tabc,tbcd,tabd,tcbd;
if (pflag > 1) fprintf(stderr," Trying Atom Equivalences if needed\n");
/**********************************************************************/
/* */
/* Find masses of atom types */
/* */
/**********************************************************************/
for (i=0; i < no_atom_types; i++) {
backwards = -1;
strncpy(potential_types[0],atomtypes[i].potential,5);
k = find_match(1,potential_types,ff_atomtypes,&backwards);
if (k < 0) {
printf(" Unable to find mass for %s\n",atomtypes[i].potential);
condexit(10);
} else {
atomtypes[i].mass = ff_atomtypes.data[k].ff_param[0];
}
}
/**********************************************************************/
/* */
/* Find VDW parameters for atom types */
/* */
/**********************************************************************/
for (i=0; i < no_atom_types; i++) {
backwards = 0;
for (j=0; j < 2; j++) atomtypes[i].params[j] = 0.0;
strncpy(potential_types[0],atomtypes[i].potential,5);
k = find_match(1,potential_types,ff_vdw,&backwards);
if (k < 0) {
get_equivs(1,potential_types,equiv_types);
if (pflag > 2) printf(" Using equivalences for VDW %s -> %s\n",
potential_types[0],equiv_types[0]);
k = find_match(1,equiv_types,ff_vdw,&backwards);
}
if (k < 0) {
printf(" Unable to find vdw data for %s\n",atomtypes[i].potential);
condexit(11);
} else {
if (ljtypeflag == 0) {
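/* A-B (12-6) parameters relate to epsilon/sigma via A = 4*eps*sigma^12 and
   B = 4*eps*sigma^6, hence eps = B*B/(4*A) and sigma = (A/B)^(1/6) as
   computed below, assuming ff_param[0] = A and ff_param[1] = B as read
   from the .frc file (explanatory note, not part of the original source) */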
if((ff_vdw.data[k].ff_param[0] != 0.0 ) &&
(ff_vdw.data[k].ff_param[1] != 0.0)) {
atomtypes[i].params[0] =
(ff_vdw.data[k].ff_param[1]*
ff_vdw.data[k].ff_param[1])/(4.0*ff_vdw.data[k].ff_param[0]);
atomtypes[i].params[1] = pow((ff_vdw.data[k].ff_param[0]/
ff_vdw.data[k].ff_param[1]),
(1.0/6.0));
}
} else if (ljtypeflag == 1) {
atomtypes[i].params[0] = ff_vdw.data[k].ff_param[1];
atomtypes[i].params[1] = ff_vdw.data[k].ff_param[0];
} else {
printf(" Unknown LJ parameter type %d\n",ljtypeflag);
exit(111);
}
}
}
if (pflag > 2) {
printf("\n Atom Types, Masses and VDW Parameters\n");
for (i=0; i < no_atom_types; i++) {
printf(" %3s %8.4f %8.4f %8.4f\n",
atomtypes[i].potential,atomtypes[i].mass, atomtypes[i].params[0],atomtypes[i].params[1]);
}
}
/**********************************************************************/
/* */
/* Find parameters for bond types */
/* */
/**********************************************************************/
for (i=0; i < no_bond_types; i++) {
backwards = 0;
for (j=0; j < 4; j++) bondtypes[i].params[j] = 0.0;
for (j=0; j < 2; j++)
strncpy(potential_types[j],
atomtypes[bondtypes[i].types[j]].potential,5);
k = find_match(2,potential_types,ff_bond,&backwards);
if (k < 0) {
get_equivs(2,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for bond %s %s -> %s %s\n",
potential_types[0],potential_types[1],
equiv_types[0],equiv_types[1]);
}
k = find_match(2,equiv_types,ff_bond,&backwards);
}
if (k < 0) {
printf(" Unable to find bond data for %s %s\n",
potential_types[0],potential_types[1]);
condexit(12);
} else {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
bondtypes[i].params[0] = ff_bond.data[k].ff_param[1];
bondtypes[i].params[1] = ff_bond.data[k].ff_param[0];
- }
+ }
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 4; j++)
bondtypes[i].params[j] = ff_bond.data[k].ff_param[j];
}
}
}
if (pflag > 2) {
printf("\n Bond Types and Parameters\n");
for (i=0; i < no_bond_types; i++) {
for (j=0; j < 2; j++)
printf(" %-3s",atomtypes[bondtypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",bondtypes[i].params[j]);
printf("\n");
}
}
/**********************************************************************/
/* */
/* Find parameters for angle types including bondbond, */
/* and bondangle parameters if Class II */
/* */
/* Each of the cross terms is searched separately even though */
/* they share a given angle type. This allows parameters to be */
/* in different order in the forcefield for each cross term or */
/* maybe not even there. */
/* */
/**********************************************************************/
for (i=0; i < no_angle_types; i++) {
backwards = 0;
for (j=0; j < 4; j++) angletypes[i].params[j] = 0.0;
for (j=0; j < 3; j++)
strncpy(potential_types[j],atomtypes[angletypes[i].types[j]].potential,5);
k = find_match(3,potential_types,ff_ang,&backwards);
if (k < 0) {
get_equivs(3,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for angle %s %s %s -> %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],
equiv_types[0],equiv_types[1],
equiv_types[2]);
}
k = find_match(3,equiv_types,ff_ang,&backwards);
}
if (k < 0) {
printf(" Unable to find angle data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(13);
} else {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
angletypes[i].params[0] = ff_ang.data[k].ff_param[1];
angletypes[i].params[1] = ff_ang.data[k].ff_param[0];
}
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 4; j++)
angletypes[i].params[j] = ff_ang.data[k].ff_param[j];
}
}
if (forcefield & FF_TYPE_CLASS2) {
get_equivs(3,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for 3 body cross terms %s %s %s -> %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2],
equiv_types[0],equiv_types[1],equiv_types[2]);
}
for (j=0; j < 3; j++) angletypes[i].bondbond_cross_term[j] = 0.0;
for (j=0; j < 4; j++) angletypes[i].bondangle_cross_term[j] = 0.0;
rab = get_r0(angletypes[i].types[0],angletypes[i].types[1]);
rbc = get_r0(angletypes[i].types[1],angletypes[i].types[2]);
angletypes[i].bondbond_cross_term[1] = rab;
angletypes[i].bondbond_cross_term[2] = rbc;
angletypes[i].bondangle_cross_term[2] = rab;
angletypes[i].bondangle_cross_term[3] = rbc;
k = find_match(3,potential_types,ff_bonbon,&backwards);
if (k < 0) {
k = find_match(3,equiv_types,ff_bonbon,&backwards);
}
if (k < 0) {
printf(" Unable to find bondbond data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(14);
} else {
angletypes[i].bondbond_cross_term[0] = ff_bonbon.data[k].ff_param[0];
}
k = find_match(3,potential_types,ff_bonang,&backwards);
if (k < 0) {
k = find_match(3,equiv_types,ff_bonang,&backwards);
}
if (k < 0) {
printf(" Unable to find bondangle data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(15);
} else {
if (backwards) {
angletypes[i].bondangle_cross_term[0] = ff_bonang.data[k].ff_param[1];
angletypes[i].bondangle_cross_term[1] = ff_bonang.data[k].ff_param[0];
} else {
angletypes[i].bondangle_cross_term[0] = ff_bonang.data[k].ff_param[0];
angletypes[i].bondangle_cross_term[1] = ff_bonang.data[k].ff_param[1];
}
}
}
}
if (pflag > 2) {
printf("\n Angle Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s", atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 4; j++) printf(" %8.4f",angletypes[i].params[j]);
printf("\n");
}
if (forcefield & FF_TYPE_CLASS2) {
printf("\n BondBond Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s",atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",angletypes[i].bondbond_cross_term[j]);
printf("\n");
}
printf("\n BondAngle Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s",atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",angletypes[i].bondangle_cross_term[j]);
printf("\n");
}
}
}
/**********************************************************************/
/* */
/* Find parameters for dihedral types including endbonddihedral, */
/* midbonddihedral, angledihedral, angleangledihedral and */
/* bondbond13 parameters if Class II */
/* */
/* Each of the cross terms is searched separately even though */
/* they share a given dihedral type. This allows parameters to be */
/* in different order in the forcefield for each cross term or */
/* maybe not even there. */
/* */
/**********************************************************************/
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 6; j++)
dihedraltypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[dihedraltypes[i].types[j]].potential,5);
backwards = 0;
k = find_match(4,potential_types,ff_tor,&backwards);
if (k < 0) {
get_equivs(4,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for dihedral %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_match(4,equiv_types,ff_tor,&backwards);
}
if (k < 0) {
printf(" Unable to find torsion data for %s %s %s %s\n",
potential_types[0],
potential_types[1],
potential_types[2],
potential_types[3]);
condexit(16);
} else {
if (forcefield & FF_TYPE_CLASS1) {
multiplicity = 1;
if (ff_tor.data[k].ff_types[0][0] == '*')
multiplicity =
atomtypes[dihedraltypes[i].types[1]].no_connect-1;
if (ff_tor.data[k].ff_types[3][0] == '*')
multiplicity *=
atomtypes[dihedraltypes[i].types[2]].no_connect-1;
dihedraltypes[i].params[0] = ff_tor.data[k].ff_param[0]/(double) multiplicity;
if (ff_tor.data[k].ff_param[2] == 0.0)
dihedraltypes[i].params[1] = 1.0;
else if (ff_tor.data[k].ff_param[2] == 180.0)
dihedraltypes[i].params[1] = -1.0;
else {
printf(" Non planar phi0 for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
dihedraltypes[i].params[1] = 0.0;
}
dihedraltypes[i].params[2] = ff_tor.data[k].ff_param[1];
}
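/* note (added for clarity, not in the original source): for Class I the
   cvff-style entry (Kphi, n, phi0) is mapped above onto K = Kphi/multiplicity,
   d = +1 for phi0 = 0 or d = -1 for phi0 = 180, and the periodicity n,
   matching the harmonic dihedral form E = K*[1 + d*cos(n*phi)]; the
   multiplicity accounts for wildcard entries that apply to every torsion
   around the central bond */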
if (forcefield & FF_TYPE_OPLSAA) {
for (j=0; j < 4; j++)
dihedraltypes[i].params[j] = ff_tor.data[k].ff_param[j];
}
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 6; j++)
dihedraltypes[i].params[j] = ff_tor.data[k].ff_param[j];
}
}
if (forcefield & FF_TYPE_CLASS2) {
get_equivs(4,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for linear 4 body cross terms %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
for (j=0; j < 8; j++)
dihedraltypes[i].endbonddihedral_cross_term[j] = 0.0;
for (j=0; j < 4; j++)
dihedraltypes[i].midbonddihedral_cross_term[j] = 0.0;
for (j=0; j < 8; j++)
dihedraltypes[i].angledihedral_cross_term[j] = 0.0;
for (j=0; j < 3; j++)
dihedraltypes[i].angleangledihedral_cross_term[j] = 0.0;
for (j=0; j < 3; j++)
dihedraltypes[i].bond13_cross_term[j] = 0.0;
rab = get_r0(dihedraltypes[i].types[0],dihedraltypes[i].types[1]);
rbc = get_r0(dihedraltypes[i].types[1],dihedraltypes[i].types[2]);
rcd = get_r0(dihedraltypes[i].types[2],dihedraltypes[i].types[3]);
tabc = get_t0(dihedraltypes[i].types[0],
dihedraltypes[i].types[1],
dihedraltypes[i].types[2]);
tbcd = get_t0(dihedraltypes[i].types[1],
dihedraltypes[i].types[2],
dihedraltypes[i].types[3]);
dihedraltypes[i].endbonddihedral_cross_term[6] = rab;
dihedraltypes[i].endbonddihedral_cross_term[7] = rcd;
dihedraltypes[i].midbonddihedral_cross_term[3] = rbc;
dihedraltypes[i].angledihedral_cross_term[6] = tabc;
dihedraltypes[i].angledihedral_cross_term[7] = tbcd;
dihedraltypes[i].angleangledihedral_cross_term[1] = tabc;
dihedraltypes[i].angleangledihedral_cross_term[2] = tbcd;
dihedraltypes[i].bond13_cross_term[1] = rab;
dihedraltypes[i].bond13_cross_term[2] = rcd;
backwards = 0;
k = find_match(4,potential_types,ff_endbontor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_endbontor,&backwards);
}
if (k < 0) {
printf(" Unable to find endbonddihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(17);
} else {
if (backwards) {
dihedraltypes[i].endbonddihedral_cross_term[0] =
ff_endbontor.data[k].ff_param[3];
dihedraltypes[i].endbonddihedral_cross_term[1] =
ff_endbontor.data[k].ff_param[4];
dihedraltypes[i].endbonddihedral_cross_term[2] =
ff_endbontor.data[k].ff_param[5];
dihedraltypes[i].endbonddihedral_cross_term[3] =
ff_endbontor.data[k].ff_param[0];
dihedraltypes[i].endbonddihedral_cross_term[4] =
ff_endbontor.data[k].ff_param[1];
dihedraltypes[i].endbonddihedral_cross_term[5] =
ff_endbontor.data[k].ff_param[2];
} else {
dihedraltypes[i].endbonddihedral_cross_term[0] =
ff_endbontor.data[k].ff_param[0];
dihedraltypes[i].endbonddihedral_cross_term[1] =
ff_endbontor.data[k].ff_param[1];
dihedraltypes[i].endbonddihedral_cross_term[2] =
ff_endbontor.data[k].ff_param[2];
dihedraltypes[i].endbonddihedral_cross_term[3] =
ff_endbontor.data[k].ff_param[3];
dihedraltypes[i].endbonddihedral_cross_term[4] =
ff_endbontor.data[k].ff_param[4];
dihedraltypes[i].endbonddihedral_cross_term[5] =
ff_endbontor.data[k].ff_param[5];
}
}
backwards = 0;
k = find_match(4,potential_types,ff_midbontor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_midbontor,&backwards);
}
if (k < 0) {
printf(" Unable to find midbonddihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(18);
} else {
dihedraltypes[i].midbonddihedral_cross_term[0] =
ff_midbontor.data[k].ff_param[0];
dihedraltypes[i].midbonddihedral_cross_term[1] =
ff_midbontor.data[k].ff_param[1];
dihedraltypes[i].midbonddihedral_cross_term[2] =
ff_midbontor.data[k].ff_param[2];
}
backwards = 0;
k = find_match(4,potential_types,ff_angtor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_angtor,&backwards);
}
if (k < 0) {
printf(" Unable to find angledihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(19);
} else {
if (backwards) {
dihedraltypes[i].angledihedral_cross_term[0] =
ff_angtor.data[k].ff_param[3];
dihedraltypes[i].angledihedral_cross_term[1] =
ff_angtor.data[k].ff_param[4];
dihedraltypes[i].angledihedral_cross_term[2] =
ff_angtor.data[k].ff_param[5];
dihedraltypes[i].angledihedral_cross_term[3] =
ff_angtor.data[k].ff_param[0];
dihedraltypes[i].angledihedral_cross_term[4] =
ff_angtor.data[k].ff_param[1];
dihedraltypes[i].angledihedral_cross_term[5] =
ff_angtor.data[k].ff_param[2];
} else {
dihedraltypes[i].angledihedral_cross_term[0] =
ff_angtor.data[k].ff_param[0];
dihedraltypes[i].angledihedral_cross_term[1] =
ff_angtor.data[k].ff_param[1];
dihedraltypes[i].angledihedral_cross_term[2] =
ff_angtor.data[k].ff_param[2];
dihedraltypes[i].angledihedral_cross_term[3] =
ff_angtor.data[k].ff_param[3];
dihedraltypes[i].angledihedral_cross_term[4] =
ff_angtor.data[k].ff_param[4];
dihedraltypes[i].angledihedral_cross_term[5] =
ff_angtor.data[k].ff_param[5];
}
}
backwards = 0;
k = find_match(4,potential_types,ff_angangtor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_angangtor,&backwards);
}
if (k < 0) {
printf(" Unable to find angleangledihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(20);
} else {
dihedraltypes[i].angleangledihedral_cross_term[0] =
ff_angangtor.data[k].ff_param[0];
}
cp_type = quo_cp();
if ((cp_type >= 0) &&
((dihedraltypes[i].types[0] == cp_type) ||
(dihedraltypes[i].types[1] == cp_type) ||
(dihedraltypes[i].types[2] == cp_type) ||
(dihedraltypes[i].types[3] == cp_type) )) {
backwards = 0;
k = find_match(4,potential_types,ff_bonbon13,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_bonbon13,&backwards);
}
if (k < 0) {
printf(" Unable to find bond13 data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(21);
} else {
dihedraltypes[i].bond13_cross_term[0] =
ff_bonbon13.data[k].ff_param[0];
}
}
}
}
if (pflag > 2) {
printf("\n Dihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 6; j++)
printf(" %8.4f",dihedraltypes[i].params[j]);
printf("\n");
}
if (forcefield & FF_TYPE_CLASS2) {
printf("\n EndBondDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 8; j++)
printf(" %8.4f",dihedraltypes[i].endbonddihedral_cross_term[j]);
printf("\n");
}
printf("\n MidBondDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",dihedraltypes[i].midbonddihedral_cross_term[j]);
printf("\n");
}
printf("\n AngleDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 8; j++)
printf(" %8.4f",dihedraltypes[i].angledihedral_cross_term[j]);
printf("\n");
}
printf("\n AngleAngleDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",dihedraltypes[i].angleangledihedral_cross_term[j]);
printf("\n");
}
printf("\n Bond13 Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",dihedraltypes[i].bond13_cross_term[j]);
printf("\n");
}
}
}
/**********************************************************************/
/* */
/* Find parameters for oop types */
/* */
/* This is the most complicated of all the types because */
/* the class I oop is actually an improper torsion and does */
/* not have the permutation symmetry of a well defined oop */
/* The net result is that if one does not find the current */
/* atom type ordering in the forcefield file then one must try each */
/* of the next permutations (6 in total) and when a match is found */
/* the program must go back and rearrange the oop type AND the atom */
/* ordering in the oop lists for those with the current type */
/* */
/* The Class II oop types are easier but also tedious since the */
/* program has to try all permutations of the a c and d atom */
/* types to find a match. A special routine is used to do this. */
/* */
/* Fortunately, there are typically few oop types */
/* */
/**********************************************************************/
if (forcefield & FF_TYPE_CLASS1) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 3; j++) ooptypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
k = find_improper_body_data(potential_types,ff_oop,&rearrange);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for oop %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_improper_body_data(equiv_types,ff_oop,&rearrange);
}
if (k < 0) {
printf(" Unable to find oop data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(22);
} else {
ooptypes[i].params[0] = ff_oop.data[k].ff_param[0];
if (ff_oop.data[k].ff_param[2] == 0.0)
ooptypes[i].params[1] = 1.0;
else if (ff_oop.data[k].ff_param[2] == 180.0)
ooptypes[i].params[1] = -1.0;
else {
printf(" Non planar phi0 for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
ooptypes[i].params[1] = 0.0;
}
ooptypes[i].params[2] = ff_oop.data[k].ff_param[1];
if (rearrange > 0) rearrange_improper(i,rearrange);
}
}
}
if (forcefield & FF_TYPE_CLASS2) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 3; j++)
ooptypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
k = find_trigonal_body_data(potential_types,ff_oop);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for oop %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_trigonal_body_data(equiv_types,ff_oop);
}
if (k < 0) {
printf(" Unable to find oop data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(23);
} else {
for (j=0; j < 2; j++)
ooptypes[i].params[j] = ff_oop.data[k].ff_param[j];
}
}
}
if (pflag > 2) {
printf("\n OOP Types and Parameters\n");
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[ooptypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",ooptypes[i].params[j]);
printf("\n");
}
}
/**********************************************************************/
/* */
/* Find parameters for angleangle types (Class II only) */
/* */
/* This is somewhat complicated in that one set of four types */
/* a b c d has three angleangle combinations so for each type */
/* the program needs to find three sets of parameters by */
/* progressively looking for data for different permutations of */
/* a c and d */
/* */
/**********************************************************************/
if (forcefield & FF_TYPE_CLASS2) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 6; j++) ooptypes[i].angleangle_params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
tabc = get_t0(ooptypes[i].types[0],
ooptypes[i].types[1],
ooptypes[i].types[2]);
tabd = get_t0(ooptypes[i].types[0],
ooptypes[i].types[1],
ooptypes[i].types[3]);
tcbd = get_t0(ooptypes[i].types[2],
ooptypes[i].types[1],
ooptypes[i].types[3]);
ooptypes[i].angleangle_params[3] = tabc;
ooptypes[i].angleangle_params[4] = tcbd;
ooptypes[i].angleangle_params[5] = tabd;
k = find_angleangle_data(potential_types,ff_angang,kloc);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for angleangle %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
// retry the search with equivalent types, independent of the print level,
// in the same way as the angleangle type loop further below
k = find_angleangle_data(equiv_types,ff_angang,kloc);
}
if (k < 0) {
printf(" Unable to find angleangle data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(24);
} else {
for (j=0; j < 3; j++) {
if (kloc[j] > -1)
ooptypes[i].angleangle_params[j] = ff_angang.data[kloc[j]].ff_param[0];
}
}
}
for (i=0; i < no_angleangle_types; i++) {
for (j=0; j < 6; j++) angleangletypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[angleangletypes[i].types[j]].potential,5);
tabc = get_t0(angleangletypes[i].types[0],
angleangletypes[i].types[1],
angleangletypes[i].types[2]);
tabd = get_t0(angleangletypes[i].types[0],
angleangletypes[i].types[1],
angleangletypes[i].types[3]);
tcbd = get_t0(angleangletypes[i].types[2],
angleangletypes[i].types[1],
angleangletypes[i].types[3]);
angleangletypes[i].params[3] = tabc;
angleangletypes[i].params[4] = tcbd;
angleangletypes[i].params[5] = tabd;
k = find_angleangle_data(potential_types,ff_angang,kloc);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf("Using equivalences for angleangle %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_angleangle_data(equiv_types,ff_angang,kloc);
}
if (k < 0) {
printf(" Unable to find angleangle data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(25);
} else {
for (j=0; j < 3; j++) {
if (kloc[j] > -1)
angleangletypes[i].params[j] =
ff_angang.data[kloc[j]].ff_param[0];
}
}
}
if (pflag > 2) {
printf("\n AngleAngle Types and Parameters\n");
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[ooptypes[i].types[j]].potential);
for (j=0; j < 6; j++)
printf(" %8.4f",ooptypes[i].angleangle_params[j]);
printf("\n");
}
for (i=0; i < no_angleangle_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[angleangletypes[i].types[j]].potential);
for (j=0; j < 6; j++) printf(" %8.4f",angleangletypes[i].params[j]);
printf("\n");
}
}
}
}
int find_improper_body_data(char types1[][5],struct FrcFieldItem item,
int *rearrange_ptr)
{
int k,backwards;
char mirror_types[4][5];
backwards = 0;
/* a b c d */
*rearrange_ptr = 0;
k = find_match(4,types1,item,&backwards);
if (k >= 0) return k;
/* a b d c */
*rearrange_ptr = 1;
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[1],types1[1],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b a c */
*rearrange_ptr = 2;
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b c a */
*rearrange_ptr = 3;
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b a d */
*rearrange_ptr = 4;
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b d a */
*rearrange_ptr = 5;
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
return k;
}
void rearrange_improper(int ooptype,int rearrange)
{
int i,j,temp[4];
for (i=0; i < 4; i++) temp[i] = ooptypes[ooptype].types[i];
switch (rearrange) {
case 1:
ooptypes[ooptype].types[0] = temp[0];
ooptypes[ooptype].types[2] = temp[3];
ooptypes[ooptype].types[3] = temp[2];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[2] = temp[3];
oops[i].members[3] = temp[2];
}
}
break;
case 2:
ooptypes[ooptype].types[0] = temp[3];
ooptypes[ooptype].types[2] = temp[0];
ooptypes[ooptype].types[3] = temp[2];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[3];
oops[i].members[2] = temp[0];
oops[i].members[3] = temp[2];
}
}
break;
case 3:
ooptypes[ooptype].types[0] = temp[3];
ooptypes[ooptype].types[2] = temp[2];
ooptypes[ooptype].types[3] = temp[0];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[3];
oops[i].members[2] = temp[2];
oops[i].members[3] = temp[0];
}
}
break;
case 4:
ooptypes[ooptype].types[0] = temp[2];
ooptypes[ooptype].types[2] = temp[0];
ooptypes[ooptype].types[3] = temp[3];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[2];
oops[i].members[2] = temp[0];
oops[i].members[3] = temp[3];
}
}
break;
case 5:
ooptypes[ooptype].types[0] = temp[2];
ooptypes[ooptype].types[2] = temp[3];
ooptypes[ooptype].types[3] = temp[0];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[2];
oops[i].members[2] = temp[3];
oops[i].members[3] = temp[0];
}
}
break;
default:
break;
}
}
int find_trigonal_body_data(char types1[][5],struct FrcFieldItem item)
{
int k,backwards;
char mirror_types[4][5];
backwards = -1;
/* a b c d */
k = find_match(4,types1,item,&backwards);
if (k >= 0) return k;
/* a b d c */
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[1],types1[1],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b a c */
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b c a */
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b a d */
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b d a */
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
return k;
}
int find_angleangle_data(char types1[][5],struct FrcFieldItem item,int kloc[3])
{
int k,backwards = -1;
char mirror_types[4][5];
strncpy(mirror_types[1],types1[1],5);
/* go for first parameter a b c d or d b c a */
k = find_match(4,types1,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[0] = k;
/* go for second parameter d b a c or c b a d */
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[1] = k;
/* go for third parameter a b d c or c b d a */
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[2] = k;
k = 0;
if ((kloc[0] < 0) && (kloc[1] < 0) && (kloc[2] < 0)) k = -1;
return k;
}
int find_match(int n, char types1[][5],struct FrcFieldItem item,int
*backwards_ptr)
{
int k,match;
match = 0;
k=0;
/* Try for an exact match (no wildcards) first */
while (!match && (k < item.entries)) {
if (match_types(n, 0,types1,item.data[k].ff_types,backwards_ptr) == 1)
match = 1;
else
k++;
}
/* Try again - allow wildcard matching */
if (!match) {
k=0;
while (!match && (k < item.entries)) {
if (match_types(n,1,types1,item.data[k].ff_types,backwards_ptr) == 1)
match = 1;
else
k++;
}
}
if (match) return k;
else return -1;
}
int match_types(int n,int wildcard,char types1[][5],char types2[][5],
int *backwards_ptr)
{
int k,match;
/* Routine to match short arrays of character strings which contain
atom potential types. The arrays range in length from 1 to 4 (VDW or
equivalences, bond, angle, dihedral or oop types). There are potentially
four ways the arrays can match: exact match (forwards), exact match when
one array is run backwards (backwards), forwards with wildcard character
matching allowed (forwards *) and finally backwards with wildcard
character matching (backwards *). If the variable backwards (pointed to
by backwards_ptr) is -1, then the backwards options are not used (such
as when matching oop types).
*/
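/* illustrative example (not from the original source): with n = 2,
   types1 = {"c","o"} and a forcefield entry listing {"o","c"}, the forwards
   pass fails but the backwards pass succeeds, so the routine returns 1 and
   sets *backwards_ptr to 1, letting the caller reverse parameter order
   where that matters (e.g. asymmetric bond-angle cross terms) */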
if (wildcard == 0) {
/* forwards */
k=0;
match = 1;
while (match && (k < n)) {
if (strncmp(types1[k],types2[k],5) == 0)
k++;
else
match = 0;
}
} else {
/* forwards * */
k=0;
match = 1;
while (match && (k < n)) {
if ((strncmp(types1[k],types2[k],5) == 0) ||
(types2[k][0] == '*'))
k++;
else
match = 0;
}
}
if (match) {
*backwards_ptr = 0;
return 1;
}
if ((n < 2) || (*backwards_ptr == -1)) return 0;
if (wildcard == 0) {
/* backwards */
k=0;
match = 1;
while (match && (k < n)) {
if (strncmp(types1[n-k-1],types2[k],5) == 0)
k++;
else
match = 0;
}
} else {
/* backwards * */
k=0;
match = 1;
while (match && (k < n)) {
if ((strncmp(types1[n-k-1],types2[k],5) == 0) ||
(types2[k][0] == '*') )
k++;
else
match = 0;
}
}
if (match) {
*backwards_ptr = 1;
return 1;
} else return 0;
}
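/* Look up the reference bond length r0 for a pair of (numeric) atom types in
   the bondtypes[] table, trying both orderings of the pair; returns 0.0 and
   prints a warning when no bond type matches. */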
double get_r0(int typei,int typej)
{
int k,match;
double r;
k=0;
match=0;
r = 0.0;
while (!match && (k < no_bond_types)) {
if (((typei == bondtypes[k].types[0]) &&
(typej == bondtypes[k].types[1])) ||
((typej == bondtypes[k].types[0]) &&
(typei == bondtypes[k].types[1])) ) {
r = bondtypes[k].params[0];
match = 1;
} else k++;
}
if (match == 0)
printf(" Unable to find r0 for types %d %d\n",typei,typej);
return r;
}
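/* Look up the reference angle theta0 for an atom type triple in the
   angletypes[] table, trying the i-j-k and k-j-i orderings; returns 0.0 and
   prints a warning when no angle type matches. */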
double get_t0(int typei,int typej,int typek)
{
int k,match;
double theta;
k=0;
match=0;
theta = 0.0;
while (!match && (k < no_angle_types)) {
if (((typei == angletypes[k].types[0]) &&
(typej == angletypes[k].types[1]) &&
(typek == angletypes[k].types[2])) ||
((typek == angletypes[k].types[0]) &&
(typej == angletypes[k].types[1]) &&
(typei == angletypes[k].types[2])) ) {
theta = angletypes[k].params[0];
match = 1;
} else k++;
}
if (match == 0)
printf(" Unable to find t0 for types %d %d %d\n",
typei,typej,typek);
return theta;
}
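/* Return the index of the first atom type whose potential name begins with
   "cp " (in cvff-style force fields this is the aromatic carbon type), or -1
   if no such type is present. */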
int quo_cp()
{
char cp[] = "cp ";
int i,type,found;
i = 0;
type = -1;
found = 0;
while (!found && (i < no_atom_types)) {
if (strncmp(atomtypes[i].potential,cp,2) == 0) {
found = 1;
type = i;
} else i++;
}
return type;
}
void get_equivs(int ic,char potential_types[][5],char equiv_types[][5])
{
int i,k;
switch (ic) {
case 1:
k = find_equiv_type(potential_types[0]);
if (k > -1) strncpy(equiv_types[0],equivalence.data[k].ff_types[1],5);
break;
case 2:
for (i=0; i < 2; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[2],5);
}
break;
case 3:
for (i=0; i < 3; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[3],5);
}
break;
case 4:
for (i=0; i < 4; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[4],5);
}
break;
case 5:
for (i=0; i < 4; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1)
strncpy(equiv_types[i],equivalence.data[k].ff_types[5],5);
}
break;
default:
printf(" Requesting equivalences of unsupported type: %d\n",ic);
condexit(26);
break;
}
return;
}
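/* Note on get_equivs() above (a sketch, assuming the usual #equivalence
   column order NonB, Bond, Angle, Torsion, OOP): ic selects which column of
   equivalence.data[].ff_types[] replaces the supplied potential types
   (1 = nonbond ... 5 = out-of-plane); types without an equivalence entry are
   left unchanged. */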
int find_equiv_type(char potential_type[5])
{
int j,k,match;
j = -1;
k = 0;
match = 0;
while (!match && (k < equivalence.entries)) {
if (strncmp(potential_type,
equivalence.data[k].ff_types[0],5) == 0) {
match = 1;
j = k;
} else {
k++;
}
}
if (j < 0)
printf(" Unable to find equivalent type for %s\n",potential_type);
return j;
}
diff --git a/tools/msi2lmp/src/InitializeItems.c b/tools/msi2lmp/src/InitializeItems.c
index 4df9fd0f1..1e3363691 100644
--- a/tools/msi2lmp/src/InitializeItems.c
+++ b/tools/msi2lmp/src/InitializeItems.c
@@ -1,140 +1,140 @@
/*
* This function fills in the keyword field, the number of members for each
* item and the number of parameters for each item
*
*/
#include "msi2lmp.h"
#include "Forcefield.h"
#include <string.h>
void InitializeItems(void)
{
/* ATOM TYPES */
strcpy(ff_atomtypes.keyword,"#atom_types");
ff_atomtypes.number_of_members = 1;
ff_atomtypes.number_of_parameters = 1;
/* EQUIVALENCE */
strcpy(equivalence.keyword,"#equivalence");
equivalence.number_of_members = 6;
equivalence.number_of_parameters = 0;
/* NON-BOND */
strcpy(ff_vdw.keyword,"#nonbond");
ff_vdw.number_of_members = 1;
ff_vdw.number_of_parameters = 2;
/* BOND */
ff_bond.number_of_members = 2;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_bond.keyword,"#quadratic_bond");
ff_bond.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_bond.keyword,"#quartic_bond");
ff_bond.number_of_parameters = 4;
}
/* MORSE */
if (forcefield & FF_TYPE_CLASS1) {
ff_morse.number_of_members = 2;
strcpy(ff_morse.keyword,"#morse_bond");
ff_morse.number_of_parameters = 3;
}
/* ANGLE */
ff_ang.number_of_members = 3;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_ang.keyword,"#quadratic_angle");
ff_ang.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_ang.keyword,"#quartic_angle");
ff_ang.number_of_parameters = 4;
}
/* TORSION */
ff_tor.number_of_members = 4;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_tor.keyword,"#torsion_1");
ff_tor.number_of_parameters = 3;
- }
+ }
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_tor.keyword,"#torsion_3");
ff_tor.number_of_parameters = 6;
}
/* OOP */
ff_oop.number_of_members = 4;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_oop.keyword,"#out_of_plane");
ff_oop.number_of_parameters = 3;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_oop.keyword,"#wilson_out_of_plane");
ff_oop.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
/* BOND-BOND */
strcpy(ff_bonbon.keyword,"#bond-bond");
ff_bonbon.number_of_members = 3;
ff_bonbon.number_of_parameters = 1;
/* BOND-ANGLE */
strcpy(ff_bonang.keyword,"#bond-angle");
ff_bonang.number_of_members = 3;
ff_bonang.number_of_parameters = 2;
/* ANGLE-TORSION */
strcpy(ff_angtor.keyword,"#angle-torsion_3");
ff_angtor.number_of_members = 4;
ff_angtor.number_of_parameters = 6;
/* ANGLE-ANGLE-TORSION */
strcpy(ff_angangtor.keyword,"#angle-angle-torsion_1");
ff_angangtor.number_of_members = 4;
ff_angangtor.number_of_parameters = 1;
/* END-BOND-TORSION */
strcpy(ff_endbontor.keyword,"#end_bond-torsion_3");
ff_endbontor.number_of_members = 4;
ff_endbontor.number_of_parameters = 6;
/* MID-BOND-TORSION */
strcpy(ff_midbontor.keyword,"#middle_bond-torsion_3");
ff_midbontor.number_of_members = 4;
ff_midbontor.number_of_parameters = 3;
/* ANGLE-ANGLE */
strcpy(ff_angang.keyword,"#angle-angle");
ff_angang.number_of_members = 4;
ff_angang.number_of_parameters = 1;
/* BOND-BOND-1-3 */
strcpy(ff_bonbon13.keyword,"#bond-bond_1_3");
ff_bonbon13.number_of_members = 4;
ff_bonbon13.number_of_parameters = 1;
}
}
diff --git a/tools/msi2lmp/src/WriteDataFile.c b/tools/msi2lmp/src/WriteDataFile.c
index 498978406..c03eba71c 100644
--- a/tools/msi2lmp/src/WriteDataFile.c
+++ b/tools/msi2lmp/src/WriteDataFile.c
@@ -1,478 +1,478 @@
/*
* This function creates and writes the data file to be used with LAMMPS
*/
#include "msi2lmp.h"
#include "Forcefield.h"
#include <stdlib.h>
void WriteDataFile(char *nameroot)
{
int i,j,k,m;
char line[MAX_LINE_LENGTH];
FILE *DatF;
/* Open data file */
sprintf(line,"%s.data",rootname);
if (pflag > 0) {
printf(" Writing LAMMPS data file %s.data",rootname);
if (forcefield & FF_TYPE_CLASS1) puts(" for Class I force field");
if (forcefield & FF_TYPE_CLASS2) puts(" for Class II force field");
if (forcefield & FF_TYPE_OPLSAA) puts(" for OPLS-AA force field");
}
if ((DatF = fopen(line,"w")) == NULL ) {
printf("Cannot open %s\n",line);
exit(62);
}
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) total_no_angle_angles = 0;
if (hintflag) fprintf(DatF, "LAMMPS data file. msi2lmp " MSI2LMP_VERSION
" / CGCMM for %s\n\n", nameroot);
else fprintf(DatF, "LAMMPS data file. msi2lmp " MSI2LMP_VERSION
" for %s\n\n", nameroot);
fprintf(DatF, " %6d atoms\n", total_no_atoms);
fprintf(DatF, " %6d bonds\n", total_no_bonds);
fprintf(DatF, " %6d angles\n",total_no_angles);
fprintf(DatF, " %6d dihedrals\n", total_no_dihedrals);
fprintf(DatF, " %6d impropers\n", total_no_oops+total_no_angle_angles);
fputs("\n",DatF);
fprintf(DatF, " %3d atom types\n", no_atom_types);
if (no_bond_types > 0)
fprintf(DatF, " %3d bond types\n", no_bond_types);
if (no_angle_types> 0)
fprintf(DatF, " %3d angle types\n", no_angle_types);
if (no_dihedral_types > 0) fprintf (DatF," %3d dihedral types\n",
no_dihedral_types);
if (forcefield & FF_TYPE_CLASS1) {
if (no_oop_types > 0)
fprintf (DatF, " %3d improper types\n", no_oop_types);
}
if (forcefield & FF_TYPE_CLASS2) {
if ((no_oop_types + no_angleangle_types) > 0)
fprintf (DatF, " %3d improper types\n",
no_oop_types + no_angleangle_types);
}
/* Modified by SLTM to print out triclinic box types 10/05/10 - lines 56-68 */
if (TriclinicFlag == 0) {
fputs("\n",DatF);
fprintf(DatF, " %15.9f %15.9f xlo xhi\n", box[0][0], box[1][0]);
fprintf(DatF, " %15.9f %15.9f ylo yhi\n", box[0][1], box[1][1]);
fprintf(DatF, " %15.9f %15.9f zlo zhi\n", box[0][2], box[1][2]);
} else {
fputs("\n",DatF);
fprintf(DatF, " %15.9f %15.9f xlo xhi\n", box[0][0], box[1][0]);
fprintf(DatF, " %15.9f %15.9f ylo yhi\n", box[0][1], box[1][1]);
fprintf(DatF, " %15.9f %15.9f zlo zhi\n", box[0][2], box[1][2]);
fprintf(DatF, " %15.9f %15.9f %15.9f xy xz yz\n",box[2][0], box[2][1], box[2][2]);
}
/* MASSES */
fprintf(DatF, "\nMasses\n\n");
for(k=0; k < no_atom_types; k++) {
if (hintflag) fprintf(DatF, " %3d %10.6f # %s\n",k+1,atomtypes[k].mass,atomtypes[k].potential);
else fprintf(DatF, " %3d %10.6f\n",k+1,atomtypes[k].mass);
}
fputs("\n",DatF);
/* COEFFICIENTS */
fputs("Pair Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # lj/cut/coul/long\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # lj/class2/coul/long\n\n",DatF);
} else fputs("\n\n",DatF);
for (i=0; i < no_atom_types; i++) {
fprintf(DatF, " %3i ", i+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%14.10f ",atomtypes[i].params[j]);
if (hintflag) fprintf(DatF, "# %s\n",atomtypes[i].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
if (no_bond_types > 0) {
m = 0;
if (forcefield & FF_TYPE_CLASS1) m = 2;
if (forcefield & FF_TYPE_OPLSAA) m = 2;
if (forcefield & FF_TYPE_CLASS2) m = 4;
fputs("Bond Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # harmonic\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # class2\n\n",DatF);
} else fputs("\n\n",DatF);
for (i=0; i < no_bond_types; i++) {
fprintf(DatF, " %3i", i+1);
for ( j = 0; j < m; j++)
fprintf(DatF, " %10.4f", bondtypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s\n",atomtypes[bondtypes[i].types[0]].potential,
atomtypes[bondtypes[i].types[1]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_angle_types > 0) {
m = 0;
if (forcefield & FF_TYPE_CLASS1) m = 2;
if (forcefield & FF_TYPE_OPLSAA) m = 2;
if (forcefield & FF_TYPE_CLASS2) m = 4;
fputs("Angle Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # harmonic\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # class2\n\n",DatF);
} else fputs("\n\n",DatF);
-
+
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, " %3i", i+1);
for ( j = 0; j < m; j++)
fprintf(DatF, " %10.4f", angletypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s-%s\n",
atomtypes[angletypes[i].types[0]].potential,
atomtypes[angletypes[i].types[1]].potential,
atomtypes[angletypes[i].types[2]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_dihedral_types > 0) {
fputs("Dihedral Coeffs",DatF);
if (forcefield & FF_TYPE_CLASS1) {
if (hintflag) fputs(" # harmonic\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i %10.4f %3i %3i", i+1,
dihedraltypes[i].params[0],
(int) dihedraltypes[i].params[1],
(int) dihedraltypes[i].params[2]);
if (hintflag) fprintf(DatF," # %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
} else if (forcefield & FF_TYPE_OPLSAA) {
if (hintflag) fputs(" # opls\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, " %3i",i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF, " %10.4f",dihedraltypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
} else if (forcefield & FF_TYPE_CLASS2) {
if (hintflag) fputs(" # class2\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, " %3i",i+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, " %10.4f",dihedraltypes[i].params[j]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
if (forcefield & FF_TYPE_CLASS1) {
if (no_oop_types > 0) {
/* cvff improper coeffs are: type K0 d n */
if (hintflag) fputs("Improper Coeffs # cvff\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF,"%5i %10.4f %3i %3i ",i+1,
ooptypes[i].params[0], (int) ooptypes[i].params[1],
(int) ooptypes[i].params[2]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
} else if (forcefield & FF_TYPE_OPLSAA) {
if (no_oop_types > 0) {
/* opls improper coeffs are like cvff: type K0 d(=-1) n(=2) */
if (hintflag) fputs("Improper Coeffs # cvff\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF,"%5i %10.4f %3i %3i ",i+1,
ooptypes[i].params[0], (int) ooptypes[i].params[1],
(int) ooptypes[i].params[2]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
} else if (forcefield & FF_TYPE_CLASS2) {
if ((no_oop_types + no_angleangle_types) > 0) {
if (hintflag) fputs("Improper Coeffs # class2\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%10.4f ", ooptypes[i].params[j]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
for (i=0; i < no_angleangle_types; i++) {
fprintf(DatF, "%3i ", i+no_oop_types+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%10.4f ", 0.0);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
if (forcefield & FF_TYPE_CLASS2) {
if (no_angle_types > 0) {
fprintf(DatF,"BondBond Coeffs\n\n");
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF, "%10.4f ", angletypes[i].bondbond_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"BondAngle Coeffs\n\n");
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF, "%10.4f ",angletypes[i].bondangle_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
if ((no_oop_types+no_angleangle_types) > 0) {
fprintf(DatF,"AngleAngle Coeffs\n\n");
for (i=0; i < no_oop_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, "%10.4f ", ooptypes[i].angleangle_params[j]);
fputs("\n",DatF);
}
for (i=0; i < no_angleangle_types; i++) {
fprintf(DatF, "%3i ", i+no_oop_types+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, "%10.4f ", angleangletypes[i].params[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_dihedral_types > 0) {
fprintf(DatF,"AngleAngleTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF,"%10.4f ",
dihedraltypes[i].angleangledihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"EndBondTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%i ", i+1);
for ( j = 0; j < 8; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].endbonddihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"MiddleBondTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF,"%10.4f ",
dihedraltypes[i].midbonddihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"BondBond13 Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].bond13_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"AngleTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 8; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].angledihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
/*--------------------------------------------------------------------*/
/* ATOMS */
if (hintflag) fputs("Atoms # full\n\n",DatF);
else fputs("Atoms\n\n",DatF);
for(k=0; k < total_no_atoms; k++) {
int typ = atoms[k].type;
fprintf(DatF," %6i %6i %3i %9.6f %15.9f %15.9f %15.9f %3i %3i %3i",
k+1,
atoms[k].molecule,
typ+1,
atoms[k].q,
atoms[k].x[0],
atoms[k].x[1],
atoms[k].x[2],
atoms[k].image[0],
atoms[k].image[1],
atoms[k].image[2]);
if (hintflag) fprintf(DatF," # %s\n",atomtypes[typ].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
/***** BONDS *****/
if (total_no_bonds > 0) {
fprintf(DatF, "Bonds\n\n");
for(k=0; k < total_no_bonds; k++)
fprintf(DatF, "%6i %3i %6i %6i\n",k+1,
bonds[k].type+1,
bonds[k].members[0]+1,
bonds[k].members[1]+1);
fputs("\n",DatF);
}
/***** ANGLES *****/
if (total_no_angles > 0) {
fprintf(DatF, "Angles\n\n");
for(k=0; k < total_no_angles; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i\n",k+1,
angles[k].type+1,
angles[k].members[0]+1,
angles[k].members[1]+1,
angles[k].members[2]+1);
fputs("\n",DatF);
}
/***** TORSIONS *****/
if (total_no_dihedrals > 0) {
fprintf(DatF,"Dihedrals\n\n");
for(k=0; k < total_no_dihedrals; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i\n",k+1,
dihedrals[k].type+1,
dihedrals[k].members[0]+1,
dihedrals[k].members[1]+1,
dihedrals[k].members[2]+1,
dihedrals[k].members[3]+1);
fputs("\n",DatF);
}
/***** OUT-OF-PLANES *****/
if (total_no_oops+total_no_angle_angles > 0) {
fprintf(DatF,"Impropers\n\n");
for (k=0; k < total_no_oops; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i \n", k+1,
oops[k].type+1,
oops[k].members[0]+1,
oops[k].members[1]+1,
oops[k].members[2]+1,
oops[k].members[3]+1);
if (forcefield & FF_TYPE_CLASS2) {
for (k=0; k < total_no_angle_angles; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i \n",k+total_no_oops+1,
angleangles[k].type+no_oop_types+1,
angleangles[k].members[0]+1,
angleangles[k].members[1]+1,
angleangles[k].members[2]+1,
angleangles[k].members[3]+1);
}
fputs("\n",DatF);
}
/* Close data file */
if (fclose(DatF) !=0) {
printf("Error closing %s.lammps05\n", rootname);
exit(61);
}
}
diff --git a/tools/msi2lmp/src/msi2lmp.c b/tools/msi2lmp/src/msi2lmp.c
index c94d4b4d7..15cfddd25 100644
--- a/tools/msi2lmp/src/msi2lmp.c
+++ b/tools/msi2lmp/src/msi2lmp.c
@@ -1,442 +1,439 @@
/*
*
* msi2lmp.exe
*
* v3.9.8 AK- Improved whitespace handling in parsing topology and force
* field files to avoid bogus warnings about type name truncation
*
* v3.9.7 AK- Add check to enforce that Class1/OPLS-AA use A-B parameter
* conventions in the force field file and Class2 uses r-eps conventions
*
* v3.9.6 AK- Refactoring of MDF file parser with more consistent
* handling of compile time constants MAX_NAME and MAX_STRING
*
* v3.9.5 AK- Add TopoTools style force field parameter type hints
*
* v3.9.4 AK- Make force field style hints optional with a flag
*
* v3.9.3 AK- Bugfix for triclinic cells.
*
* v3.9.2 AK- Support for writing out force field style hints
*
* v3.9.1 AK- Bugfix for Class2. Free allocated memory. Print version number.
*
* v3.9 AK - Rudimentary support for OPLS-AA
*
* v3.8 AK - Some refactoring and cleanup of global variables
* - Bugfixes for argument parsing and improper definitions
* - improved handling of box dimensions and image flags
* - port to compiling on windows using MinGW
* - more consistent print level handling
* - more consistent handling of missing parameters
* - Added a regression test script with examples.
*
* V3.7 STM - Added support for triclinic cells
*
* v3.6 KLA - Changes to output to either lammps 2001 (F90 version) or to
* lammps 2005 (C++ version)
*
* v3.4 JEC - a number of minor changes due to the way newline and EOF are generated
* on Materials Studio generated .car and .mdf files as well as odd
* behavior out of newer Linux IO libraries. ReadMdfFile was restructured
* in the process.
*
* v3.1 JEC - changed IO interface to standard in/out, forcefield file
* location can be indicated by an environment variable; added
* printing options, consistency checks and forcefield
* parameter version sensitivity (the highest version is used)
*
* v3.0 JEC - program substantially rewritten to reduce execution time
* and be 98 % dynamic in memory use (still fixed limits on
* number of parameter types for different internal coordinate
* sets)
*
* v2.0 MDP - got internal coordinate information from mdf file and
* forcefield parameters from frc file thus eliminating
* need for Discover
*
* V1.0 SL - original version. Used .car file and internal coordinate
* information from Discover to produce LAMMPS data file.
*
* This program uses the .car and .mdf files from MSI/Biosym's INSIGHT
* program to produce a LAMMPS data file.
*
* The program is started by supplying information at the command prompt
* according to the usage described below.
*
* USAGE: msi2lmp3 ROOTNAME {-print #} {-class #} {-frc FRC_FILE} {-ignore} {-nocenter} {-oldstyle} {-shift # # #}
*
* -- msi2lmp3 is the name of the executable
* -- ROOTNAME is the base name of the .car and .mdf files
* -- all other flags are optional and can be abbreviated (e.g. -p instead of -print)
*
* -- -print
* # is the print level: 0 - silent except for errors
* 1 - minimal (default)
* 2 - more verbose
* 3 - even more verbose
* -- -class
* # is the class of forcefield to use (I or 1 = Class I e.g., CVFF, clayff)
* (II or 2 = Class II e.g., CFFx, COMPASS)
* (O or 0 = OPLS-AA)
* default is -class I
*
* -- -ignore - tells msi2lmp to ignore warnings and errors and keep going
*
* -- -nocenter - tells msi2lmp to not center the box around the (geometrical)
* center of the atoms, but around the origin
*
* -- -oldstyle - tells msi2lmp to write out a data file without style hints
* (to be compatible with older LAMMPS versions)
*
* -- -shift - tells msi2lmp to shift the entire system (box and coordinates)
* by a vector (default: 0.0 0.0 0.0)
*
* -- -frc - specifies name of the forcefield file (e.g., cff91)
*
* If the name includes a hard wired directory (i.e., if the name
* starts with . or /), then the name is used alone. Otherwise,
* the program looks for the forcefield file in $MSI2LMP_LIBRARY.
* If $MSI2LMP_LIBRARY is not set, then the current directory is
* used.
*
* If the file name does not include a dot after the first
* character, then .frc is appended to the name.
*
* For example, -frc cvff (assumes cvff.frc is in $MSI2LMP_LIBRARY
* or .)
*
* -frc cff/cff91 (assumes cff91.frc is in
* $MSI2LMP_LIBRARY/cff or ./cff)
*
* -frc /usr/local/forcefields/cff95 (absolute
* location)
*
* By default, the program uses $MSI2LMP_LIBRARY/cvff.frc
*
* -- output is written to a file called ROOTNAME.data
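*
* Example invocation (illustrative): msi2lmp benzene -class II -frc pcff
* reads benzene.car and benzene.mdf, looks up pcff.frc and writes benzene.data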
*
*
****************************************************************
*
* msi2lmp
*
* This is the third version of a program that generates a LAMMPS
* data file based on the information in a MSI car file (atom
* coordinates) and mdf file (molecular topology). A key part of
* the program looks up forcefield parameters from an MSI frc file.
*
* The first version was written by Steve Lustig at Dupont, but
* required using Discover to derive internal coordinates and
* forcefield parameters
*
* The second version was written by Michael Peachey while an
* intern in the Cray Chemistry Applications Group managed
* by John Carpenter. This version derived internal coordinates
* from the mdf file and looked up parameters in the frc file
* thus eliminating the need for Discover.
*
* The third version was written by John Carpenter to optimize
* the performance of the program for large molecular systems
* (the original code for deriving atom numbers was quadratic in time)
* and to make the program fully dynamic. The second version used
* fixed dimension arrays for the internal coordinates.
*
-* John Carpenter can be contacted by sending email to
-* jec374@earthlink.net
-*
* November 2000
*/
#include "msi2lmp.h"
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/* global variables */
char *rootname;
double pbc[6];
double box[3][3];
double shift[3];
int periodic = 1;
int TriclinicFlag = 0;
int forcefield = 0;
int centerflag = 1;
int hintflag = 1;
int ljtypeflag = 0;
int pflag;
int iflag;
int *no_atoms;
int no_molecules;
int replicate[3];
int total_no_atoms = 0;
int total_no_bonds = 0;
int total_no_angles = 0;
int total_no_dihedrals = 0;
int total_no_angle_angles = 0;
int total_no_oops = 0;
int no_atom_types = 0;
int no_bond_types = 0;
int no_angle_types = 0;
int no_dihedral_types = 0;
int no_oop_types = 0;
int no_angleangle_types = 0;
char *FrcFileName = NULL;
FILE *CarF = NULL;
FILE *FrcF = NULL;
FILE *PrmF = NULL;
FILE *MdfF = NULL;
FILE *RptF = NULL;
struct Atom *atoms = NULL;
struct MoleculeList *molecule = NULL;
struct BondList *bonds = NULL;
struct AngleList *angles = NULL;
struct DihedralList *dihedrals = NULL;
struct OOPList *oops = NULL;
struct AngleAngleList *angleangles = NULL;
struct AtomTypeList *atomtypes = NULL;
struct BondTypeList *bondtypes = NULL;
struct AngleTypeList *angletypes = NULL;
struct DihedralTypeList *dihedraltypes = NULL;
struct OOPTypeList *ooptypes = NULL;
struct AngleAngleTypeList *angleangletypes = NULL;
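/* Exit with the given status unless the -ignore flag (iflag) was set. */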
void condexit(int val)
{
if (iflag == 0) exit(val);
}
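/* Sanity check for option arguments (sketch of intent): report an error and
   return nonzero when the expected argument is missing or itself looks like
   another flag. */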
static int check_arg(char **arg, const char *flag, int num, int argc)
{
if (num >= argc) {
printf("Missing argument to \"%s\" flag\n",flag);
return 1;
}
if (arg[num][0] == '-') {
printf("Incorrect argument to \"%s\" flag: %s\n",flag,arg[num]);
return 1;
}
return 0;
}
int main (int argc, char *argv[])
{
int n,i,found_sep;
const char *frc_dir_name = NULL;
const char *frc_file_name = NULL;
pflag = 1;
iflag = 0;
forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON;
shift[0] = shift[1] = shift[2] = 0.0;
frc_dir_name = getenv("MSI2LMP_LIBRARY");
if (argc < 2) {
printf("usage: %s <rootname> [-class <I|1|II|2>] [-frc <path to frc file>] [-print #] [-ignore] [-nocenter] [-oldstyle]\n",argv[0]);
return 1;
} else { /* rootname was supplied as first argument, copy to rootname */
int len = strlen(argv[1]) + 1;
rootname = (char *)malloc(len);
strcpy(rootname,argv[1]);
}
n = 2;
while (n < argc) {
if (strncmp(argv[n],"-c",2) == 0) {
n++;
if (check_arg(argv,"-class",n,argc))
return 2;
if ((strcmp(argv[n],"I") == 0) || (strcmp(argv[n],"1") == 0)) {
forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON;
} else if ((strcmp(argv[n],"II") == 0) || (strcmp(argv[n],"2") == 0)) {
forcefield = FF_TYPE_CLASS2 | FF_TYPE_COMMON;
} else if ((strcmp(argv[n],"O") == 0) || (strcmp(argv[n],"0") == 0)) {
forcefield = FF_TYPE_OPLSAA | FF_TYPE_COMMON;
} else {
printf("Unrecognized Forcefield class: %s\n",argv[n]);
return 3;
}
} else if (strncmp(argv[n],"-f",2) == 0) {
n++;
if (check_arg(argv,"-frc",n,argc))
return 4;
frc_file_name = argv[n];
} else if (strncmp(argv[n],"-s",2) == 0) {
if (n+3 > argc) {
printf("Missing argument(s) to \"-shift\" flag\n");
return 1;
}
shift[0] = atof(argv[++n]);
shift[1] = atof(argv[++n]);
shift[2] = atof(argv[++n]);
} else if (strncmp(argv[n],"-i",2) == 0 ) {
iflag = 1;
} else if (strncmp(argv[n],"-n",4) == 0 ) {
centerflag = 0;
} else if (strncmp(argv[n],"-o",2) == 0 ) {
hintflag = 0;
} else if (strncmp(argv[n],"-p",2) == 0) {
n++;
if (check_arg(argv,"-print",n,argc))
return 5;
pflag = atoi(argv[n]);
} else {
printf("Unrecognized option: %s\n",argv[n]);
return 6;
}
n++;
}
/* set defaults, if nothing else was given */
if (frc_dir_name == NULL)
#if (_WIN32)
frc_dir_name = "..\\frc_files";
#else
frc_dir_name = "../frc_files";
#endif
if (frc_file_name == NULL)
frc_file_name = "cvff.frc";
found_sep=0;
#ifdef _WIN32
if (isalpha(frc_file_name[0]) && (frc_file_name[1] == ':'))
found_sep=1; /* windows drive letter => full path. */
#endif
n = strlen(frc_file_name);
for (i=0; i < n; ++i) {
#ifdef _WIN32
if ((frc_file_name[i] == '/') || (frc_file_name[i] == '\\'))
found_sep=1+i;
#else
if (frc_file_name[i] == '/')
found_sep=1+i;
#endif
}
/* full pathname given */
if (found_sep) {
i = 0;
/* need to append extension? */
if ((n < 5) || (strcmp(frc_file_name+n-4,".frc") !=0))
i=1;
FrcFileName = (char *)malloc(n+1+i*4);
strcpy(FrcFileName,frc_file_name);
if (i) strcat(FrcFileName,".frc");
} else {
i = 0;
/* need to append extension? */
if ((n < 5) || (strcmp(frc_file_name+n-4,".frc") !=0))
i=1;
FrcFileName = (char *)malloc(n+2+i*4+strlen(frc_dir_name));
strcpy(FrcFileName,frc_dir_name);
#ifdef _WIN32
strcat(FrcFileName,"\\");
#else
strcat(FrcFileName,"/");
#endif
strcat(FrcFileName,frc_file_name);
if (i) strcat(FrcFileName,".frc");
}
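/* Illustrative outcome of the name resolution above: "-frc cvff" becomes
   "$MSI2LMP_LIBRARY/cvff.frc" (or "../frc_files/cvff.frc" when the
   environment variable is unset), while a name containing a path separator
   is used as given; ".frc" is appended whenever it is missing. */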
if (pflag > 0) {
puts("\nRunning msi2lmp " MSI2LMP_VERSION "\n");
if (forcefield & FF_TYPE_CLASS1) puts(" Forcefield: Class I");
if (forcefield & FF_TYPE_CLASS2) puts(" Forcefield: Class II");
if (forcefield & FF_TYPE_OPLSAA) puts(" Forcefield: OPLS-AA");
printf(" Forcefield file name: %s\n",FrcFileName);
if (centerflag) puts(" Output is recentered around geometrical center");
if (hintflag) puts(" Output contains style flag hints");
else puts(" Style flag hints disabled");
- printf(" System translated by: %g %g %g\n",shift[0],shift[1],shift[2]);
+ printf(" System translated by: %g %g %g\n",shift[0],shift[1],shift[2]);
}
n = 0;
if (forcefield & FF_TYPE_CLASS1) {
if (strstr(FrcFileName,"cvff") != NULL) ++n;
if (strstr(FrcFileName,"clayff") != NULL) ++n;
} else if (forcefield & FF_TYPE_OPLSAA) {
if (strstr(FrcFileName,"oplsaa") != NULL) ++n;
} else if (forcefield & FF_TYPE_CLASS2) {
if (strstr(FrcFileName,"pcff") != NULL) ++n;
if (strstr(FrcFileName,"cff91") != NULL) ++n;
if (strstr(FrcFileName,"compass") != NULL) ++n;
}
if (n == 0) {
if (iflag > 0) fputs(" WARNING",stderr);
else fputs(" Error ",stderr);
-
+
fputs("- forcefield name and class appear to be inconsistent\n\n",stderr);
if (iflag == 0) return 7;
}
/* Read in .car file */
ReadCarFile();
/*Read in .mdf file */
ReadMdfFile();
/* Define bonds, angles, etc...*/
if (pflag > 0)
printf("\n Building internal coordinate lists \n");
MakeLists();
/* Read .frc file into memory */
if (pflag > 0)
printf("\n Reading forcefield file \n");
ReadFrcFile();
/* Get forcefield parameters */
if (pflag > 0)
printf("\n Get force field parameters for this system\n");
GetParameters();
/* Do internal check of internal coordinate lists */
if (pflag > 0)
printf("\n Check parameters for internal consistency\n");
CheckLists();
/* Write out the final data */
WriteDataFile(rootname);
/* free up memory to detect possible memory corruption */
free(rootname);
free(FrcFileName);
ClearFrcData();
for (n=0; n < no_molecules; n++) {
free(molecule[n].residue);
}
free(no_atoms);
free(molecule);
free(atoms);
free(atomtypes);
if (bonds) free(bonds);
if (bondtypes) free(bondtypes);
if (angles) free(angles);
if (angletypes) free(angletypes);
if (dihedrals) free(dihedrals);
if (dihedraltypes) free(dihedraltypes);
if (oops) free(oops);
if (ooptypes) free(ooptypes);
if (angleangles) free(angleangles);
if (angleangletypes) free(angleangletypes);
if (pflag > 0)
printf("\nNormal program termination\n");
return 0;
}
diff --git a/tools/msi2lmp/src/msi2lmp.h b/tools/msi2lmp/src/msi2lmp.h
index 377ab1a6c..4716f719d 100644
--- a/tools/msi2lmp/src/msi2lmp.h
+++ b/tools/msi2lmp/src/msi2lmp.h
@@ -1,228 +1,228 @@
/********************************
*
* Header file for msi2lmp conversion program.
*
* This is the header file for the third version of a program
* that generates a LAMMPS data file based on the information
* in an MSI car file (atom coordinates) and mdf file (molecular
* topology). A key part of the program looks up forcefield parameters
* from an MSI frc file.
*
* The first version was written by Steve Lustig at Dupont, but
* required using Discover to derive internal coordinates and
* forcefield parameters
*
* The second version was written by Michael Peachey while an
* intern in the Cray Chemistry Applications Group managed
* by John Carpenter. This version derived internal coordinates
* from the mdf file and looked up parameters in the frc file
* thus eliminating the need for Discover.
*
* The third version was written by John Carpenter to optimize
* the performance of the program for large molecular systems
* (the original code for deriving atom numbers was quadratic in time)
* and to make the program fully dynamic. The second version used
* fixed dimension arrays for the internal coordinates.
*
-* The thrid version was revised in Fall 2011 by
+* The third version was revised in Fall 2011 by
* Stephanie Teich-McGoldrick to add support for non-orthogonal cells.
*
* The next revision was started in Summer/Fall 2013 by
* Axel Kohlmeyer to improve portability to Windows compilers,
* clean up command line parsing and improve compatibility with
-* the then current LAMMPS versions. This revision removes
+* the then current LAMMPS versions. This revision removes
* compatibility with the obsolete LAMMPS version written in Fortran 90.
*/
# include <stdio.h>
#define MSI2LMP_VERSION "v3.9.8 / 06 Oct 2016"
#define PI_180 0.01745329251994329576
#define MAX_LINE_LENGTH 256
#define MAX_CONNECTIONS 8
#define MAX_STRING 64
#define MAX_NAME 16
#define WHITESPACE " \t\r\n\f"
#define MAX_ATOM_TYPES 100
#define MAX_BOND_TYPES 200
#define MAX_ANGLE_TYPES 300
#define MAX_DIHEDRAL_TYPES 400
#define MAX_OOP_TYPES 400
#define MAX_ANGLEANGLE_TYPES 400
#define MAX_TYPES 12000
#define FF_TYPE_COMMON 1<<0
#define FF_TYPE_CLASS1 1<<1
#define FF_TYPE_CLASS2 1<<2
#define FF_TYPE_OPLSAA 1<<3
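/* Usage sketch (illustrative): the force field class is stored as a bit mask,
   e.g. forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON, and sections of the
   converter are selected with tests such as (forcefield & FF_TYPE_CLASS2). */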
struct ResidueList {
int start;
int end;
char name[MAX_NAME];
};
struct MoleculeList {
int start;
int end;
int no_residues;
struct ResidueList *residue;
};
/* Internal coordinate Lists */
struct BondList {
int type;
int members[2];
};
struct AngleList {
int type;
int members[3];
};
struct DihedralList {
int type;
int members[4];
};
struct OOPList {
int type;
int members[4];
};
struct AngleAngleList {
int type;
int members[4];
};
/* Internal coordinate Types Lists */
struct AtomTypeList
{
char potential[5];
double mass;
double params[2];
int no_connect;
};
struct BondTypeList {
int types[2];
double params[4];
};
struct AngleTypeList {
int types[3];
double params[4];
double bondangle_cross_term[4];
double bondbond_cross_term[3];
};
struct DihedralTypeList {
int types[4];
double params[6];
double endbonddihedral_cross_term[8];
double midbonddihedral_cross_term[4];
double angledihedral_cross_term[8];
double angleangledihedral_cross_term[3];
double bond13_cross_term[3];
};
struct OOPTypeList {
int types[4];
double params[3];
double angleangle_params[6];
};
struct AngleAngleTypeList {
int types[4];
double params[6];
};
/* ---------------------------------------------- */
struct Atom {
int molecule; /* molecule id */
int no; /* atom id */
char name[MAX_NAME]; /* atom name */
double x[3]; /* position vector */
int image[3]; /* image flag */
char potential[6]; /* atom potential type */
char element[4]; /* atom element */
double q; /* charge */
char residue_string[MAX_NAME]; /* residue string */
int no_connect; /* number of connections to atom */
char connections[MAX_CONNECTIONS][MAX_STRING]; /* long form, connection name*/
double bond_order[MAX_CONNECTIONS];
int conn_no[MAX_CONNECTIONS]; /* Atom number to which atom is connected */
int type;
};
extern char *rootname;
extern char *FrcFileName;
extern double pbc[6]; /* A, B, C, alpha, beta, gamma */
extern double box[3][3]; /* hi/lo for x/y/z and xy, xz, yz for triclinic */
extern double shift[3]; /* shift vector for all coordinates and box positions */
extern int periodic; /* 0= nonperiodic 1= 3-D periodic */
extern int TriclinicFlag; /* 0= Orthogonal 1= Triclinic */
extern int forcefield; /* BitMask: the value FF_TYPE_COMMON is set for common components of the options below,
* FF_TYPE_CLASS1 = ClassI, FF_TYPE_CLASS2 = ClassII, FF_TYPE_OPLSAA = OPLS-AA*/
extern int ljtypeflag; /* how LJ parameters are stored: 0 = A-B, 1 = r-eps */
extern int centerflag; /* 1= center box 0= keep box */
extern int hintflag; /* 1= print style hint comments 0= no hints */
extern int pflag; /* print level: 0, 1, 2, 3 */
extern int iflag; /* 0 stop at errors 1 = ignore errors */
extern int *no_atoms;
extern int no_molecules;
extern int replicate[3];
extern int total_no_atoms;
extern int total_no_bonds;
extern int total_no_angles;
extern int total_no_dihedrals;
extern int total_no_angle_angles;
extern int total_no_oops;
extern int no_atom_types;
extern int no_bond_types;
extern int no_angle_types;
extern int no_dihedral_types;
extern int no_oop_types;
extern int no_angleangle_types;
extern FILE *CarF;
extern FILE *FrcF;
extern FILE *PrmF;
extern FILE *MdfF;
extern FILE *RptF;
extern struct Atom *atoms;
extern struct MoleculeList *molecule;
extern struct BondList *bonds;
extern struct AngleList *angles;
extern struct DihedralList *dihedrals;
extern struct OOPList *oops;
extern struct AngleAngleList *angleangles;
extern struct AtomTypeList *atomtypes;
extern struct BondTypeList *bondtypes;
extern struct AngleTypeList *angletypes;
extern struct DihedralTypeList *dihedraltypes;
extern struct OOPTypeList *ooptypes;
extern struct AngleAngleTypeList *angleangletypes;
extern void FrcMenu();
extern void ReadCarFile();
extern void ReadMdfFile();
extern void ReadFrcFile();
extern void ClearFrcData();
extern void MakeLists();
extern void GetParameters();
extern void CheckLists();
extern void WriteDataFile(char *);
extern void set_box(double box[3][3], double *h, double *h_inv);
extern void lamda2x(double *lamda, double *x, double *h, double *boxlo);
extern void x2lamda(double *x, double *lamda, double *h_inv, double *boxlo);
extern void condexit(int);
